Posted 18 June, 2026
AI/ML Engineer (Voice Models, Cloning, TTS, STT, ASR)
Client of Prasha Consultancy Services Private Limited
Bhavnagar, GJ, IN
Full Time
Reference: ea1506e06913bf78
Job Description
Immediate or early joiners preferred.

A US-based IT MNC is looking for a seasoned AI/ML Engineer with hands-on experience in building and optimizing voice models for one of its reputed clients in the enterprise-class voice solutions domain. The candidate will develop, train, and refine AI models for voice synthesis, voice cloning, speech recognition, and/or voice transformation.

Work Mode: Remote

An ideal candidate would be someone who has:

Developed and optimized text-to-speech models that achieved human-like voice synthesis, maintaining the unique style of voice actors across multiple languages.
Implemented real-time processing solutions that reduced inference time to under 1 second, enhancing user interaction and experience.
Managed large-scale datasets for voice cloning projects, ensuring high performance and reliability while supporting multilingual transcriptions.

Key Responsibilities
Design, develop, and fine-tune deep learning models for voice synthesis (e.g., TTS, voice cloning).
Implement and optimize neural network architectures such as Tacotron, FastSpeech, WaveNet, or similar.
Collect, preprocess, and augment speech datasets.
Collaborate with product and engineering teams to integrate voice models into production systems.
Perform evaluation and quality assurance of voice model outputs.
Research and stay current on advancements in speech processing, audio generation, and machine learning.

Required Qualifications
Bachelor’s or Master’s degree in Computer Science, Electrical Engineering, or a related field.
Strong experience with Python and machine learning libraries (e.g., PyTorch, TensorFlow).
Hands-on experience with speech/audio processing and relevant toolkits (e.g., Librosa, ESPnet, Kaldi).
Familiarity with voice model architectures (TTS, ASR, vocoders).
Understanding of deep learning concepts and model training processes.

Preferred Qualifications
Experience deploying models to real-time applications or mobile devices.
Knowledge of data labeling, voice dataset creation, and noise handling techniques.
Experience with cloud-based AI/ML infrastructure (e.g., AWS, GCP).
Contributions to open-source projects or published papers in speech/voice-related domains.