Posted 16 June, 2026

AI Engineer

HabileLabs
Jaipur, RJ, IN · Full Time
Reference: 35c93cef4f1032d1

Job Description

AI Engineer — Voice & Language

4+ Years · Full-Time · Jaipur · Competitive Salary


About the Role

We're hiring a senior AI Engineer to design, build, and ship production AI systems, with a strong emphasis on Voice AI. You'll own the full lifecycle: architecture, training, deployment, and monitoring across language and voice modalities.


What You'll Do

  • LLM & GenAI: Fine-tune and deploy LLMs; build RAG pipelines and agentic workflows (LangChain, LlamaIndex).
  • Voice Pipelines: Architect real-time ASR → LLM → TTS pipelines with <300 ms latency.
  • Voice Agents: Build production voice agents with turn-taking, barge-in handling, and emotion-aware dialogue.
  • Speech Fine-Tuning: Adapt ASR/TTS models for domain-specific accents, terminology, and speaking styles.
  • MLOps: Build reproducible ML pipelines (Kubeflow / MLflow); maintain CI/CD, monitoring, and model versioning.
  • Inference Optimization: Apply quantization (GGUF, GPTQ), distillation, and hardware-aware inference (TensorRT, vLLM) to cut cost and latency.
  • APIs & Services: Ship high-performance inference APIs in Python (FastAPI) or Go on Kubernetes.
  • Data & Evaluation: Curate text + speech corpora; define eval harnesses covering WER, MOS, latency P95, and safety.


Requirements


Must-Have

  • 4+ yrs ML/software engineering; 2+ yrs on production AI systems
  • Strong Python; PyTorch or TensorFlow
  • LLM fine-tuning: LoRA / QLoRA / PEFT
  • End-to-end ML pipeline experience (train → serve)
  • Cloud (AWS / GCP / Azure) + Docker / Kubernetes
  • ASR & TTS integration in real-time streaming systems
  • VAD, noise suppression, and barge-in handling
  • Telephony APIs (Twilio, Vonage) or WebRTC experience


Nice-to-Have

  • Whisper / wav2vec fine-tuning for domain adaptation
  • Audio-language models (AudioPaLM, Qwen-Audio, Gemini Audio)
  • Speaker diarization (pyannote.audio) or voice biometrics
  • Prosody control, SSML, expressive TTS synthesis
  • Multilingual ASR/TTS and code-switching pipelines
  • RLHF / Constitutional AI alignment
  • Vector DBs (Pinecone, Weaviate, pgvector)
  • Open-source contributions or published research


Tech Stack


Core

Python

PyTorch

FastAPI / Go

Kubernetes

MLflow


LLM & GenAI

OpenAI / HuggingFace

LangChain

LlamaIndex

vLLM

RAG / Agents


Voice AI

STT

TTS

WebRTC / WebSockets

pyannote.audio

Twilio / Vonage


Audio Processing

librosa / FFmpeg

Silero VAD

openWakeWord

SSML / Prosody

AEC / Noise Suppression

