Senior AI Engineer
Job Description
This role emphasizes engineering excellence: API/service design, testing, observability, release governance, and cost/latency optimization for LLM systems.

The ideal candidate thrives in a fast-paced, agile environment and is passionate about leveraging data to solve real-world healthcare challenges.

Responsibilities
1) NLP/LLM Solution Architecture & Product Delivery
Translate business workflows into NLP/LLM solution designs (RAG, classification, extraction, summarization, routing/triage, agents).
Define what data is needed (first-/third-party, events, text, image, claims/transactions, IoT), data quality thresholds, and labeling strategy.
Define north-star metrics (online and offline) and decision boundaries; craft counterfactuals and baselines (e.g., business-as-usual) to quantify impact. Connect model metrics to business outcomes.
Own end-to-end delivery: design → build → test → deploy → monitor → iterate.
Define system requirements including SLAs/SLOs, latency budgets, accuracy targets, cost ceilings, and safety constraints.
Write and maintain AI System Design Specs (problem statement, users, decision loop, constraints, risk posture, evaluation plan, rollout strategy, and guardrails).

2) LLM/NLP Development (Hands-on Build)
Build RAG pipelines: corpus ingestion, chunking strategies, embedding selection, indexing, retrieval/reranking, grounding, citations, and fallback strategies.
Develop prompt/tool schemas and agent designs: function calling, tool routing, memory patterns, and multi-step workflows.
Apply modern NLP methods where appropriate: token classification, sequence labeling, semantic similarity, topic modeling, and hybrid IR (BM25 + dense retrieval).
Ensure correctness through unit/integration tests, robust error handling, and deterministic behavior where needed.
AI/ML accelerator development:
Build and maintain reusable ML accelerators (Cookiecutter, Feature Engineering Toolkit, AutoML, Unified Evaluation Harness, Observability Blueprints, Responsible AI Pack, etc.) that standardize feature engineering, model training, and evaluation across tasks.

3) Evaluation, Quality, and Reliability (LLMOps)
Build and maintain evaluation harnesses:
Offline test sets, golden datasets, and regression suites
LLM-as-judge where appropriate (with controls)
Human-in-the-loop review loops for high-risk workflows
Define and track quality metrics: groundedness, faithfulness, toxicity/safety, extraction accuracy, retrieval precision/recall, and task success rates.
Implement guardrails: policy filters, PHI/PII handling, prompt injection defenses, output constraints, and safe-completion behaviors.

4) Production Engineering, MLOps & Observability
Productionize services using containerization and orchestration (e.g., Docker, Kubernetes) and CI/CD pipelines.
Implement observability: structured logging, traces, prompt/version tracking, vector DB metrics, and cost monitoring.
Monitor performance and drift signals; define retraining/re-indexing/re-prompting strategies and release governance.
Optimize for performance and cost: caching, batching, streaming, quantization where relevant, and efficient retrieval.

5) Collaboration & Stakeholder Engagement
Partner with cross-functional teams (including actuaries, clinicians, engineers, and product managers) to align technical solutions with strategic objectives.
Facilitate technical workshops and presentations to ensure clarity and buy-in across diverse audiences.
Act as a subject matter expert on analytics, data science methodologies, and best practices.

6) Governance & Compliance
Ensure adherence to data privacy regulations and implement security best practices across all data science workflows.
Advocate for responsible AI by incorporating fairness, explainability, and bias detection into model development.
Maintain comprehensive audit trails and documentation for regulatory compliance and internal governance.

Candidate Profile
Experience and Qualifications
Bachelor’s or Master’s degree in Computer Science, Engineering, Machine Learning, NLP, or related field.
~8–10 years of industry experience building production systems (with at least 2–3 years in NLP/LLM or applied ML engineering).

Technical Expertise
Programming & Data Foundations
Strong proficiency in Python/PySpark (data wrangling, EDA, modeling) and SQL for working with large, complex datasets; advanced Excel for analysis and validation.
Reproducible analytic workflows (modular code, notebooks, documentation) and robust data handling across heterogeneous sources.

Analytical Rigor & Problem Solving
Experience in defining evaluation taxonomies and acceptance criteria across initiatives, balancing statistical and operational risk.
Experience in codifying analytical playbooks and institutionalizing measurement frameworks across products/teams. Able to arbitrate trade-offs (accuracy, fairness, latency, interpretability) for high-impact decisions.

Core AI & Generative AI Expertise
Framework Mastery: Deep proficiency in Python and industry-standard machine learning frameworks such as PyTorch, Hugging Face, or TensorFlow.
Advanced Architecture: Strong knowledge of neural network patterns, specifically Transformer architectures, Large Language Models (LLMs), and Small Language Models (SLMs).
Agentic AI & Orchestration: Experience architecting multi-agent systems and expert routing using frameworks like LangChain, LangGraph, LlamaIndex, or CrewAI.
RAG & Vector Data: Hands-on experience optimizing Retrieval-Augmented Generation (RAG) pipelines using vector databases such as Pinecone, Milvus, or Weaviate.
Model Optimization: Expertise in fine-tuning, prompt engineering, hyperparameter tuning, and context-chaining techniques.

Software Engineering & MLOps Infrastructure
Production Engineering: Solid software development fundamentals, including clean architecture, version control (Git), writing automated unit/integration tests, and CI/CD pipelines.
Cloud & Containerization: Experience hosting and scaling models on major cloud infrastructure platforms like AWS, GCP, or Azure using Docker and Kubernetes.
LLMOps & Observability: Use of specialized monitoring tools (e.g., Langfuse, Weights & Biases, PromptLayer) to track model evaluation, latency, drift, and token spend.
Data Pipelines: Familiarity with structuring knowledge graphs, processing multi-modal data streams, and querying database engines.

Cloud & Data Platforms (Microsoft Azure)
Experience with Azure Databricks for scalable data processing, model training, and orchestration.

Governance, Privacy & Responsible AI
Knowledge of data privacy/security best practices across workflows.
Knowledge of applying Responsible AI principles in model building, with experience maintaining comprehensive documentation and audit trails for compliance.
Experience in establishing documentation guidelines and review checkpoints.

GenAI-first & Vibe Coding
Experience using a GenAI vibe-coding workflow by default (generate–refine–test–document), while maintaining code quality, reviews, and reproducibility.
Experience in using Agentic AI/GenAI tools to draft design specs, model cards, experiment summaries, and runbooks, and to automate repetitive analysis/engineering tasks to drive measurable efficiency and productivity gains.

Competencies & Core Characteristics:
We are seeking professionals who embody the following competencies and characteristics essential for success in our scale-up environment:
Technical Domain Expertise (Modelling)
Analytical Rigor & Problem Solving
Unifier & Cross-Functional Influencer
Adaptable & Resilient Operator
Curiosity & Innovation
Responsible & Governed AI