Posted 22 May, 2026

Azure Cloud/DevOps Specialist Consultant Bangalore

KPMG
Hyderabad, Telangana, IN, 500034 · Full Time
Reference: 218_549848_30040739

Mandatory Skills

  • Bachelor's or Master's degree in Computer Science, Engineering, or a related field, with 3-7 years of hands-on experience in cloud, DevOps, MLOps, LLMOps, and AgentOps.

  • Proven experience as a DevOps Engineer, MLOps Engineer, Cloud Architect, or Software Engineer in an agile, cloud-native environment.

  • Hands-on experience implementing end-to-end MLOps pipelines, including data ingestion, model training, model registry, CI/CD for ML, and feature stores.

  • Experience working with frameworks/tools like MLflow, Kubeflow, Feast, Airflow, Seldon, BentoML, or similar.

  • Knowledge of model monitoring, drift detection, model governance, and ML observability.

  • Experience deploying, monitoring, and optimizing LLMs and generative AI workflows, including prompt workflows, vector stores, evaluation pipelines, and safety guardrails.

  • Familiarity with LLMOps best practices, including model versioning, evaluation, deployment automation, and RAG-based systems.

  • Understanding of AI agents, orchestration frameworks, and agent workflows using tools like LangChain, Semantic Kernel, AutoGen, or similar.

  • Experience building task-oriented or multi-agent systems, integrating LLMs with enterprise systems, APIs, and cloud-native services.

  • Ability to design scalable AI agent architectures with robust error handling, grounding, and monitoring.

  • Hands-on experience with cloud platforms (AWS, Azure, GCP); experience with their AI/ML services is preferred.

  • Proficient in infrastructure automation tools such as Terraform, Helm charts, and Ansible.

  • Strong understanding of DevOps practices and tools, including CI/CD pipelines, GitHub Actions, and Infrastructure as Code.

  • Proficient in Python, including automation scripting, data processing, and integration with cloud/ML workflows.

  • Basic scripting knowledge, with the ability to write automation scripts in Python and Groovy DSL.

  • Experience managing the delivery of high-availability infrastructure that supports mission-critical systems and services.

  • Experience with version control tools such as Azure Repos and GitHub.

  • Experience with containerization (Docker) and container orchestration (Kubernetes).

  • Strong knowledge of networking concepts (TCP/IP, VNet/Subnet, DNS, Load Balancers, Application Gateway, etc.) and security practices (VPN, Firewalls, encryption).

  • Familiarity with databases and caching mechanisms used in our stack, such as PostgreSQL and Redis.

  • Experience handling complex IT infrastructure solution design and implementation.

  • Excellent communication and interpersonal skills, effective problem-solving skills, logical thinking ability, and strong commitment to professional and client service excellence.

  • Excellent teamwork skills and the ability to direct the efforts of cross-functional teams toward a collaborative proposition.

  • Strong synthesis and analytical skills.

  • Experience working in a consulting environment is an added advantage.

  • Bachelor's degree in Computer Science or Engineering.
  • 3+ years of experience in a Cloud, DevOps, or MLOps role.
  • Excellent problem-solving skills.

Primary Roles and Responsibilities

  • Architect, design, and govern scalable, secure, and highly available cloud infrastructure across Azure, GCP, and AWS to support modern application stacks (Django, Python, Node.js, React.js).

  • Design and implement enterprise-grade MLOps pipelines, covering feature engineering, training workflows, model registry, CI/CD for ML, and automated deployment.

  • Architect and operationalize LLM and generative AI platforms, including RAG pipelines, vector databases, evaluation frameworks, and safety guardrails.

  • Build internal AI agent frameworks using LangChain, Semantic Kernel, AutoGen, or custom orchestration engines.

  • Oversee model monitoring, drift detection, model auditing, compliance, and ML observability tools (MLflow, Kubeflow, Feast, BentoML, etc.).

  • Partner with data science and product teams to accelerate ML lifecycle maturity and improve model delivery velocity.

  • Lead end-to-end cloud transformation initiatives, ensuring alignment with organizational strategy, security, and cost-optimization goals.

  • Provide technical leadership for hybrid/multi-cloud architectures, including VPC design, networking, IAM policies, and disaster recovery planning.

  • Establish cloud governance frameworks, standards, and best practices for infrastructure modernization.

  • Possess in-depth knowledge of Azure and GCP troubleshooting methodologies, including:

      - Deep diagnostic methodologies
      - Cluster-level and node-level performance analysis
      - Distributed system debugging
      - Root-cause analysis across the infrastructure, networking, application, and data layers

  • Lead the design and optimization of CI/CD ecosystems using Azure DevOps, GitHub Actions, Jenkins, GitLab, or similar.

  • Drive organization-wide adoption of Infrastructure as Code, implementing reusable Terraform modules, Helm charts, and Kubernetes blueprints.

  • Establish DevOps standards for release governance, deployment automation, environment provisioning, and platform stability.

  • Implement automation at scale using Python, Bash, PowerShell, and Groovy DSL.

  • Architect and manage complex Kubernetes environments, including multi-cluster, multi-region, and hybrid setups.

  • Oversee workload security, cluster governance, autoscaling policies, service mesh strategies, ingress and east-west traffic control, and runtime optimizations.

  • Mentor teams in container best practices, image security, and enterprise-grade Helm/K8s deployments.

Preferred Skills
Must-Have Technical Skills
- Azure DevOps (ADO), GitHub Actions, Bitbucket, Azure Repos
- Helm, Kubernetes, Docker, and enterprise-grade containerization practices
- Terraform and Infrastructure as Code principles
- Artifactory or similar artifact management solutions
- Strong Python and Groovy scripting experience for automation and troubleshooting
- Hands-on troubleshooting skills across Azure, AWS, and GCP, including advanced diagnostics and cloud-native service debugging
- Expertise in scalable resource deployment, best practices, cost optimization, compliance, and operational governance
