Posted 19 June, 2026

AI Operations

Allianz Commercial
Pune, MH, IN Full Time
Reference: b7697c1a0eeeee50

Job Description

This job is with Allianz Commercial, an inclusive employer and a member of myGwork – the largest global platform for the LGBTQ+ business community. Please do not contact the recruiter directly.

Job Title
AI Automation - Operations Engineer

Overview
We are hiring an AI Automation Operations Engineer to own operational excellence for our AI & Automation products and AIOps platforms. This role spans end-to-end reliability across infrastructure, application, middleware, and AI/GenAI layers.

You will design monitoring and health checks, lead platform upgrades and high‑availability setups, drive stability and incident management, enable product adoption, document production processes, and contribute to pre‑prod testing and release readiness.

Core Responsibilities

Monitoring and Observability
Design and implement comprehensive monitoring, alerting, and health‑check frameworks across infra, app, middleware, and AI/GenAI layers.
Build dashboards and SLO/SLA telemetry using Grafana, Dynatrace, Azure Monitor, Application Insights, Log Analytics, or equivalent.
Define key metrics (availability, latency, error rates, model drift, pipeline throughput) and set automated alerts and escalation paths.
Automate health checks and synthetic transactions for critical user journeys and model inference paths.

Upgrades, High Availability, and Roadmap
Lead platform and product upgrades, including Active‑Active, Active‑Passive, blue/green and canary deployment strategies.
Plan and own upgrade roadmaps in collaboration with Ops, GCC, Engineering, Product, and stakeholders; coordinate maintenance windows and rollback plans.
Validate upgrades in pre‑prod and staging, ensure zero/low downtime cutovers, and document upgrade runbooks.

Stability, Incident and Problem Management
Own incident lifecycle from detection to resolution and RCA; run incident response and post‑mortems.
Drive reliability engineering practices: capacity planning, performance tuning, chaos testing, and resilience patterns.
Implement automation for remediation, runbook execution, and incident mitigation to reduce MTTR.
Maintain SLAs and report availability and reliability metrics to stakeholders.

Enablement and Adoption
Deliver enablement sessions, workshops, and demos to internal teams and customers on how to use AI Automation products.
Create and maintain user manuals, quick start guides, runbooks, and FAQs tailored to operators, developers, and business users.
Act as SME for onboarding, troubleshooting, and best practices for GenAI/LLM usage and safe model operations.

Production Process Control and Documentation
Map and document production processes, data flows, deployment pipelines, and operational dependencies.
Create runbooks, SOPs, and playbooks for routine operations, change management, and emergency procedures.
Establish governance for change approvals, configuration management, and access controls.

Testing and Release Support
Contribute to pre‑prod testing: functional, integration, performance, load, and model validation tests.
Coordinate release readiness with QA, DevOps, and engineering; validate CI/CD pipelines and rollback mechanisms.
Support canary and staged rollouts, monitor metrics during releases, and authorize promotion to production.

Cross‑Functional Collaboration and Vendor Management
Work closely with Dev, SRE, Security, QA, and Product to prioritize reliability work and roadmap items.
Coordinate with cloud providers and third‑party vendors for escalations, upgrades, and capacity planning.
Communicate status and risks to leadership and stakeholders with clear, actionable reports.

Required Technical Skills
Programming and Scripting: Python or Node.js for automation, monitoring scripts, and tooling.
Monitoring and Observability: Hands‑on with Grafana, Dynatrace, Azure Monitor, Application Insights, Log Analytics, Prometheus, or equivalent.
Cloud Platforms: Experience with Azure (preferred) or AWS/GCP; infrastructure provisioning and cost optimization.
Containers and Orchestration: Docker and Kubernetes (AKS/EKS/GKE) operational experience.
CI/CD and DevOps: Git, Jenkins/GitHub Actions/GitLab CI, pipeline troubleshooting and release automation.
ITSM: ServiceNow or equivalent for incident, change, and problem management.
Databases and Storage: Monitoring and basic troubleshooting for SQL and NoSQL systems.
AI/GenAI Operations: Familiarity with LLMOps/MLOps concepts, model deployment, inference monitoring, and model drift detection.
Platform Upgrades: Experience planning and executing upgrades, migrations, and HA configurations (Active‑Active, DR).

Preferred Experience and Certifications
2-6+ years in IT operations, SRE, or platform engineering with exposure to AI/Automation stacks.
Experience supporting production GenAI services and automation/orchestration platforms (e.g., Amelia or similar).
Certifications such as Azure Administrator/Architect, Kubernetes (CKA/CKAD), ITIL, or relevant cloud/DevOps certifications are a plus.

Soft Skills and Behaviors
Strong communicator able to translate technical status to non‑technical stakeholders.
Proactive problem solver with a bias for automation and continuous improvement.
Collaborative team player who can lead cross‑functional initiatives.
Organized and accountable with experience in on‑call rotations and incident leadership.

Allianz Group is one of the most trusted insurance and asset management companies in the world. Caring for our employees, their ambitions, dreams and challenges, is what makes us a unique employer. Together we can build an environment where everyone feels empowered and has the confidence to explore, to grow and to shape a better future for our customers and the world around us.

At Allianz, we stand for unity: we believe that a united world is a more prosperous world, and we are dedicated to consistently advocating for equal opportunities for all.

And the foundation for this is our inclusive workplace, where people and performance both matter, and which nurtures a culture grounded in integrity, fairness, inclusion and trust.

We therefore welcome applications regardless of ethnicity or cultural background, age, gender, nationality, religion, social class, disability or sexual orientation, or any other characteristics protected under applicable local laws and regulations.

Great to have you on board. Let's care for tomorrow.

Note: Having different strengths, experiences, perspectives and approaches is an integral part of Allianz' company culture. One means to achieve this is a regular rotation of Allianz Executive employees across functions, Allianz entities and geographies.

Therefore, the company expects from its employees a general openness and a high motivation to regularly change positions and collect experiences across Allianz Group.
