Posted 02 June, 2026

Watts Application Support Engineer - Manager - MFT - KGS CH

KPMG
Bangalore, Karnataka, IN, 560103 Full Time
Reference: 218_549848_30044687

Role title

Manager - AI Application Support (Level 2)

Role purpose (why the job exists)

Lead and mature the Level 2 support capability for AI-based applications, ensuring timely triage and resolution of incidents and service requests escalated from L1, and effective escalation to L3 when required. Own L2 processes, knowledge management, and continuous improvement to support rapid scaling from ~20 to 100+ AI applications.

SFIA 8 accountability level

SFIA Level 5 - Ensure / Advise

  • Accountable for service performance and L2 operational outcomes
  • Influences stakeholders, defines ways of working, drives continual improvement
  • Ensures governance, controls, and quality for L2 support services

Required skills & experience (must-have)

  • Strong AI platform awareness: OpenAI/Azure AI capabilities, common failure patterns (auth, rate limits, latency, content filters, model availability).
  • Strong cloud fundamentals across Azure + (AWS/GCP desirable): identity, networking basics, logs/monitoring, API gateways, secrets/key vaults.
  • Proven ITSM experience (ServiceNow): incident/request/problem workflows, SLA management, categorization, KB.
  • Excellent triage & troubleshooting skills: isolate app vs platform vs integration issues.
  • Strong documentation skills: create runbooks, known error database entries, and knowledge articles (KAs); ability to write for an L1 audience.
  • Experience leading support teams and interfacing with engineering/product.

Desirable (nice-to-have)

  • Experience supporting AI/ML systems in production (prompt pipelines, RAG/vector search, MLOps/LLMOps).
  • Familiarity with monitoring tools (Azure Monitor, App Insights, CloudWatch, Stackdriver, Splunk, etc.).
  • Automation/scripting for support (KQL, Python, PowerShell) to speed diagnostics.

Key performance indicators (KPIs)

  • SLA attainment (response/resolution) for L2 queue
  • MTTR for priority incidents; time-to-triage from L1 escalation
  • First-time-right routing to L3 (quality of escalations)
  • Recurring incident reduction (problem management effectiveness)
  • Knowledge coverage (% apps with runbooks; KB reuse rate)

Education / qualifications

  • Bachelor's in Engineering/CS/IT or equivalent experience.
  • ITIL Foundation (preferred). Cloud fundamentals certification (preferred).

Key responsibilities (what success looks like)

Service Operations Leadership

  • Own day-to-day L2 operations for AI applications: queue management, prioritization, major incident readiness, and SLA adherence.
  • Establish and run triage standards to classify tickets as functional query vs technical incident, route appropriately, and manage escalations to L3.
  • Ensure robust handoffs from L1 to L2 and from L2 to L3 (clear reproduction steps, logs, prompt traces, API payloads, environment details).

Incident / Problem / Knowledge Management

  • Lead Major Incident coordination for AI apps (as L2 lead), ensuring communications, timelines, and post-incident reviews.
  • Drive problem management: trend analysis, root cause investigations with L3/engineering, and preventive actions.
  • Own knowledge management for L2: define standards for Support Manuals, Runbooks, Troubleshooting Guides, and Knowledge Articles.

Scale, Governance & Continuous Improvement

  • Build a scalable support model for 100+ apps: capacity planning, shift/coverage model (if applicable), standard tooling, and automation opportunities.
  • Define and monitor KPIs (see Key performance indicators), run weekly/monthly service reviews, and implement improvement plans.
  • Ensure compliance with enterprise controls: access, data handling, audit trails, change governance.

Stakeholder & Vendor/Platform Collaboration

  • Act as the primary interface with L3 Product Support (Advisory UK), Platform teams (Azure/AWS/GCP), Security, and App Owners.
  • Work with AI platform teams (OpenAI/Azure AI) on usage limits, reliability, upgrades, and incident patterns.
  • Support release/change planning with app teams to reduce ticket spikes during go-lives.
