Posted 16 June, 2026

AI-Native SRE Architect / Reliability Transformation Lead

HCLTech
Bengaluru, KA, IN Full Time
Reference: 2b2e070e039cacef

Job Description

Experience - 15 to 20 years

Location - Bengaluru/Chennai/Hyderabad/Pune/Noida

We're Hiring: AI-Native SRE Architect / Reliability Transformation Lead

Are you passionate about building resilient, scalable, and automation-first enterprise platforms?

We are looking for an experienced SRE leader to drive enterprise-wide reliability transformation across cloud, digital, SaaS, industrial, AI-enabled, and mission-critical platforms. This role combines technical expertise with strategic transformation leadership to modernize operations through SRE, AI-driven observability, autonomous operations, platform engineering, and intelligent automation practices.

As a trusted advisor and hands-on transformation leader, you will partner with engineering, operations, cloud, platform, architecture, security, and AI engineering teams to institutionalize modern Site Reliability Engineering practices at scale.

Key Responsibilities

  • Define and drive enterprise SRE strategy, governance, and reliability standards.
  • Establish SLOs, SLIs, Error Budgets, and reliability KPIs.
  • Define modern observability standards leveraging OpenTelemetry, AI-powered analytics, event intelligence, and intelligent monitoring platforms.
  • Drive autonomous operations through AI-assisted troubleshooting, predictive incident prevention, intelligent remediation, self-healing, and operational copilots.
  • Enable automation-first operations using Infrastructure as Code (IaC), GitOps, CI/CD, platform engineering, and policy-driven operational workflows.
  • Partner with platform engineering teams to improve developer experience through self-service platforms, deployment patterns, golden paths, and reliability guardrails.
  • Drive reliability engineering practices for AI-enabled platforms, including operational resilience, observability, governance, scalability, and performance optimization for LLM-based and intelligent systems.
  • Lead incident management initiatives including intelligent RCA, operational analytics, resiliency engineering, and continuous reliability improvement programs.
  • Coach and mentor engineering and operations teams on modern reliability engineering principles.
  • Collaborate with cloud, platform, architecture, and security teams on enterprise modernization initiatives.

Requirements

  • Strong expertise gained in SRE, Platform Engineering, Production Systems Engineering, DevOps, or Cloud Engineering leadership roles.
  • Hands-on experience with cloud platforms, observability, automation, and operational excellence.
  • Expertise with modern observability and telemetry platforms such as Dynatrace, Splunk, OpenTelemetry, and comparable monitoring platforms.
  • Experience in incident management, RCA, resiliency engineering, and automation.
  • Strong stakeholder management, mentoring, and communication skills.
  • Experience across enterprise ecosystems including Azure, Java/.NET, SAP, Salesforce, SaaS/COTS, and legacy platforms is a plus.
  • Exposure to reliability and operational governance for AI/ML or LLM-enabled systems is highly desirable.

Leadership Attributes

  • Enterprise transformation mindset with the ability to influence cross-functional engineering and operations organizations.
  • Strong balance of strategic thinking and hands-on technical depth.
  • Ability to drive cultural transformation toward reliability ownership, automation-first engineering, and platform-centric operating models.
  • Passion for modern engineering practices, AI-enabled operations, developer productivity, and operational innovation.
