Posted 16 June, 2026

AI-Native SRE Architect / Reliability Transformation Lead

HCLTech

Bengaluru, KA, IN Full Time

Reference: 2b2e070e039cacef

Job Description

Experience - 15 to 20 years

Location - Bengaluru/Chennai/Hyderabad/Pune/Noida

We're Hiring: AI-Native SRE Architect / Reliability Transformation Lead

Are you passionate about building resilient, scalable, and automation-first enterprise platforms?

We are looking for an experienced SRE leader to drive enterprise-wide reliability transformation across cloud, digital, SaaS, industrial, AI-enabled, and mission-critical platforms. This role combines technical expertise with strategic transformation leadership to modernize operations through SRE, AI-driven observability, autonomous operations, platform engineering, and intelligent automation practices.

As a trusted advisor and hands-on transformation leader, you will partner with engineering, operations, cloud, platform, architecture, security, and AI-engineering teams to institutionalize modern Site Reliability Engineering practices at scale.

Key Responsibilities

Define and drive enterprise SRE strategy, governance, and reliability standards.
Establish SLOs, SLIs, Error Budgets, and reliability KPIs.
Define modern observability standards leveraging OpenTelemetry, AI-powered analytics, event intelligence, and intelligent monitoring platforms.
Drive autonomous operations through AI-assisted troubleshooting, predictive incident prevention, intelligent remediation, self-healing, and operational copilots.
Enable automation-first operations using Infrastructure as Code (IaC), GitOps, CI/CD, platform engineering, and policy-driven operational workflows.
Partner with platform engineering teams to improve developer experience through self-service platforms, deployment patterns, golden paths, and reliability guardrails.
Drive reliability engineering practices for AI-enabled platforms, including operational resilience, observability, governance, scalability, and performance optimization for LLM-based and intelligent systems.
Lead incident management initiatives including intelligent RCA, operational analytics, resiliency engineering, and continuous reliability improvement programs.
Coach and mentor engineering and operations teams on modern reliability engineering principles.
Collaborate with cloud, platform, architecture, and security teams on enterprise modernization initiatives.

Requirements

Strong expertise gained in SRE, Platform Engineering, Production Systems Engineering, DevOps, or Cloud Engineering leadership roles.
Hands-on experience with cloud platforms, observability, automation, and operational excellence.
Expertise with modern observability and telemetry platforms such as Dynatrace, Splunk, OpenTelemetry, and comparable monitoring platforms.
Experience in incident management, RCA, resiliency engineering, and automation.
Strong stakeholder management, mentoring, and communication skills.
Experience across enterprise ecosystems including Azure, Java/.NET, SAP, Salesforce, SaaS/COTS, and legacy platforms is a plus.
Exposure to reliability and operational governance for AI/ML or LLM-enabled systems is highly desirable.

Leadership Attributes

Enterprise transformation mindset with the ability to influence cross-functional engineering and operations organizations.
Strong balance of strategic thinking and hands-on technical depth.
Ability to drive cultural transformation toward reliability ownership, automation-first engineering, and platform-centric operating models.
Passion for modern engineering practices, AI-enabled operations, developer productivity, and operational innovation.
