Lead Site Reliability Engineer
Job Description
About the Role:
As a Lead Site Reliability Engineer, you will own the reliability and availability of our production systems. You will champion SRE principles across engineering teams — defining SLOs, managing error budgets, and leading a culture of blameless incident response. This is a hands-on leadership role where you will partner closely with product and engineering teams to balance the pace of innovation with the stability our customers depend on.
- Title: Site Reliability Engineer
- Shift: General/UK Shift
- Location: India, Remote (any location near CNX offices)
Responsibilities:
Reliability Ownership
- Define, implement, and own Service Level Objectives (SLOs), Service Level Indicators (SLIs), and Error Budgets across critical services.
- Use error budget policies to drive data-informed conversations between engineering and product on release velocity vs. reliability trade-offs.
- Conduct capacity planning and proactive risk assessments to prevent incidents before they occur.
Incident Management
- Lead incident response as incident commander: coordinating teams, driving resolution, and maintaining clear stakeholder communication during outages.
- Facilitate thorough, blameless postmortems and ensure action items are tracked, prioritized, and resolved.
- Develop and continuously improve runbooks, escalation paths, and on-call practices to reduce MTTD and MTTR.
Observability & Monitoring
- Design and maintain observability strategies using modern tooling (Prometheus, Grafana, OpenTelemetry, ELK) to ensure full visibility into system health.
- Define intelligent alerting that is actionable and minimizes alert fatigue.
- Drive adoption of distributed tracing and structured logging across services.
Toil Reduction & Automation
- Identify and measure toil across the engineering organization and lead initiatives to eliminate it through automation.
- Build internal tooling and self-service capabilities that improve developer productivity and system reliability.
Infrastructure & Platform Reliability
- Collaborate with platform and infrastructure teams on cloud-native patterns for fault tolerance, auto-scaling, and disaster recovery.
- Provide SRE input into CI/CD pipelines and deployment strategies (e.g., canary releases, blue/green deployments) to minimize production risk.
- Manage infrastructure using IaC practices (Terraform or equivalent) with a focus on reliability and consistency.
Leadership & Culture
- Mentor and grow junior SREs, fostering a culture of ownership, curiosity, and continuous improvement.
- Act as an SRE advocate across engineering, embedding reliability thinking into the software development lifecycle.
- Partner with key stakeholders to align SRE strategy with broader organizational goals.
- Conduct regular 1:1s with direct reports and participate in team rituals.
AI Expectations
As with all engineers at our organization, this role requires an AI-native mindset. Specifically, you will be expected to:
- Embed AI tools and practices into how we build and run our platform, deploying AI-powered capabilities and shipping real AI features into production.
- Support engagement and solutioning for AI-powered offerings, translating technical capabilities into tangible business value.
- Collaborate with cross-functional partners, including Product, Data, Security, and Legal, to ensure AI is delivered safely, effectively, and in compliance with relevant standards.
Skills you will need:
- 7+ years of experience in SRE, platform engineering, or a related discipline.
- Proven experience defining and managing SLOs, SLIs, and error budgets in a production environment.
- Strong incident management experience, including leading postmortems and driving reliability improvements.
- Hands-on experience with observability tooling (Prometheus, Grafana, OpenTelemetry, or similar).
- Solid understanding of cloud platforms (AWS, Azure, or GCP) and containerized environments (Kubernetes).
- Proficiency in at least one scripting or programming language (Python, Go, or Bash).
Nice to Have
- Experience with chaos engineering tools (e.g., Chaos Monkey, Gremlin, LitmusChaos).
- Familiarity with IaC tooling such as Terraform or Pulumi.
- Knowledge of DevSecOps practices and security tooling.
- Experience with GitOps workflows and CI/CD pipelines.
- Bilingual proficiency (English & Spanish).
- Complete all assigned, mandatory training within the timeframe provided.
- Conduct and/or participate in regularly scheduled 1:1 meetings with your direct manager and/or direct reports.