AgenticOps Platform Engineer Lead
Job Description
We are looking for a senior, hands-on AgentOps Platform Engineer to design, build, and operate the cloud-native infrastructure that powers our AI agents at scale.
This is a lead-by-example role:
- You write the Terraform
- You build the pipelines
- You own the platform in production
GCP is your primary environment, but you will design with multi-cloud in mind (AWS, Azure), ensuring portability, resilience, and long-term flexibility. This role sits at the intersection of DevOps, MLOps, and AgentOps, with deep responsibility for reliability, security, observability, and cost.
KEY RESPONSIBILITIES
Platform & Infrastructure Ownership
- Design, build, and operate production-grade infrastructure for AI agents and LLM services
- Own Terraform-based Infrastructure as Code for all environments (dev, uat, prod)
- Lead infrastructure decisions through hands-on implementation, not diagrams
- Build scalable foundations for agent orchestration, inference services, RAG pipelines, and vector stores
- Optimise cloud resources for performance and cost efficiency
AgentOps & AI Platform Enablement
- Enable safe, continuous operation of autonomous agents
- Design agent runtime environments with isolation & sandboxing, failover and recovery strategies, and controlled rollout mechanisms
- Support prompt/version management, agent configuration, and tool/plugin lifecycle
- Work closely with Agentic RAG engineers to operationalise research into production
CI/CD & Automation
- Build and maintain CI/CD pipelines for infrastructure, agent services, prompt and config changes, and model/version rollouts
- Automate workflows for vector DB updates, RAG index refreshes, agent memory stores, and tool registration and validation
- Reduce manual ops toil aggressively through automation
Observability & Production Readiness
- Design and implement deep observability for agent systems: platform health; agent execution metrics; latency, cost, and throughput; failure modes and retries
- Build dashboards, alerts, and telemetry using Prometheus, Grafana, and OpenTelemetry (or equivalent)
- Enable visibility into agent decision traces and runtime behaviour
Security, Safety & Reliability
- Implement secure cloud architecture and IAM best practices
- Own production reliability, incident response, and recovery
- Enforce operational guardrails and safety controls for agent APIs
- Support responsible AI practices from an infrastructure and runtime perspective
Collaboration & Technical Leadership
- Work closely with Agentic RAG engineers, AI engineers, and the Product & CTO Office
- Define SLOs, reliability targets, and operational metrics
- Set the technical bar for AgentOps at BridgeAI
- Mentor engineers by example and code, not process overhead
REQUIRED SKILLS & EXPERIENCE
Core Platform & DevOps
- 5+ years in DevOps, Platform Engineering, SRE, or MLOps
- Strong, hands-on experience with GCP: GKE / Compute Engine; Cloud Run / Functions; Cloud Storage, Pub/Sub; Vertex AI (or equivalent)
- Deep experience with Terraform (mandatory)
Containers, CI/CD & Automation
- Docker, Kubernetes, Helm
- CI/CD tooling (GitHub Actions, Jenkins, ArgoCD)
- Python and Bash for automation and platform glue code
Agentic & AI Systems
- Experience supporting LLM-based systems in production
- Understanding of prompt/version management, context handling & caching, and model rollout strategies
- Hands-on experience with vector databases (Weaviate, FAISS, Pinecone)
- Familiarity with RAG pipelines and agent execution patterns
Observability & Security
- Monitoring and telemetry using Prometheus, Grafana, OpenTelemetry
- Strong understanding of cloud security, IAM, and operational safety
NICE TO HAVE
- Multi-cloud experience (AWS, Azure)
- Exposure to agent frameworks (LangChain, LangGraph, AutoGen, CrewAI)
- Workflow orchestration and event-driven systems (Temporal, Airflow)
- Experience with responsible AI operations or safety monitoring
WHAT SUCCESS LOOKS LIKE
- Infrastructure is reproducible, observable, and boring (in a good way)
- Agent failures are visible, debuggable, and recoverable
- Cloud costs are understood and controlled
- Engineers trust the platform and move faster because of it
- You are the go-to authority for AgentOps at BridgeAI
WHAT THIS ROLE IS (AND IS NOT)
- Deeply hands-on
- Terraform-first
- Production ownership
- Sets standards by building
- Not a people-manager role
- Not a ticket-based ops role
- Not a “just keep the lights on” job