Posted 28 May, 2026
DevOps Kubernetes
Diverse Lynx
Bengaluru, Karnataka, 560063
Full Time
Reference: 365_569689_26-00550
Description:
We are looking for a skilled and pragmatic DevOps Engineer to own and evolve our infrastructure across the EMEIA region. This is a dual-horizon role: you will keep our existing VM-based systems healthy while leading a greenfield effort to design and build the managed environment that those systems will migrate onto.

A significant proportion of what we build is produced rapidly using AI-assisted, structured development. That means our solutions can move from idea to deployment faster than ever, and our infrastructure needs to keep pace. We need someone who thrives in a fast-moving, ambiguous environment, can absorb change quickly, and treats adaptability as a core part of the job rather than an occasional demand.

The new managed environment is most likely to be based on Kube (the client's internal Kubernetes/EKS deployment), though the final architecture will be a team decision, and client-specific AWS remains an option for workloads requiring greater control. You will help inform that decision and then own the build-out, whichever direction is chosen.

You will work closely with data engineers, developers, and analysts, acting as the infrastructure backbone for a team that moves quickly and expects you to move with it. The role also involves working directly with third-party vendors who support some of the tools being deployed, and collaborating with teams outside EMEIA, including WorldWide, to align on standards, share solutions, and resolve cross-regional dependencies.
KEY RESPONSIBILITIES

Platform Migration & Environment Design
· Lead the design and build-out of a new managed container environment to replace the existing VM-based infrastructure; the most likely candidate is Kube (the client's internal Kubernetes/EKS cluster), but the final decision will be made collaboratively as a team
· Contribute meaningfully to the environment selection decision: weigh the trade-offs between a managed solution (Kube) and more directly controlled alternatives (client-specific AWS), considering maintenance overhead, operational control, and team capability
· Own the migration of existing VM-based workloads onto the new platform, managing sequencing, risk, and continuity of service throughout
· Establish and maintain the standard workflow for deploying solutions: build locally → containerise → publish to Kube → configure connectivity to client internal system dependencies

Client Internal Networking & Connectivity
· Configure and maintain networking between Kube and the client's internal systems, including Shield, Snowflake, Floodgate, and any other platform dependencies the team relies on
· Own namespace and compute provisioning on the shared Kube cluster, ensuring workloads are appropriately isolated and correctly configured
· Manage credentials, service accounts, and access controls across the full connectivity chain, from container to downstream service
· Act as the go-to expert on how things connect within the client's internal network topology

Infrastructure Management
· Own and manage cloud infrastructure across EMEIA using internal cloud tooling (the client cloud and connected systems, including Shield)
· Manage certificates, firewalls, resource pools, networking, and access controls
· Ensure infrastructure is appropriately sized, resilient, and cost-efficient
· Maintain accurate documentation of infrastructure topology and configuration

VM Provisioning & Automation (Existing Estate)
· Maintain and operate existing virtual machines, primarily on RHEL, while migration to the new environment is in progress
· Build and maintain standardised, repeatable provisioning processes (e.g. via Ansible, Terraform, or equivalent IaC tooling)
· Manage package deployment, software repositories, databases, and web servers
· Own the patching and update lifecycle for managed systems

Monitoring & Reliability
· Implement and maintain monitoring, alerting, and observability across both the existing VM estate and the new container environment
· Proactively identify risks, bottlenecks, and failure patterns before they impact users
· Define and track appropriate SLIs/SLOs for critical services
· Conduct post-incident reviews and drive lasting improvements

Supporting AI-Augmented Development
· A large proportion of the solutions you will support are built rapidly using structured AI-assisted development, so you must be comfortable working with codebases and configurations that evolve quickly, may lack deep documentation histories, and may have been substantially generated with AI tooling
· Provide the infrastructure scaffold that allows AI-assisted solutions to move from local development to production reliably and safely
· Be a pragmatic partner to developers: unblock deployments quickly, catch infrastructure-level risks early, and help establish patterns that make rapid iteration safe at scale
· Actively use AI tools (e.g. Claude, Copilot, or similar) to accelerate your own work: writing scripts, diagnosing issues, generating runbooks, and reviewing configurations

Diagnosis & Incident Response
· Take ownership of vague or ambiguous production issues (e.g. "it's running slow", "the server keeps falling over") and drive them through to resolution
· Deliver short-term fixes rapidly to restore service, while tracking and delivering long-term root-cause resolutions
· Maintain a pragmatic balance between speed of recovery and quality of fix

SKILLS & EXPERIENCE

Essential
· Proven experience in a DevOps, infrastructure, or platform engineering role
· Hands-on experience with Kubernetes: deploying, configuring, and operating workloads in a shared or managed cluster environment
· Experience containerising applications: writing Dockerfiles, managing images, publishing to a registry, and debugging container-level issues
· Strong networking fundamentals: DNS, TLS/SSL certificates, firewall rules, load balancing, VPNs, and service-to-service connectivity
· Comfort operating in environments where the architecture is still being defined: able to contribute to the decision, then execute once direction is set
· Hands-on experience with RHEL (or equivalent enterprise Linux): provisioning, hardening, package management (yum/dnf), and systemd services
· Experience managing cloud infrastructure, ideally in an enterprise private/hybrid cloud environment
· Experience with infrastructure-as-code or configuration management tooling (e.g. Terraform, Ansible, Puppet, or similar)
· Solid scripting ability in Bash and at least one higher-level language (Python preferred)
· Experience with monitoring and observability tooling (e.g. Prometheus, Grafana, Datadog, or similar)
· Strong incident diagnosis skills: able to work from vague symptoms to root cause using logs, metrics, and reasoning
· Comfortable working with AI-generated or AI-assisted codebases: reading, extending, and debugging solutions without a full traditional authorship history
· Clear written and verbal communication: able to translate infrastructure complexity for non-technical stakeholders

Desirable
· Experience with AWS or client-specific AWS, particularly EKS
· Familiarity with the client's internal platform tooling: Kube, Shield, Floodgate, or similar
· Experience integrating with Snowflake, including managing drivers, credentials, and network access
· Experience with CI/CD pipelines (GitLab CI, Jenkins, GitHub Actions, or similar)
· Exposure to security tooling, vulnerability scanning, or compliance frameworks (e.g. CIS Benchmarks)
· Familiarity with secrets management tooling (Vault, CyberArk, or similar)
· Experience working in a regulated or enterprise environment with change management processes

WAYS OF WORKING
· You are comfortable with genuine ambiguity, including at the architectural level, and can make progress and contribute to decisions without waiting for everything to be resolved
· You default to automation: if you do something twice, you script it; if you do it three times, you build a process
· You adapt quickly: the tools, environments, and solutions you support can change fast, and you treat that as normal rather than exceptional
· You are pragmatic under pressure: you know when to stop the bleeding first and fix it properly later
· You are self-directed and comfortable owning problems end-to-end with minimal hand-holding
· You are a willing partner to developers who move fast: you keep up, add guardrails where they matter, and don't become a bottleneck

WHAT SUCCESS LOOKS LIKE
· A new managed container environment is designed, built, and running, with existing VM-based workloads migrated onto it in a controlled, sequenced way
· The standard deployment path (build → containerise → publish → connect) is well-established, documented, and easy for the team to use
· Connectivity from the new environment to client internal systems (Snowflake, Shield, Floodgate, etc.) is reliable, well-understood, and correctly secured
· Teams are unblocked quickly when they need new integrations, access, or capabilities, even when the solutions they are deploying have been built at speed
· Production issues are resolved rapidly, with lasting fixes following close behind
· Monitoring catches issues before users do
· The infrastructure estate, both old and new, is well-documented, well-understood, and in a known-good state