Posted 12 June, 2026

Software Principal Engineer - SRE, Production Engineering

Boomi
India Full Time
Reference: 102_698576_5792761004

Are you ready to work on world-changing technologies? Today, organizations need to move with increased agility and insight to grow and thrive. Boomi is one of the hottest tech companies in the SaaS/Cloud industry, named a Leader for the third year in a row in the Gartner Enterprise iPaaS Magic Quadrant and recently recognized by Inc. Magazine as one of the best workplaces. Our award-winning, patented technology is transforming the world of integration by making enterprise-class integration technology accessible and affordable to companies of all sizes.
Boomi provides the foundation on which your business can evolve and innovate. According to a recent survey by Vanson Bourne, connected businesses are far outpacing their competitors. We help organizations connect everything and engage everywhere across any channel, device or platform. More than 18,000 organizations are using Boomi to run better, faster and smarter.
Working at Boomi means doing what you love. We hire trailblazers with an entrepreneurial spirit who can solve challenging problems, make a real impact in technology and want to build something big. If you are passionate about solving hard problems, enjoy working with world-class people and developing cutting-edge technology, you should explore a career with Boomi. Learn more at http://www.boomi.com/ or visit Boomi Careers.
Join us as a Sr Site Reliability Engineer on our Reliability team to do the best work of your career and make a profound social impact.

What you'll achieve
As a Senior Site Reliability Engineer, you will be responsible for developing sophisticated systems and software based on the customer's business goals, needs and general business environment. You will work with product management, other engineering teams, customer success and support to develop cutting-edge product features and enhancements across Boomi's offerings.
You will:
  • Participate actively in detecting, remediating and reporting on Production incidents, ensuring that SLAs/SLOs are defined and met.
  • Participate in on-call rotation to ensure coverage for planned/unplanned events.
  • Engage with other Engineering organizations to implement processes, identify improvements, and drive consistent results.
  • Work with your SRE and Engineering counterparts to drive DR exercises, Game days, training and other response readiness efforts.
  • Collaborate with Service Engineering organizations to build and automate tooling, implement Observability best practices, and manage the Boomi services in production so that we consistently achieve our market-leading SLA.
  • Improve the scalability and reliability of Boomi's systems in production.
  • Automate the provisioning and maintenance of Boomi's infrastructure.
  • Work independently with a minimal level of guidance from technical leadership.
  • Mentor other Boomi engineers, including design collaboration and code reviews.


Take the first step towards your dream career with Boomi
Essential Requirements
  • Expert in defining, measuring, and improving Reliability Metrics (SLOs, SLIs, error budgets).
  • Strong in implementing observability practices (Monitoring, Logging, Distributed Tracing etc.), preferably using New Relic and Splunk. Experience should extend beyond using existing dashboards to creating them from scratch.
  • Passionate about SRE Automation and infrastructure platforms. Expert in developing Ansible playbooks and in automating Infrastructure as Code using Terraform, CloudFormation templates and Python.
  • Experience in conducting and automating DR exercises in the AWS cloud to validate RPOs and RTOs.
  • Strong understanding of and working experience with AWS components.
  • Ability to design and implement APIs for use by internal teams.


Desirable Requirements
  • 7+ years' experience in the software engineering industry, with experience supporting large scale software systems in production.
  • Experience in actively detecting, remediating and reporting on Production incidents, ensuring that SLAs/SLOs are defined and met, and in participating in an on-call rotation to ensure coverage for planned/unplanned events.
  • Certified in Cloud (AWS/Azure/GCP/Oracle), with experience in using services such as compute, containers and databases.
  • Experience in Observability, including creating dashboards for SLAs/SLIs/SLOs.
  • Experience in Ansible/Terraform and Python.
  • A grasp of Cloud Native concepts, containerization best practices and security awareness in Cloud will be a strong plus.
