Senior Software Engineer I - Agentic AI
About the Role
DigitalOcean's Agentic AI organisation provides a powerful inference cloud, Managed Agents, and robust Feedback systems that enable customers to run AI inference confidently at scale. We are looking for a Senior Software Engineer I to join our Feedback Systems team. This team is responsible for building the scalable backend infrastructure that tests and evaluates AI agents safely and reliably.
In this role, you will help design and develop high-throughput backend systems that orchestrate complex execution workflows, interface with isolated execution environments, and process evaluation signals. You will solve distributed systems problems at the intersection of infrastructure orchestration and asynchronous data flows, keeping our platforms robust, scalable, and highly available.
What You'll Do:
- Design, build, and maintain robust backend services and highly concurrent asynchronous workflows, primarily in Python and Go.
- Integrate backend control planes with isolated, secure execution environments to safely run agents and capture execution artifacts.
- Build and operate scalable APIs (gRPC, REST) that serve as the connective tissue across data pipelines, evaluation engines, and internal platforms.
- Collaborate closely with cross-functional teams to ensure the infrastructure reliably supports complex evaluation scenarios and outcome metrics.
- Drive engineering best practices, including code reviews, comprehensive testing, and technical design documentation.
- Monitor system performance, investigate bottlenecks, and ensure high reliability of orchestration systems.
What You'll Add to DigitalOcean:
- 5+ years of software engineering experience building highly concurrent, fault-tolerant distributed systems.
- Deep proficiency in Python and Go (or strong experience in C++/Java with a willingness to master Python and Go quickly).
- Hands-on experience with durable workflow orchestration engines for managing distributed state and long-running asynchronous tasks.
- Strong working knowledge of containerization, virtualization, or secure workload isolation technologies (e.g., Docker, Kubernetes, or microVMs).
- Experience building resilient APIs and handling high-throughput event or message data.
- A strong sense of ownership, excellent communication skills, and the ability to work effectively in a globally distributed team.
- AI/ML ecosystem awareness: you do not need to be an ML researcher, but you have a strong foundational understanding of working with LLM APIs, prompt constraints, and the unique architectural challenges of testing non-deterministic systems.
This is a hybrid role based out of Hyderabad, India.