Sr. Software Engineer - Cloud Networking
About the Team
The Cloud Technology Infrastructure team at Roku is looking for a dedicated engineer to join our Cloud Infrastructure Operations group. As a pivotal team, we're committed to building a scalable, secure, and reusable infrastructure platform that empowers all Roku teams to deliver with speed and confidence. Embracing an 'Infrastructure as Code (IaC) First' methodology, we're eager to onboard a professional who's ready to partner across teams to build foundational components in Networking, Infrastructure, Identity Management, and Security.
About the Role
Are you passionate about streamlining the creation of Cloud infrastructure to achieve maximum efficiency? Do makeshift Cloud services that require manual configuration cause you concern? If you excel in roles that simplify processes for engineers, enabling them to take ownership with ease, and if you're an advocate for Infrastructure as Code (IaC), then this opportunity is tailored for you. Join us and play a key role in shaping the building blocks that will empower teams across Roku to innovate and thrive.
What you'll be doing
- Collaborate with development teams to enhance their cloud service management capabilities, ensuring they can independently deploy, monitor, and maintain their services.
- Implement monitoring and observability solutions that provide real-time insights into infrastructure health, enabling proactive issue resolution and system optimization.
- Participate in a shared On-Call rotation to support critical infrastructure incidents, IaC CI pipeline issues, and cloud networking escalations, ensuring high availability and rapid incident response.
- Design and implement network observability solutions that provide end-to-end traffic visibility across cloud networking, including Kubernetes, VPCs, routing, peering, and cross-cloud connectivity. Collaborate with the Platform team to deliver granular, real-time insights into cloud-native networking, leveraging visualization platforms and telemetry data to support troubleshooting and performance optimization.
- Partner with technical leaders on the Infra Roadmap to fortify our infrastructure security, leveraging cutting-edge automation tools for proactive defense strategies.
- Develop frameworks for Infrastructure as Code (IaC) workflows, promoting reusable and adaptable infrastructure components deployed via CI/CD.
- Advocate for cloud cost optimization strategies, ensuring efficient resource utilization and cost-effective infrastructure management.
- Champion best practices across all functions, including development, QA, and Infrastructure/Operations, to foster a culture of excellence and continuous improvement.
We're excited if you have
- 7+ years of experience designing and operating scalable, resilient cloud-native systems, with a focus on automation, observability, and developer enablement.
- Deep expertise in public cloud networking (AWS/GCP), including VPC design, peering, routing, NAT gateways, and service endpoints.
- Development skills in Python, Java, or Go, with experience writing applications that bring together data from disparate systems and visualize it in ways that make business decisions easier.
- Experience managing and debugging traffic flows between Kubernetes workloads and cloud-managed services, with a strong understanding of how service discovery, DNS, and overlay networks impact observability.
- Proven ability to architect and implement network mapping and visualization solutions (e.g., using tools like Istio telemetry or custom Grafana dashboards) to provide actionable insights into traffic behavior and dependencies.
- Required: experience building and managing complex networks in public cloud providers such as AWS and GCP, and integrating them with cloud-native platforms such as Kubernetes.
- Advanced hands-on experience using cloud-native services in large-scale AWS/GCP environments with Infrastructure as Code (Terraform) and other CI/CD tools and services.
- Experience enabling monitoring solutions to visualize and create alerts for core infrastructure with one or more of the following: Grafana, Prometheus, and Datadog.
- Experience driving timely consensus in design with other senior team members across a large organization.
- Experience creating infrastructure using Continuous Integration (CI), Continuous Delivery, and Continuous Deployment workflows; GitLab experience desired.
- Excellent written and verbal communication skills to help communicate complicated workflows to a broad and diverse user group.
- B.S. or M.S. degree in Computer Science, Engineering, or equivalent.