Senior Software Engineer, Big Data
About the Team:
The Data Foundations team plays a critical role in supporting Roku Ads business intelligence and analytics. The team is responsible for developing and managing foundational datasets designed to serve the operational and analytical needs of the broader organization. The team's mission is carried out through three focus areas: acting as the interface between data producers and consumers, simplifying data architecture, and creating tools in a standardized way.
About the Role:
We are seeking a talented and experienced Senior Software Engineer with a strong background in big data technologies, including Apache Spark and Apache Airflow. This hybrid role bridges software and data engineering, requiring expertise in designing, building, and maintaining scalable systems for both application development and data processing. You will collaborate with cross-functional teams to design and manage robust, production-grade, large-scale data systems. The ideal candidate is a proactive self-starter with a deep understanding of high-scale data services and a commitment to excellence.
What you'll be doing
- Software Development:
  - Write clean, maintainable, and efficient code, ensuring adherence to best practices through code reviews.
- Big Data Engineering:
  - Design, develop, and maintain data pipelines and ETL workflows using Apache Spark and Apache Airflow.
  - Optimize data storage, retrieval, and processing systems to ensure reliability, scalability, and performance.
  - Develop and fine-tune complex queries and data processing jobs for large-scale datasets.
  - Monitor, troubleshoot, and improve data systems for minimal downtime and maximum efficiency.
- Collaboration & Mentorship:
  - Partner with data scientists, software engineers, and other teams to deliver integrated, high-quality solutions.
  - Provide technical guidance and mentorship to junior engineers, promoting best practices in data engineering.
We're excited if you have
- Bachelor's degree in Computer Science, Engineering, or a related field (or equivalent experience).
- 5+ years of experience in software and/or data engineering, with expertise in big data technologies such as Apache Spark, Apache Airflow, and Trino.
- Strong understanding of SOLID principles and distributed systems architecture.
- Proven experience in distributed data processing, data warehousing, and real-time data pipelines.
- Advanced SQL skills, with expertise in query optimization for large datasets.
- Exceptional problem-solving abilities and the capacity to work independently or collaboratively.
- Excellent verbal and written communication skills.
- Experience with cloud platforms such as AWS, GCP, or Azure, and containerization tools like Docker and Kubernetes. (preferred)
- Familiarity with additional big data technologies, including Hadoop, Kafka, and Presto. (preferred)
- Strong programming skills in Python, Java, or Scala. (preferred)
- Knowledge of CI/CD pipelines, DevOps practices, and infrastructure-as-code tools (e.g., Terraform). (preferred)
- Expertise in data modeling, schema design, and data visualization tools. (preferred)
- AI literacy and curiosity. You have either tried Gen AI in your previous work or outside of work, or you are curious about Gen AI and have explored it.