Posted 16 June, 2026

Lead Data Engineer - Scala/Spark

Xebia
Bengaluru, KA, IN Full Time
Reference: 721fda472816dff4

Job Description

Job Title: Lead Data Engineer - Scala/Spark
Job Location: Bengaluru
Experience Range: 5-14 years
Notice Period: Immediate to 15 days

We are seeking a Senior Data Engineer with deep expertise in Scala-based Spark development and end-to-end deployment of data pipelines on Kubernetes clusters, orchestrated via Airflow. The ideal candidate has a strong software engineering foundation, an excellent understanding of distributed systems, proficiency in software design and modern project/code structuring, and a good understanding of CI/CD processes and implementation, enabling them to deliver reliable, scalable, and robust data solutions. Candidates should have a minimum of 6-8 years of overall experience, including at least 5 years with Hadoop and Spark.

Key Responsibilities:
• Design and implement robust, scalable batch and real-time data engineering solutions using Apache Spark (Scala) and Spark Structured Streaming.
• Architect well-structured Scala projects with reusable, modular, and testable codebases aligned with SOLID and clean architecture principles and practices.
• Develop, deploy, and manage Spark jobs on Kubernetes clusters, ensuring efficient resource utilization, fault tolerance, and scalability.
• Orchestrate data workflows using Apache Airflow: manage DAGs, task dependencies, retries, and SLA alerts.
• Write and maintain comprehensive unit and integration tests for the pipelines and utilities developed.
• Work on performance tuning, partitioning strategies, and data quality validation.
• Use and enforce version control best practices (branching, PRs, code review) and continuous integration (CI/CD) for automated testing and deployment.
• Write clear, maintainable documentation (README, inline docs, docstrings).
• Participate in design reviews and provide technical guidance to peers and junior engineers.

Technical Skills:
Primary:
• Languages: Scala, Java
• Big Data Orchestration: Airflow, Spark on Kubernetes, YARN, Oozie
• Big Data Processing: Hadoop, Kafka, Spark, and Spark Structured Streaming
• Experience with SOLID and DRY principles, with good software architecture and design implementation experience
• Advanced Scala experience (e.g. functional programming, case classes, complex data structures and algorithms)
• Proficient in developing automated frameworks for unit and integration testing
• Experience with Docker, Helm, and related container technologies
• Proficient in deploying and managing Spark workloads on Kubernetes clusters
• Experience in evaluating and implementing data validation and data quality
• DevOps experience with Jenkins, Maven, GitHub, GitHub Actions, CI/C
