Software Engineer III - Data Engineer, Databricks
We have an exciting and rewarding opportunity for you to take your software engineering career to the next level.
As a Software Engineer III at JPMorgan Chase within Asset & Wealth Management, you serve as a seasoned member of an agile team to design and deliver trusted, market-leading technology products in a secure, stable, and scalable way. You are responsible for delivering critical technology solutions across multiple technical areas within various business functions in support of the firm's business objectives.
Job responsibilities
- Designs, builds, and maintains batch and (as needed) streaming data pipelines using Databricks.
- Develops and optimizes ETL/ELT workflows using PySpark / Spark SQL and Databricks workflows/jobs.
- Implements data modeling (bronze/silver/gold patterns), curation, and dataset publishing for analytics and consumption.
- Tunes and optimizes Spark jobs for performance, cost, and scalability (partitioning, file sizing, caching, joins, etc.).
- Ensures strong data quality through validations, reconciliations, monitoring, and alerting.
- Works with stakeholders (data analysts, data scientists, product, and engineering teams) to translate requirements into data solutions.
- Implements and follows CI/CD and SDLC practices for data engineering code (testing, code reviews, version control).
- Supports production operations: incident triage, root-cause analysis, and pipeline reliability improvements.
- Contributes to documentation, standards, and reusable frameworks to improve team productivity.
Required qualifications, capabilities, and skills
- Formal training or certification in software engineering concepts and 3+ years of applied experience.
- Hands-on experience in Data Engineering.
- Strong experience with Databricks (jobs/workflows, notebooks, clusters, performance tuning).
- Proficiency in Python and SQL; strong hands-on experience with PySpark/Spark SQL.
- Experience in data modeling, ETL/ELT, performance tuning, data quality, monitoring, and troubleshooting.
- Solid understanding of data pipeline architecture, orchestration concepts, and dependency management.
- Experience working with data lakes/lakehouse storage patterns and file formats (e.g., Parquet).
- Familiarity with Git-based workflows and engineering best practices.
Preferred qualifications, capabilities, and skills
- AI/ML exposure is an added advantage: experience supporting ML workflows by building feature datasets, training/serving data pipelines, or enabling model monitoring and experimentation (e.g., working with data scientists on reproducible data inputs, feature engineering, and ML-ready tables).
- Familiarity with the ML ecosystem and tools (e.g., MLflow, Databricks model registry, notebook-based experimentation) and an understanding of basic ML concepts (training vs. inference, leakage, drift) are a plus.
- Experience with Delta Lake features (ACID tables, time travel, optimization).
- Exposure to streaming (e.g., Spark Structured Streaming) and event-driven patterns.
- Experience with cloud platforms (AWS/Azure/GCP) and cloud storage integrations.
- Knowledge of data governance, access controls, and secure handling of sensitive data.
- Familiarity with orchestration tools (e.g., Airflow or similar) and supporting production-grade data platforms (monitoring, SLAs, on-call rotations).