Posted 20 June, 2026

Data Engineer

World Bank

Chennai, IN Full Time

Reference: 7_688666_37213

Do you want to build a career that is truly worthwhile? Working at the World Bank Group provides a unique opportunity for you to help our clients solve their greatest development challenges. The World Bank Group is one of the largest sources of funding and knowledge for developing countries; a unique global partnership of five institutions dedicated to ending extreme poverty, increasing shared prosperity and promoting sustainable development. With 189 member countries and more than 130 offices worldwide, we work with public and private sector partners, investing in groundbreaking projects and using data, research, and technology to develop solutions to the most urgent global challenges. For more information, visit www.worldbank.org.

ITS Vice Presidency Context

The Information and Technology Solutions (ITS) Vice Presidential Unit (VPU) enables the World Bank Group to achieve its mission of ending extreme poverty and boosting shared prosperity on a livable planet by delivering transformative information and technologies to its staff working in more than 150 locations. For more information on ITS, see this video: https://www.youtube.com/watch?reload=9&v=VTFGffa1Y7w

Unit Context:

The ITS Data Office is the central entity within the World Bank Group's Information and Technology Solutions (ITS) department responsible for enabling data, AI, information, and knowledge capabilities across the institution. It comprises four units focused on platforms & tools, product & service delivery, enablement, and governance. The office plays a pivotal role in advancing the Bank's digital transformation, supporting business domains with trusted data, information, and AI capabilities, and fostering a culture of responsible innovation.

The Platforms & Tools unit is responsible for building, integrating, and continuously modernizing the foundational technology infrastructure that powers data, AI, archives, and knowledge services across the World Bank Group. The unit leads the rationalization and simplification of legacy systems, and modernization towards platforms that are scalable, secure, interoperable, and designed for self-service and adoption. The unit plays a critical role in enabling enterprise-wide transformation by delivering data environments, digitization infrastructure, and open knowledge repositories that are AI-ready and aligned with business needs.

Duties and accountabilities:

Role Purpose:

The Data Engineer is responsible for designing, building, and maintaining the data infrastructure that supports the organization's data-driven decision-making processes. With limited supervision, this role develops ETL processes, optimizes data retrieval performance, and collaborates with stakeholders to gather and understand data requirements, ultimately supporting the organization's data integration and transformation initiatives.

Key Responsibilities:

Data Pipeline Development

Design, develop, and maintain data pipelines for ingestion, transformation, and serving across batch and streaming workloads

Build ETL/ELT workflows to integrate data from diverse sources into enterprise data platforms

Develop data transformation logic using Apache Spark, PySpark, SparkSQL, and SQL

Implement change data capture (CDC) patterns for real-time and near-real-time data synchronization

Build streaming data pipelines for real-time analytics and operational use cases

Optimize pipeline performance, resource utilization, and cost efficiency

Federated Data Pipelines & Domain Enablement

Support federated data pipeline architecture that enables Line of Business (LOB) teams to own and manage their domain data

Contribute to self-serve data infrastructure that abstracts complexity and allows domain teams to build pipelines independently

Develop standardized pipeline deployment patterns that LOB teams can adopt while maintaining autonomy

Support domain teams in building data products that are discoverable, interoperable, and compliant with enterprise standards

Enable distributed data processing across domains while ensuring consistency through federated governance

Assist in establishing data contracts and interoperability standards that allow seamless data sharing across domains

Support the balance between domain autonomy and enterprise-wide governance requirements

Templates, Blueprints & Patterns

Develop reusable pipeline templates and Infrastructure as Code (IaC) patterns for common data product types

Create blueprints for data ingestion, transformation, quality validation, and serving that LOB teams can customize

Build standardized patterns for batch pipelines, streaming pipelines, CDC implementations, and API-based integrations

Contribute to a pattern library covering medallion architecture, dimensional modeling, and data product packaging

Document best practices and reference architectures that guide LOB teams in building compliant, high-quality pipelines

Develop starter kits and accelerators that reduce time-to-value for domain teams building new data products

Create cookbooks and implementation guides that translate enterprise standards into actionable steps

Support LOB teams in adopting templates while allowing appropriate customization for domain-specific needs

Data Integration

Integrate data from multiple internal and external sources into unified data assets

Build reusable data integration patterns and connectors for enterprise data sources

Implement data ingestion using Auto Loader, COPY INTO, and other ingestion frameworks

Develop API-based data integrations and file-based data processing workflows

Ensure data consistency and reliability across integrated sources

Support data migration efforts and legacy system integrations

Data Modeling & Transformation

Implement medallion architecture patterns (bronze, silver, gold) for data organization and quality progression

Develop dimensional models, fact tables, and aggregations for analytics use cases

Build data transformation logic that ensures accuracy, consistency, and business alignment

Create reusable transformation components and modular pipeline designs

Optimize data models for query performance and consumption patterns

Support schema evolution and data versioning requirements

Data Quality & Testing

Implement data quality checks, validation rules, and automated testing within pipelines

Develop data profiling and anomaly detection to identify quality issues

Build data reconciliation processes to ensure accuracy across systems

Implement unit testing, integration testing, and regression testing for pipelines

Monitor data quality metrics and remediate issues proactively

Document data quality rules and thresholds for pipeline outputs

Data Observability & Operations

Implement logging, monitoring, and alerting for pipeline health and performance

Build dashboards to track pipeline execution, data freshness, and quality metrics

Develop automated error handling, retry logic, and failure notifications

Support incident response and troubleshooting for pipeline failures

Implement data lineage tracking to support auditability and impact analysis

Ensure pipelines meet SLAs for data availability and freshness

Analytics & AI Enablement

Build data pipelines that enable analytics, reporting, and business intelligence use cases

Prepare and serve data for machine learning and AI workloads

Develop feature engineering pipelines for ML model development

Create semantic layers and curated datasets that enable self-service analytics

Support integration with analytics tools including Power BI and Tableau

Build data products with clear documentation and consumption guidance

Collaboration & Enablement

Partner with data architects to align pipeline development with architectural standards

Collaborate with business analysts and data scientists to understand data requirements

Work with platform engineers to leverage platform capabilities effectively

Contribute to technical documentation, runbooks, and knowledge sharing

Support data consumers in understanding and accessing data assets

Participate in code reviews and follow engineering best practices

Coaching & Technical Mentorship

Support data engineering delivery with contractor and consultant teams under guidance from senior team members

Contribute to knowledge-sharing sessions and workshops to build data engineering capability across LOB teams

Document best practices, lessons learned, and technical standards for data engineering

Stay current with industry trends in data mesh, federated architectures, and cloud data services

Share insights and learnings with the broader team to foster continuous improvement

Continuous Improvement

Assist in evaluating emerging data engineering technologies, frameworks, and tools

Identify opportunities to enhance pipeline performance, reliability, and cost efficiency

Contribute to the evolution of best practices and standards for data engineering

Propose automation opportunities to reduce manual effort and improve consistency

Other duties as assigned
