Intern - Data Engineer
Job Description
Our clients are some of the biggest and most progressive names in the financial services industry. We are entering a significant growth phase and are looking for motivated, analytical freshers who want to join us on this exciting journey.
What will your role involve?
• Work with large structured datasets using SQL and PySpark.
• Build, maintain, and optimize ETL/data processing pipelines.
• Assist in business/entity matching logic and fuzzy matching implementations.
• Create and validate analytical datasets for model development and reporting.
• Perform data cleaning, transformation, aggregation, and quality checks.
• Write efficient SQL queries using joins, CTEs, window functions, and aggregations.
• Support feature engineering for ML/risk modeling use cases.
• Work on incremental data processing and monthly/daily refresh strategies.
• Analyze data discrepancies, debug pipeline failures, and improve reliability.
• Collaborate with analytics, data science, and engineering teams.
• Participate in testing, deployment, and code review activities.
To help us level up, you will ideally have:
• A background in Computer Science, Data Science, Statistics, Mathematics, or a related field.
• Strong SQL knowledge: joins, CTEs, aggregations, CASE statements, and window functions.
• Basic understanding of Python and familiarity with PySpark or distributed data processing concepts.
• Understanding of relational databases, data structures, and ETL/data pipeline concepts.
• Exposure to AWS or other cloud platforms, or to technologies such as Redshift, Spark, Hadoop, or Databricks, is a plus.
• Familiarity with Git/version control and a basic understanding of APIs.
• An analytical mindset and strong problem-solving skills, with attention to detail and data accuracy.
• The ability to work in a fast-paced environment and to deal with ambiguity.
• Strong communication, documentation, and collaboration skills across multiple teams.