Posted 11 June, 2026
Intern - Data Engineer
bluCognition
West Bunghmun, MZ, IN
Full Time
Reference: 750dc85b040d70b7
Job Description
Data Engineer / Analytics Engineer

About bluCognition: bluCognition is an AI/ML-based start-up specializing in risk analytics, data conversion, and data enrichment. Founded in 2017 by senior professionals from the financial services industry, the company is headquartered in the US, with its delivery centre based in Pune.

We build all our solutions on the latest technology stack in AI, ML, and NLP, combined with decades of experience in risk management at some of the largest financial services firms in the world. Our clients are some of the biggest and most progressive names in the financial services industry.

We are entering a significant growth phase and are looking for motivated, analytical freshers who want to join us on this exciting journey.

What will your role involve?
• Work with large structured datasets using SQL and PySpark.
• Build, maintain, and optimize ETL/data processing pipelines.
• Assist in business/entity matching logic and fuzzy matching implementations.
• Create and validate analytical datasets for model development and reporting.
• Perform data cleaning, transformation, aggregation, and quality checks.
• Write efficient SQL queries using joins, CTEs, window functions, and aggregations.
• Support feature engineering for ML/risk modeling use cases.
• Work on incremental data processing and monthly/daily refresh strategies.
• Analyze data discrepancies, debug pipeline failures, and improve reliability.
• Collaborate with analytics, data science, and engineering teams.
• Participate in testing, deployment, and code review activities.

To help us level up, you will ideally have:
• A background in Computer Science, Data Science, Statistics, Mathematics, or a related field.
• Strong SQL knowledge: joins, CTEs, aggregations, CASE statements, and window functions.
• Basic understanding of Python and familiarity with PySpark or distributed data processing concepts.
• Understanding of relational databases, data structures, and ETL/data pipeline concepts.
• Exposure to AWS or to cloud and big-data platforms such as Redshift, Spark, Hadoop, or Databricks is a plus.
• Familiarity with Git/version control and basic understanding of APIs.
• An analytical mindset and strong problem-solving skills, with attention to detail and data accuracy.
• The ability to work in a fast-paced environment and to deal with ambiguity.
• Strong communication, documentation, and collaboration skills across multiple teams.