Posted 22 May, 2026

Data Engineer (Snowflake)

ClifyX
India, Full Time
Reference: 365_594563_26-03477

Description

Detailed JD (Roles and Responsibilities)
• Build and maintain scalable ETL/ELT pipelines using Apache Spark (Scala/Python) to process raw data from AWS S3 and deliver it to Snowflake via Snowpipe.
• Apply expertise in big data file formats (Parquet, Avro) and data handling procedures to optimize storage.
• Use geospatial libraries to process and index location-based telemetry at scale.
• Engineer robust filtering and anomaly detection layers to scrub "noise" from source data, ensuring high-quality inputs for downstream datasets, machine learning models, and analytics.
• Conduct deep-level optimization of SQL and Python/Spark code to reduce execution runtime and minimize cloud compute costs (Snowflake credits/AWS EMR).
• Leverage modern AI tools and analytical frameworks to accelerate data exploration, automate feature engineering and support predictive modeling workflows.
• Manage scheduled tasks and orchestration to ensure the seamless delivery of materialized data for BI and data science teams.

Mandatory skills
Apache Spark using Scala or Python; advanced SQL and Python skills; strong understanding of the AWS ecosystem (S3, IAM, EMR/Glue) and Snowflake architecture.

Desired skills
• Expert-level proficiency in Apache Spark using Scala or Python for complex data transformations.
• Hands-on experience with geospatial libraries (e.g. GeoPandas, PySpark-Magellan), H3 and spatial indexing.
• Advanced SQL and Python skills, specifically focused on query plan optimization and memory management.
• Strong understanding of the AWS ecosystem (S3, IAM, EMR/Glue) and Snowflake architecture.
• Deep knowledge of file partitioning, bucketing, and compression strategies for petabyte-scale environments.
• Background in Snowflake Task optimization and Snowpipe troubleshooting.
• Familiarity with LLM and ML functions in Snowflake.
