Posted 16 June, 2026

Data Engineer | AI | Visualization

Pashtek • Salesforce Partner | Data & AI
Chennai, TN, IN Full Time
Reference: f55388d466d4ccad

Job Description

About the Role

We are looking for a Data Engineer with strong experience in data lakehouse platforms, AI Studio implementations, and visualization tools. In this role, you will build and operate data pipelines, tables, semantic layers, dashboards, and AI-ready data products across cloud and on-premises environments.

You will work with engineering, analytics, AI, security, and business teams to deliver governed, high-performance data products that support reporting, AI applications, and enterprise decision-making.

What You’ll Do
  • Build data pipelines: Develop reliable ELT/ETL pipelines using Spark, SQL, dbt, Airflow, AWS Glue, or similar tools to ingest data from on-premises and cloud systems.
  • Implement lakehouse tables: Create and maintain lakehouse tables using Apache Iceberg, Delta Lake, or Hudi with support for ACID transactions, schema evolution, partitioning, and time travel.
  • Work with data lakehouse platforms: Support data platforms such as Databricks, Snowflake, AWS Lake Formation, Glue Catalog, Hive Metastore, Polaris/REST catalogs, Trino, Starburst, Dremio, and related technologies.
  • AI Studio implementations: Support AI Studio use cases by preparing AI-ready datasets, creating reusable data products, enabling metadata-driven discovery, and integrating structured data with AI workflows, agents, and chat-based analytics.
  • Visualization and BI development: Build and support dashboards, reports, semantic models, and data marts using Power BI, Tableau, QuickSight, Looker, or similar visualization tools.
  • Model data for analytics: Design dimensional models, semantic layers, and domain-oriented data products for reporting, analytics, and AI consumption.
  • Governance and security: Apply cataloging, lineage, PII classification, IAM roles, RBAC/ABAC, masking, row-level security, and column-level security.
  • Performance and cost tuning: Optimize queries, partitions, clustering, file sizing, compaction, caching, and compute workloads to improve performance and reduce cost.
  • Streaming and CDC: Build real-time and near-real-time pipelines using Kafka, MSK, Kinesis, Spark Structured Streaming, Debezium, DMS, or Fivetran.
  • Migration projects: Support migration from legacy platforms such as Informatica, SSIS, SAP BW, SQL Server, Oracle, Hadoop, Netezza, or Teradata to modern lakehouse platforms.
  • Quality and observability: Implement data quality checks, validation rules, alerts, lineage tracking, monitoring dashboards, and SLA/SLO reporting.
  • DevOps for data: Use Git, CI/CD, Terraform, CloudFormation, GitHub Actions, GitLab, or Azure DevOps to version, test, and deploy data workloads.
  • Documentation: Create clear technical documentation, data contracts, runbooks, and implementation guides for engineering and business users.

Required Experience
  • 5+ years of experience in data engineering, data platforms, analytics engineering, or similar roles.
  • Strong hands-on experience with SQL, Spark, Python, and cloud-based data platforms.
  • Experience with AWS data services such as S3, Glue, EMR, IAM, Lake Formation, Athena, or Redshift.
  • Experience with at least one major data platform such as Databricks, Snowflake, Dremio, Starburst/Trino, or EMR Spark.
  • Experience with open table formats such as Apache Iceberg, Delta Lake, or Apache Hudi.
  • Strong understanding of data modeling, dimensional modeling, semantic layers, ELT/ETL design, and SQL performance tuning.
  • Experience with BI and visualization tools such as Power BI, Tableau, QuickSight, or Looker.
  • Exposure to AI Studio, AI agents, chat-based analytics, or AI-ready data product development is preferred.
  • Knowledge of governance and security concepts including RBAC, row-level security, column-level security, masking, encryption, and key management.
  • Experience with CI/CD and infrastructure-as-code for data workloads.
  • Strong communication skills and ability to work with business users, analysts, platform engineers, and stakeholders.

Nice to Have
  • Experience with dbt, Airflow, Great Expectations, Deequ, OpenLineage, or Marquez.
  • Experience with streaming platforms such as Kafka, MSK, Kinesis, or Flink.
  • Experience with data catalogs and governance platforms such as AWS Glue Data Catalog, Unity Catalog, DataHub, Atlan, Collibra, or Amundsen.
  • Experience building semantic models and governed datasets for Power BI or Tableau.
  • Experience supporting AI/ML, GenAI, AI Studio, agentic workflows, or natural language analytics.
  • Knowledge of compliance standards such as SOC 2, HIPAA, GDPR, PCI, or ISO.
  • Experience with multi-tenant platforms, federated governance, or enterprise data product frameworks.

Location & Work Style

This is a full-time role based in Chennai, India. The role may be remote or hybrid depending on project needs. Candidates should be comfortable collaborating with global teams and working overlapping hours as needed.

Ideal Candidate

The ideal candidate is a hands-on data engineer who understands modern lakehouse architecture, AI-ready data products, and business-facing analytics. You should be comfortable building pipelines, tuning performance, supporting BI tools like Power BI and Tableau, and contributing to AI Studio and data platform implementations.
