GCP Data Engineer
-
Data Pipeline Development:
Design, develop, and maintain scalable extract, transform, and load (ETL) pipelines using GCP services (e.g., Cloud Dataflow, Cloud Dataproc, BigQuery), orchestrated with Cloud Composer.
Implement real-time and batch data processing solutions to support analytics, machine learning, and reporting requirements.
-
Data Integration:
Work with various data sources (structured, semi-structured, and unstructured) and integrate them into GCP data environments.
Use GCP storage services such as Cloud Storage, BigQuery, and Cloud SQL to store and manage large datasets.
Integrate data from APIs, databases, and third-party sources across multiple systems.
-
Cloud Infrastructure:
Use GCP infrastructure services such as Google Kubernetes Engine (GKE), Compute Engine, and Cloud Functions to run and manage data workflows and processing.
Ensure data storage, processing, and workflows are optimized for performance and cost efficiency.
-
Data Modeling and Schema Design:
Design and implement efficient, scalable data models and schemas for structured data, and flexible storage layouts for semi-structured and unstructured data.
Define and maintain best practices for data modeling, data quality, and governance.
-
Data Security & Compliance:
Implement data security measures using GCP tools (e.g., IAM, Cloud Identity, Cloud KMS) to ensure data privacy and compliance with industry standards.
Perform regular audits and risk assessments to ensure that the data architecture complies with relevant security policies and regulatory requirements.
-
Collaboration & Communication:
Collaborate with data scientists, analysts, and other stakeholders to understand data requirements and deliver solutions that meet business needs.
Communicate technical concepts and results effectively to non-technical stakeholders.
-
Performance Optimization:
Monitor and optimize the performance of data pipelines, queries, and storage while keeping costs under control.
Tune and troubleshoot data workflows to maintain reliability and uptime.