Data Architect
Job Description
Data Architect – Data & Insights
Summary
We are looking for a highly experienced Data Architect (Grade 7A) to lead the design and delivery of the enterprise Data & Insights platform on Microsoft Azure. The role demands deep expertise in cloud data architecture, data models, data lake/lakehouse, vector-store-based RAG systems, GenAI, Agentic AI, and governance using Microsoft Purview.
The ideal candidate has strong hands-on skills in Python, LangChain, LangGraph, Azure data services, and end-to-end SDLC execution.
Roles & Responsibilities
- Architect and design Azure-based data lake/lakehouse platforms, domain data models, and ingestion-to-consumption pipelines.
- Develop conceptual, logical, and physical cloud data models aligned with enterprise standards.
- Architect RAG pipelines including embeddings, chunking, vector stores, hybrid retrieval, reranking, and evaluation.
- Build Agentic AI workflows using LangChain and LangGraph; design tool orchestration, memory, and safety layers.
- Implement governance with Microsoft Purview for cataloging, lineage, PII tagging, and policy enforcement.
- Ensure platform security using Entra ID, private endpoints, VNETs, Key Vault, and encryption controls.
- Lead solution architecture reviews, performance tuning, cost optimization, and NFR engineering.
- Oversee CI/CD (Azure DevOps), IaC (Terraform/Bicep), and observability (Azure Monitor, App Insights).
- Mentor engineering teams and standardize best practices, patterns, and reusable components.
Technical Skills
Mandatory
- Azure Data Platform: ADLS Gen2, Synapse/Serverless SQL, Databricks/Spark, ADF/Synapse Pipelines
- Programming: Python, PySpark, SQL
- GenAI & Agentic AI: RAG architecture, vector stores (Azure AI Search, formerly Azure Cognitive Search; Pinecone; Weaviate; Qdrant), embeddings, reranking
- Frameworks: LangChain, LangGraph
- Data Modeling: Conceptual/logical/physical models, Delta/Parquet patterns, lakehouse modeling
- Data Governance: Microsoft Purview (catalog, lineage, classification, glossary, PII governance)
- Security: Entra ID, RBAC/ABAC, Key Vault, VNET integration, encryption
- SDLC & DevOps: Azure DevOps (CI/CD), Terraform/Bicep, ADRs, HLD/LLD documentation
- Performance & Cost Optimization across compute, storage, vector workloads, and pipelines
Preferred
- Microsoft Fabric / OneLake; Power BI semantic modeling
- dbt for transformations and testing
- Cosmos DB, PostgreSQL, Azure SQL Managed Instance
- Knowledge graphs (Neo4j) and graph-based retrieval
- LLMOps: evaluation, telemetry, safety assessment, drift monitoring
- FinOps optimization practices
- Multi-cloud experience (AWS/GCP equivalents)
- API design: REST, GraphQL, gRPC
Qualifications
- Bachelor’s or Master’s degree in Engineering, Computer Science, or related discipline.
- 12–14 years of total experience, including at least 5 years in cloud data architecture.
- Proven experience delivering Azure-based data platforms and production-grade GenAI/RAG systems.
Top 5 Screening Points for Recruiter
1. Strong Azure Data Platform Architecture Experience
Look for explicit, hands-on experience with Azure Data Lake Gen2, Synapse/Serverless SQL, Databricks/Spark, and Azure Data Factory/Synapse Pipelines.
Keywords to match: ADLS, Synapse, Databricks, Spark, ADF, Delta Lake, Lakehouse.
2. Proven Expertise in GenAI, RAG & Vector Stores
Candidate must have real project experience building Retrieval-Augmented Generation (RAG) systems with vector databases.
Keywords to match: RAG, vector stores, embeddings, hybrid search, Azure AI Search / Azure Cognitive Search (vector), Pinecone, Weaviate, Qdrant, LangChain, LangGraph.
3. Solid Data Modeling & Architecture Background
Candidate should demonstrate the ability to design conceptual, logical, and physical data models as well as lakehouse architectures.
Keywords to match: data modeling, dimensional modeling, canonical models, Delta/Parquet, partitioning, Z-order, architecture diagrams.
4. Microsoft Purview & Data Governance Experience
Specific experience implementing cataloging, lineage, PII tagging, and governance frameworks.
Keywords to match: Purview, data governance, lineage, catalog, data quality, PII/PHI classification, access policies.
5. Strong Python, PySpark, SQL + Hands-on Technical Delivery
Look for strong programming skills and end‑to‑end SDLC ownership.
Keywords to match: Python, PySpark, SQL, CI/CD (Azure DevOps), Terraform/Bicep, HLD/LLD, architecture reviews, full-lifecycle delivery.
Optional Recruiter Tip (Very Useful)
Reject profiles that only list “Azure”, “GenAI”, or “LangChain” without project details or outcomes.
Prioritize candidates who describe actual implementations, not tool familiarity.