Posted 12 June, 2026

AI Infrastructure Systems/Solutions Architect

BayOne
Gurugram, Haryana, India | Full Time
Reference: 365_553037_25-09400

About the Role

We are looking for a Systems or Solutions Architect with deep expertise in networking, infrastructure-as-a-service (IaaS), and cloud-scale system design to help architect and optimize AI/ML infrastructure.
The ideal candidate combines strong fundamentals in cloud architecture (AWS or equivalent), networking, compute, and storage, with hands-on experience in Kubernetes, observability, and automation.
You'll design scalable systems that support large AI workloads, enabling efficient training, inference, and data pipelines across distributed environments.

Key Responsibilities

  • Architect and scale AI/ML infrastructure across public cloud (AWS / Azure / GCP) and hybrid environments.
  • Design and optimize compute, storage, and network topologies for distributed training and inference clusters.
  • Build and manage containerized environments using Kubernetes, Docker, and Helm.
  • Develop automation frameworks for provisioning, scaling, and monitoring infrastructure using Python, Go, and IaC (Terraform / CloudFormation).
  • Partner with data science and MLOps teams to align on AI infrastructure requirements (GPU/CPU scaling, caching, throughput, latency).
  • Implement observability, logging, and tracing using Prometheus, Grafana, CloudWatch, or OpenTelemetry.
  • Drive networking automation (BGP, routing, load balancing, VPNs, service meshes) using software-defined networking (SDN) and modern APIs.
  • Lead performance, reliability, and cost-optimization efforts for AI training and inference pipelines.
  • Collaborate cross-functionally with product, platform, and operations teams to ensure secure, performant, and resilient infrastructure.

Required Qualifications

  • Knowledge of AI/ML infrastructure patterns, including distributed training, inference pipelines, and GPU orchestration.
  • Bachelor's or Master's degree in Computer Science, Information Technology, or a related field.
  • 10+ years of experience in systems, infrastructure, or solutions architecture roles.
  • Deep understanding of:
    • Cloud architecture: AWS (preferred), Azure, or GCP
    • Networking: VPC, Transit Gateway, DNS, routing, peering, load balancing, VPN
    • Compute and storage: EC2, ECS/EKS, S3, EBS, EFS, FSx, caching systems
    • Core infrastructure: virtualization, containers, distributed systems, and OS-level tuning
  • Proficiency in Linux systems engineering and scripting with Python and Bash.
  • Experience with Kubernetes (EKS/GKE/AKS) for large-scale workload orchestration.
  • Experience with Go (Golang) for infrastructure or network automation.
  • Familiarity with Infrastructure-as-Code (IaC) tools like Terraform, Ansible, or CloudFormation.
  • Experience implementing monitoring and observability systems (Prometheus, Grafana, ELK, Datadog, CloudWatch).

Preferred Qualifications

  • Experience with DevOps and MLOps ecosystems (SageMaker, Kubeflow, MLflow, Airflow).
  • AWS or cloud certifications such as Solutions Architect Professional or Advanced Networking Specialty.
  • Experience in performance benchmarking, security hardening, and cost optimization for compute-intensive workloads.
  • Strong collaboration skills and ability to communicate complex infrastructure concepts clearly.
