Posted 12 June, 2026

SDE IV - GPU Engineer

Glance
Bangalore Full Time
Reference: 102_737043_7213752

About the Role

As a GPU Systems Engineer, you'll lead design and optimization efforts across our GPU inference stack.
You will architect the libraries and runtime systems that enable Stable Diffusion, multimodal transformers, and emerging video generation models to run efficiently at scale.

You'll guide cross-functional teams, influence hardware selection, and set the technical vision for GPU optimization practices across the company.

Key Responsibilities

  • Architect high-performance inference runtimes, kernel dispatchers, and memory planners for large diffusion and transformer workloads.
  • Lead investigations into cross-GPU performance bottlenecks, communication overheads, and scheduling inefficiencies.
  • Drive multi-GPU parallelism strategies, including model, pipeline, and tensor parallelism.
  • Establish company-wide GPU optimization standards, tooling, and SLIs.
  • Collaborate with research to design scalable implementations of novel architectures.
  • Mentor engineers in profiling, tuning, and low-level optimization.
  • Partner with hardware vendors and infra teams to maximize cluster utilization.

Required Qualifications

  • 5+ years in high-performance computing, GPU runtime systems, or ML infrastructure.
  • Proven expertise in CUDA / Triton / C++, with a deep understanding of GPU scheduling, occupancy, register usage, and tensor cores.
  • Experience building and maintaining distributed inference or training systems.
  • Ability to design abstractions balancing flexibility and performance.
  • Strong knowledge of NCCL, NVLink, PCIe, and interconnects.
  • Familiarity with profiling automation and performance dashboards.
  • Excellent technical leadership and mentoring capabilities.

Preferred Qualifications

  • Background in compiler-aided optimization (TVM, XLA, MLIR, Triton).
  • Experience tuning Stable Diffusion or transformer inference pipelines.
  • Exposure to heterogeneous compute backends (AMD ROCm, TPU, ASICs).
  • Experience working with hardware-software co-design initiatives.
  • Open-source or research contributions in GPU optimization.
