Posted 16 June, 2026

AI Inference Junior Engineer WFH

Qubrid AI
Tiruppūr, TN, IN Full Time
Reference: 846689c7aa2feb4d

Job Description

Read everything carefully. The requirements and screening questions are critical: applications that do not answer them correctly and satisfactorily will be auto-rejected, which wastes your time.


  • Work from Home.
  • This is a full-time role. If you plan to hold two or more jobs at the same time, or want to do this part-time, that won't work for us. In that case, please do not apply, as your application will be auto-rejected.
  • Note: this job requires working late at night, India time, until 4 AM to overlap with USA working hours. Do not apply if this timing doesn't work for you.
  • Salary depends on experience and current compensation, verifiable through paychecks.
  • Junior candidates with 2 years of experience are suitable.


About Qubrid AI


Qubrid AI is building the next generation AI infrastructure platform that enables organizations to deploy, scale, and monetize AI workloads across cloud, on-premises, and hybrid environments. Our platform combines GPU cloud infrastructure, inference APIs, model deployment services, RAG pipelines, fine-tuning capabilities, and AI orchestration software into a unified AI stack.

We are seeking an experienced and hands-on AI Inference Engineer to design, optimize, and scale large-scale AI inference systems supporting thousands of concurrent users and enterprise AI workloads.


Role Overview


As an AI Inference Engineer, you will be responsible for deploying, optimizing, and operating open-source and commercial AI models across NVIDIA GPU infrastructure. You will work at the intersection of machine learning, distributed systems, GPU optimization, and cloud infrastructure to deliver low-latency, high-throughput AI services.


This is a highly technical role requiring deep expertise in LLM serving, GPU performance tuning, model optimization, inference frameworks, and large-scale production deployments.

Responsibilities


AI Model Deployment & Serving
  • Deploy and manage Large Language Models (LLMs), multimodal models, vision models, speech models, and embedding models in production.
  • Build and optimize inference pipelines for enterprise and public AI workloads.
  • Implement scalable serving architectures using modern inference frameworks.
  • Support model versioning, rollbacks, canary deployments, and A/B testing.


GPU Performance Optimization
  • Optimize GPU utilization, memory allocation, throughput, and latency.
  • Implement model quantization techniques including FP16, BF16, INT8, GPTQ, AWQ, and GGUF.
  • Tune inference workloads across NVIDIA H100, H200, B300, B200, A100, L40S, and other accelerator platforms.
  • Analyze bottlenecks using NVIDIA profiling and monitoring tools.


AI Infrastructure Engineering
  • Design scalable inference clusters using Kubernetes and containerized workloads.
  • Implement auto-scaling, load balancing, and fault-tolerant architectures.
  • Build GPU scheduling and resource allocation strategies.
  • Optimize multi-tenant AI serving environments.


Inference Framework Expertise
  • Deploy and optimize models using:
    • vLLM
    • NVIDIA TensorRT-LLM
    • Triton Inference Server
    • SGLang
    • TGI (Text Generation Inference)
    • Ollama
    • Ray Serve
    • OpenAI-compatible serving stacks
    • NVIDIA Dynamo


Model Optimization
  • Implement batching, continuous batching, speculative decoding, KV cache optimization, and context caching.
  • Optimize token throughput and cost efficiency.
  • Evaluate emerging inference technologies and frameworks.
  • Benchmark models across performance, accuracy, and cost metrics.


Platform Development
  • Develop APIs and backend services supporting AI inference workloads.
  • Integrate authentication, billing, token metering, and usage tracking.
  • Work closely with platform engineering teams to improve reliability and scalability.
  • Contribute to Qubrid's AI Model Studio and AI Compute Platform.


Required Qualifications
  • Bachelor's or Master's degree in Computer Science, Engineering, AI/ML, or related field.
  • 2+ years of software engineering experience.
  • 2+ years of production AI/ML infrastructure experience.
  • Strong Python programming expertise.
  • Deep understanding of transformer architectures and modern LLMs.
  • Experience deploying models such as Llama, DeepSeek, Qwen, Mistral, Gemma, and other open-source models.
  • Strong Linux systems administration skills.
  • Experience with Docker and Kubernetes.
  • Experience with distributed systems and cloud-native architectures.


Technical Skills

AI & ML
  • PyTorch
  • Hugging Face Transformers
  • Model quantization
  • Fine-tuning workflows
  • Embedding models
  • RAG architectures
  • Vector databases

GPU & Infrastructure
  • NVIDIA CUDA
  • TensorRT
  • NCCL
  • NVLink
  • NVSwitch
  • Multi-GPU optimization
  • GPU monitoring and profiling

Cloud & DevOps
  • Kubernetes
  • Docker
  • Terraform
  • CI/CD pipelines
  • AWS, Azure, GCP, or private cloud environments

Databases & Backend
  • PostgreSQL
  • MongoDB
  • Redis
  • REST APIs
  • gRPC
  • Event-driven architectures


Preferred Qualifications
  • Experience building AI API platforms similar to OpenAI, Anthropic, Together AI, Fireworks, or DeepInfra.
  • Experience operating large-scale inference clusters with hundreds or thousands of GPUs.
  • Knowledge of GPU virtualization and multi-tenancy.
  • Experience with distributed training and fine-tuning.
  • Familiarity with NVIDIA DGX, HGX, and enterprise GPU environments.
  • Contributions to open-source AI infrastructure projects.


What Success Looks Like
  • Deliver highly optimized AI inference platform services with industry-leading latency and throughput.
  • Improve GPU utilization and reduce infrastructure costs.
  • Scale AI services reliably across cloud and on-premises environments.
  • Enable customers to deploy and consume AI models through Qubrid's unified AI platform.
  • Drive innovation in AI inference, model optimization, and GPU infrastructure.


Why Join Qubrid AI
  • Build the future of AI infrastructure.
  • Work on cutting-edge NVIDIA GPU platforms.
  • Influence the architecture of a rapidly growing AI platform.
  • Solve challenging problems in inference, scale, performance, and distributed systems.
  • Help democratize access to AI infrastructure globally.


Qubrid AI is an equal opportunity employer and welcomes applicants passionate about building the future of AI infrastructure.
