Posted 17 June, 2026

AI Inference Junior Engineer WFH

Qubrid AI
Thrissur, KL, IN Full Time
Reference: a5a37fcd230acd88

Job Description

Read everything carefully. The requirements and screening questions are critical: applications that do not answer them correctly and satisfactorily will be auto-rejected, which wastes your time.


  • Work from home.
  • This is a full-time role. If you plan to hold two or more jobs at the same time, or want to do this part-time, please do not apply; such applications will be auto-rejected.
  • Note: this job requires working late at night, until 4 AM India time, to overlap with US working hours. Do not apply if this schedule doesn't work for you.
  • Salary depends on experience and current verifiable compensation (paychecks).
  • Junior candidates with 2 years of experience are suitable.


About Qubrid AI


Qubrid AI is building the next-generation AI infrastructure platform that enables organizations to deploy, scale, and monetize AI workloads across cloud, on-premises, and hybrid environments. Our platform combines GPU cloud infrastructure, inference APIs, model deployment services, RAG pipelines, fine-tuning capabilities, and AI orchestration software into a unified AI stack.

We are seeking an experienced and hands-on AI Inference Engineer to design, optimize, and scale AI inference systems supporting thousands of concurrent users and enterprise AI workloads.


Role Overview


As an AI Inference Engineer, you will be responsible for deploying, optimizing, and operating open-source and commercial AI models across NVIDIA GPU infrastructure. You will work at the intersection of machine learning, distributed systems, GPU optimization, and cloud infrastructure to deliver low-latency, high-throughput AI services.


This is a highly technical role requiring deep expertise in LLM serving, GPU performance tuning, model optimization, inference frameworks, and large-scale production deployments.

Responsibilities


AI Model Deployment & Serving
  • Deploy and manage Large Language Models (LLMs), multimodal models, vision models, speech models, and embedding models in production.
  • Build and optimize inference pipelines for enterprise and public AI workloads.
  • Implement scalable serving architectures using modern inference frameworks.
  • Support model versioning, rollbacks, canary deployments, and A/B testing.


GPU Performance Optimization
  • Optimize GPU utilization, memory allocation, throughput, and latency.
  • Apply reduced-precision and quantization techniques and formats, including FP16, BF16, INT8, GPTQ, AWQ, and GGUF.
  • Tune inference workloads across NVIDIA H100, H200, B300, B200, A100, L40S, and other accelerator platforms.
  • Analyze bottlenecks using NVIDIA profiling and monitoring tools.


AI Infrastructure Engineering
  • Design scalable inference clusters using Kubernetes and containerized workloads.
  • Implement auto-scaling, load balancing, and fault-tolerant architectures.
  • Build GPU scheduling and resource allocation strategies.
  • Optimize multi-tenant AI serving environments.

Inference Framework Expertise
  • Deploy and optimize models using:
    • vLLM
    • NVIDIA TensorRT-LLM
    • Triton Inference Server
    • SGLang
    • TGI (Text Generation Inference)
    • Ollama
    • Ray Serve
    • OpenAI-compatible serving stacks
    • NVIDIA Dynamo


Model Optimization
  • Implement batching, continuous batching, speculative decoding, KV cache optimization, and context caching.
  • Optimize token throughput and cost efficiency.
  • Evaluate emerging inference technologies and frameworks.
  • Benchmark models across performance, accuracy, and cost metrics.


Platform Development
  • Develop APIs and backend services supporting AI inference workloads.
  • Integrate authentication, billing, token metering, and usage tracking.
  • Work closely with platform engineering teams to improve reliability and scalability.
  • Contribute to Qubrid's AI Model Studio and AI Compute Platform.


Required Qualifications
  • Bachelor's or Master's degree in Computer Science, Engineering, AI/ML, or related field.
  • 2+ years of software engineering experience.
  • 2+ years of production AI/ML infrastructure experience.
  • Strong Python programming expertise.
  • Deep understanding of transformer architectures and modern LLMs.
  • Experience deploying models such as Llama, DeepSeek, Qwen, Mistral, Gemma, and other open-source models.
  • Strong Linux systems administration skills.
  • Experience with Docker and Kubernetes.
  • Experience with distributed systems and cloud-native architectures.


Technical Skills

AI & ML
  • PyTorch
  • Hugging Face Transformers
  • Model quantization
  • Fine-tuning workflows
  • Embedding models
  • RAG architectures
  • Vector databases

GPU & Infrastructure
  • NVIDIA CUDA
  • TensorRT
  • NCCL
  • NVLink
  • NVSwitch
  • Multi-GPU optimization
  • GPU monitoring and profiling

Cloud & DevOps
  • Kubernetes
  • Docker
  • Terraform
  • CI/CD pipelines
  • AWS, Azure, GCP, or private cloud environments

Databases & Backend
  • PostgreSQL
  • MongoDB
  • Redis
  • REST APIs
  • gRPC
  • Event-driven architectures


Preferred Qualifications
  • Experience building AI API platforms similar to OpenAI, Anthropic, Together AI, Fireworks, or DeepInfra.
  • Experience operating large-scale inference clusters with hundreds or thousands of GPUs.
  • Knowledge of GPU virtualization and multi-tenancy.
  • Experience with distributed training and fine-tuning.
  • Familiarity with NVIDIA DGX, HGX, and enterprise GPU environments.
  • Contributions to open-source AI infrastructure projects.


What Success Looks Like
  • Deliver highly optimized AI inference platform services with industry-leading latency and throughput.
  • Improve GPU utilization and reduce infrastructure costs.
  • Scale AI services reliably across cloud and on-premises environments.
  • Enable customers to deploy and consume AI models through Qubrid's unified AI platform.
  • Drive innovation in AI inference, model optimization, and GPU infrastructure.


Why Join Qubrid AI
  • Build the future of AI infrastructure.
  • Work on cutting-edge NVIDIA GPU platforms.
  • Influence the architecture of a rapidly growing AI platform.
  • Solve challenging problems in inference, scale, performance, and distributed systems.
  • Help democratize access to AI infrastructure globally.


Qubrid AI is an equal opportunity employer and welcomes applicants passionate about building the future of AI infrastructure.

This listing expired on 19 Jun. Applications are no longer accepted.
