Posted 17 June, 2026

Senior/Staff Backend

Zyoin Group
Bengaluru, KA, IN Full Time
Reference: 46f6f788562bebaa

Job Description

We're looking for a Senior/Staff AI Engineer (Inference & Agent Systems) for a rapidly growing fintech startup that is setting up its operations in India.


Why Join?

Get the opportunity to be one of the founding members of the team and build the product from scratch.


About the Role and the Work:


Inference Optimization

  • Drive time to first token (TTFT) below 400 ms for multi-step agent pipelines
  • Streaming optimization: deliver the first token to the user while sub-agents are still running
  • KV cache strategy, prompt compression, dynamic context window management
  • Multi-provider routing: model selection by latency, cost, and task type across OpenAI, Anthropic, Gemini, and open-weight models

Infrastructure

  • Model serving and cold start optimization
  • Async worker architecture for parallel sub-agent execution
  • Observability: trace every token, every tool call, every synthesis step

What We're Looking For:


You've built something that runs in production at a meaningful scale and you understand why it's fast (or why it isn't).


Weaker signal (but not disqualifying):

  • You've fine-tuned models but haven't shipped inference systems
  • You've used LangChain/LlamaIndex but haven't built the layer underneath
  • Strong ML research background without systems exposure

Stack familiarity (we care more about depth than an exact match): Go, Python, Temporal, Kafka, PostgreSQL, Docker


Why This Role:


The problems here don't have blog posts about them yet. Parallel agent DAG execution under hard latency budgets, streaming synthesis across partial sub-agent results, and eval harnesses for non-deterministic multi-step systems are genuinely unsolved at production quality. Small team. High ownership. Every engineer's decisions ship to production.


Key Responsibilities


Agent Architecture

  • Design and implement Plan-Execute-Synthesize pipelines that run sub-agents in parallel DAGs, not sequential chains
  • Build reliable orchestration on top of Temporal: retries, timeouts, partial failure recovery, idempotency
  • Structured output enforcement: JSON schema validation, retry loops on malformed LLM output, graceful degradation
  • Tool-call design: schemas that LLMs follow reliably across providers

Evaluation & Harness

  • Own the eval framework end to end: ground truth datasets, automated scoring pipelines, regression detection on every PR
  • LLM-as-judge pipelines for qualitative output assessment
  • Latency regression testing: p50/p95/p99 tracked across every deployment
  • Adversarial test case design: ambiguous queries, missing data, conflicting sources, malformed tool responses

Must-Have Skills

  • You've worked on inference pipelines where TTFT was the primary metric and you moved it meaningfully
  • You've built multi-step agent systems, and you know where they break, not from reading papers but from watching them fail in production
  • You've written eval harnesses from scratch and you have opinions about what makes a ground truth dataset actually useful
  • You've debugged LLM non-determinism in production and built systems resilient to it
  • You've worked with streaming LLM responses and built infrastructure around partial output handling

Good-to-Have Skills


You've shipped inference systems at:

  • A real-time AI product (search, coding assistant, chat at scale)
  • A model serving infrastructure company
  • An agent platform (any domain)

Location & Work Mode:


Location: Bangalore

Work Mode: Hybrid

The interview process usually comprises five rounds:

  • HackerRank test
  • Coding round
  • Design round
  • CTO round
  • Product head round

Important Notes / Filters


Candidates from premium colleges/institutes preferred.

Product company experience preferred.

Bangalore-based candidates preferred.
