Posted 28 June, 2026

Senior SDET Performance Engineering

Codvo.ai

Pune, Maharashtra, India | Full Time

Reference: 219_575378_8-hFJlQEn7Ga

Senior SDET - Performance Engineering
Location: Pune
Experience: 10+ Years
Role Overview
We are looking for a Senior SDET specializing in Performance Engineering in a cloud-native
Azure environment. This role focuses on driving scalability, reliability, and performance
validation across distributed microservices systems. The candidate will design automated
performance frameworks, build simulators and mocks, define KPIs, and partner with
engineering, architecture, and SRE teams to ensure production-grade resilience.
Key Responsibilities
Design and implement end-to-end performance and load testing strategies for microservices-based systems
Build custom simulators, traffic generators, and mocks for complex system dependencies
Define, measure, and track performance KPIs (latency, throughput, error rate, saturation, scalability limits)
Develop fully automated performance test frameworks integrated into CI/CD pipelines (Jenkins, GitHub Actions, GitLab)
Execute load, stress, spike, endurance, and chaos testing
Collaborate with architects, developers, product owners, and SRE teams to optimize system performance
Analyze bottlenecks across application, database, and infrastructure layers
Work closely with Azure services (AKS, compute, storage, networking) for performance tuning
Implement observability using Prometheus, Grafana, and APM tools
Optimize Redis caching, database queries (MariaDB, MySQL, etc.), and messaging systems
Support resilience engineering and chaos testing (Chaos Monkey or equivalent)
Drive root cause analysis (RCA) for performance issues and production incidents
Contribute to capacity planning and scalability strategy
Required Skills
Strong experience with performance testing tools (k6, JMeter, Gatling) and with building custom frameworks
Proficiency in scripting (Python, C#, Java, or similar)
Deep understanding of distributed systems and microservices architecture
Hands-on experience with Kubernetes (AKS preferred)
Strong knowledge of the Azure cloud ecosystem
Experience with CI/CD and DevOps practices
Understanding of SRE principles (SLI/SLO, error budgets)
Experience with observability and monitoring tools
Strong database performance tuning expertise
Preferred Skills
Experience in contact center / SaaS platforms
Exposure to Kafka or RabbitMQ
Knowledge of AIOps, AI-driven testing, and anomaly detection
Experience building custom performance tools or simulators
Qualifications
Bachelor's/Master's degree in Computer Science or a related field
10+ years of experience in QA, development, and automation, with a strong focus on performance engineering
Tech Stack
Cloud: Azure (AKS, Compute, Storage)
DevOps: Docker, Kubernetes, Terraform, CI/CD
Monitoring: Prometheus, Grafana
Databases: PostgreSQL, MongoDB, Redis
Backend: Java, Spring Boot, APIs
Messaging: Kafka / RabbitMQ
AI-Driven Performance Engineering (GenAI & AIOps)
Leverage Generative AI (GenAI) to auto-generate performance test scenarios, workloads, and synthetic datasets
Implement AI-driven anomaly detection for identifying performance regressions and system bottlenecks
Use machine learning models for predictive capacity planning and workload forecasting
Integrate AIOps tools for intelligent alerting, noise reduction, and automated root cause analysis (RCA)
Apply AI techniques for log analysis, pattern recognition, and failure prediction
Build self-healing test systems with automated remediation triggers
Enhance observability platforms (Prometheus, Grafana) with AI-based insights
Utilize AI for dynamic test optimization based on real-time system behavior
Collaborate with data science teams to implement advanced analytics for performance insights
AI & Observability Tooling (Real-World Examples)
Azure Monitor + Application Insights (with AI capabilities): Smart detection, failure anomaly detection, and auto-root cause insights
Azure OpenAI / GenAI integrations: Generate performance scenarios, synthetic workloads, and intelligent test data
Dynatrace (Davis AI): Automatic dependency mapping, causal AI for root cause analysis, and real-time anomaly detection
Datadog AI / Watchdog: Automated anomaly detection, performance regression identification, and alert correlation
New Relic AI: Predictive alerting and performance intelligence across distributed systems
Prometheus + Grafana (with ML plugins): Advanced metric analysis and anomaly detection extensions
Elastic Stack (ELK) with ML: Log anomaly detection, pattern recognition, and predictive insights
Chaos Engineering tools (Gremlin, Chaos Monkey): Integrated with observability platforms for resilience validation
k6 + AI-based extensions: Intelligent load modeling and performance insights
Custom AI/ML pipelines: Python-based models for predictive scaling, workload modeling, and anomaly detection

Employment Type: FULL_TIME