Posted 19 May, 2026

Datacenter Observability and Site Reliability Engineer

Macpower Digital Assets Edge Private Limited
Chennai, Tamil Nadu, IN
Full Time
Reference: 25-01115-2555-1

Job Summary: We are seeking a skilled Observability & Site Reliability Engineer to join our team supporting large-scale, enterprise-grade infrastructure. The ideal candidate will have extensive experience with observability tools, especially Grafana, Loki, Mimir, and Kubernetes metrics and logs, along with a strong passion for performance, scalability, and system uptime. Candidates must be willing to collaborate with Korean stakeholders and work within the Korean time zone.
  • Experience: 8 to 12 years.
  • Notice Period: Immediate to 30 days preferred.

Key Must-Have Skills:
  • 5+ years in Observability Engineering.
  • Expertise in Grafana, Loki, Mimir, and the Alloy agent.
  • Strong understanding of infrastructure metrics (e.g., GPU, CPU, Kubernetes).
  • Proficiency in scripting and programming languages (Python, Go, Bash).
  • Prior exposure to tools such as Prometheus, ELK, Docker, and Terraform.
  • Flexibility to work with Korean stakeholders and within the Korean time zone.

Role Highlights:
  • Design and manage the observability stack across large-scale data center infrastructure.
  • Build scalable telemetry systems, dashboards, alerts, and reports.
  • Apply SRE best practices to ensure system reliability and performance.
  • Troubleshoot real-time issues and contribute to ongoing system optimization.

Good to Have:
  • Previous experience working with Korean stakeholders.
  • Familiarity with cloud platforms like AWS, GCP, or Azure.
