Posted 19 May, 2026

Datacenter Observability and Site Reliability Engineer

Macpower Digital Assets Edge Private Limited

Chennai, Tamil Nadu, IN

Full Time

Reference: 25-01115-2555-1

Job Summary: We are seeking a skilled Observability & Site Reliability Engineer to join our team in supporting large-scale, enterprise-grade infrastructure. The ideal candidate will have extensive experience with observability tools, especially Grafana, Loki, and Mimir, and with Kubernetes metrics and logs, along with a strong passion for performance, scalability, and system uptime. Candidates must be willing to collaborate with Korean stakeholders and to work within the Korean time zone.

Experience: 8 to 12 years.
Notice Period: Immediate to 30 days preferred.

Key Must-Have Skills:

5+ years in Observability Engineering.
Expertise in Grafana, Loki, Mimir, and Alloy agent.
Strong understanding of infrastructure metrics (e.g., GPU, CPU, Kubernetes).
Proficiency in scripting languages (Python, Go, Bash).
Prior exposure to tools such as Prometheus, ELK, Docker, and Terraform.
Flexibility to work with Korean stakeholders and within the Korean time zone.

Role Highlights:

Design and manage the observability stack across large-scale data center infrastructure.
Build scalable telemetry systems, dashboards, alerts, and reports.
Apply SRE best practices to ensure system reliability and performance.
Troubleshoot real-time issues and contribute to ongoing system optimization.

Good to Have:

Previous experience working with Korean stakeholders.
Familiarity with cloud platforms like AWS, GCP, or Azure.
