Posted 29 May, 2026
Site Reliability Engineer
NR Consulting
Noida, Uttar Pradesh
Full Time
Reference: 365_463738_24-07918
Title: Site Reliability Engineer
Location: Noida
Experience: 7-11 Years
Skill requirements for a reliability engineer:
Several key skills are essential for a good reliability engineer:
• Analytical thinking: the ability to think critically and logically, and to work with and make sense of large data sets.
• Technical aptitude: a strong understanding of the systems, equipment, and processes at hand, including the underlying engineering principles and the specific systems used in the organization.
• Problem-solving: the ability to think creatively and devise innovative solutions to complex problems.
• Communication: the ability to explain technical concepts clearly and to collaborate effectively with others.
• Project management: the ability to manage multiple projects and tasks simultaneously and effectively, including planning, scheduling, and organizing.
• Data management skills: the ability to use statistical analysis tools and to interpret and communicate the results of the analysis.
• A continuous improvement mindset: identifying and implementing ways to continuously improve reliability.
• Safety-conscious: the ability to identify potential hazards and take the necessary steps to mitigate them.
• Ability to program (structured and object-oriented) in one or more high-level languages, such as Python, Java, C/C++, Ruby, or JavaScript.
• Experience with distributed storage technologies such as NFS, HDFS, Ceph, and Amazon S3, as well as dynamic resource management frameworks (Apache Mesos, Kubernetes, YARN).
• A track record of success in technical engineering roles.
• Coding experience beyond simple scripts.
Objectives of this role:
• Run the production environment by monitoring availability and taking a holistic view of system health
• Build systems to manage platform infrastructure and applications
• Improve reliability, quality, and time-to-market of the defined solutions
• Measure and optimize system performance, with an eye toward pushing our capabilities forward, getting ahead of customer needs, and innovating for continual improvement
• Provide primary operational and engineering support
Responsibilities of a reliability engineer:
A reliability engineer is responsible for finding potential problems or opportunities for improvement by analyzing data and identifying patterns within that data.
Once problems have been identified, the reliability engineer will develop and implement solutions to prevent them, ultimately improving the reliability of systems, equipment, and processes.
Common scenarios include:
• Gathering and analyzing metrics from operating systems and applications to assist in performance tuning and fault finding.
• Conducting root cause analysis to determine the underlying cause of problems.
• Developing and implementing new maintenance procedures.
• Designing and implementing new procedures for monitoring and testing equipment.
• Finding new technologies and processes that can improve equipment performance and reliability.
• Developing and implementing training programs for employees.
• Collaborating with other departments to ensure that reliability is integrated into all aspects of the organization.
• Participating in system design consulting, platform management, and capacity planning.
• Creating sustainable systems and services through automation and uplifts.
• Balancing feature development speed and reliability with well-defined service-level objectives.