Lead Site Reliability Engineer


Smart Summary

Join as a Lead Site Reliability Engineer at Monotype's Noida location. Work with cross-functional teams to design and implement highly reliable, scalable, and fault-tolerant systems. Mentor junior team members, drive automation initiatives, and ensure system availability and performance through advanced monitoring and optimization techniques.

Job description

We are seeking an experienced and highly skilled Site Reliability Engineer to join our team. In this role, you will work with cross-functional teams during the design and implementation phases to ensure highly reliable, scalable, performant, and fault-tolerant systems that support our critical business applications and services.

What you’ll be doing:

Collaborate closely with development teams to ensure the reliability, observability, performance, and maintainability of applications and systems.

Develop and maintain sophisticated automation scripts and tools to streamline complex tasks and workflows, aiming to improve system reliability and operational efficiency.

Implement advanced monitoring solutions and performance optimization techniques to ensure high system availability and responsiveness.

Lead blameless post-mortems, identify root causes of incidents, and drive continuous improvement initiatives to prevent recurrence.

Mentor and provide technical guidance to junior team members, fostering a culture of knowledge sharing and professional growth within the team.

Act as an escalation point for complex technical issues, ensuring timely resolution and effective communication with stakeholders.

Provide leadership during incidents and outages, facilitating incident response efforts and coordinating cross-functional teams to restore services quickly.

Stay updated with industry trends, emerging technologies, and best practices in SRE, DevOps, and cloud computing, and integrate relevant insights into the team's operations.

Drive project tasks forward within a multi-disciplined team, ensuring alignment with project goals and deadlines.

Prepare and present project status updates, metrics, and reports for stakeholders, including Senior Management.

Facilitate functional and cross-functional discussions to resolve issues and drive decision-making, providing guidance and coaching as needed.

Manage risks effectively and proactively, mitigating their impact on project deliverables and overall system stability.

Interpret internal and external business challenges, recommending best practices to enhance products, processes, or services.

What we’re looking for:

Bachelor's degree in Computer Science, Engineering, or a related technical field, or equivalent professional experience.

7-10 years of hands-on experience in Site Reliability Engineering, DevOps, or a similar role, demonstrating a strong focus on ensuring system reliability and driving automation initiatives.

Proficiency in programming languages such as Python and Groovy, with a proven track record of developing automation scripts and tools to streamline operations.

Extensive expertise in Linux systems administration, along with practical experience in managing cloud computing platforms (e.g., AWS, GCP, Azure), containerization technologies (e.g., Docker, Kubernetes), and infrastructure as code principles using tools like Terraform or CloudFormation.

Solid understanding of networking fundamentals, load balancing strategies, caching mechanisms, and distributed system design principles.

Experience implementing and managing monitoring and observability solutions, including tools such as Datadog, Prometheus, Grafana, and the ELK stack, to ensure the health and performance of systems.

Strong problem-solving, analytical, and troubleshooting skills, with the ability to diagnose and resolve complex technical issues efficiently.

Excellent communication and collaboration abilities, with a proven track record of effectively working in cross-functional teams and communicating technical concepts to non-technical stakeholders.

Demonstrated ability to mentor and develop junior team members, fostering a culture of continuous learning and professional growth.

Experience with chaos engineering principles and practices, leveraging tools like Chaos Monkey or Gremlin to proactively identify weaknesses in distributed systems.

Knowledge of machine learning techniques and data analytics in the context of Site Reliability Engineering, enabling data-driven decision-making and predictive analysis.

Familiarity with service mesh technologies such as Istio or Linkerd, facilitating the implementation of resilient and secure microservices architectures.

Active involvement in open-source projects or participation in relevant technical communities, demonstrating a commitment to professional development and knowledge sharing.


Company

Monotype

Job Posted

a year ago

Job Type

Full-time

Work Mode

On-site

Experience Level

8-12 Years
