Senior Site Reliability Engineer


Smart Summary

Understand emerging issues and improve application performance by implementing monitoring solutions. Analyze current systems, reduce existing problems, and suggest solutions. Support monitoring, processes, tools, architecture, and root cause analysis. Develop and maintain monitoring systems, automate tasks, troubleshoot incidents, and improve system management efficiency. Identify areas for improvement and design scalable, reliable solutions. Monitor and act on alerts to prevent outages.

What is the job like

Understand emerging issues and overall application performance by implementing monitoring solutions.
Conduct consistent, thorough analysis of current systems to reduce the number of existing problems, and suggest new solutions to upgrade and refine those systems.
Provide support across a broad range of areas including monitoring, processes & tools, architecture, and Root Cause Analysis.
Develop and maintain monitoring and alerting systems to proactively detect and resolve issues.
Automate routine tasks to improve system efficiency and reduce downtime.
Troubleshoot and resolve incidents and outages.
Develop and implement automation scripts and tools to improve the efficiency and effectiveness of system management tasks.
Identify areas for improvement and design solutions that are scalable, reliable, and easy to maintain.
Monitor and act on alerts to avoid production outages and incidents.
Maintain runbooks for the alerts.

Qualifications:

4-6 years of sysadmin experience handling large-scale distributed system software deployments in cloud or on-premises environments.
Strong cloud management foundation.
Proficiency in Unix shells, Python, and Go programming.
Database experience with MySQL or PostgreSQL.
Outstanding teammate who can collaborate and influence in a multifaceted environment.
Excellent interpersonal and written communication skills.
Excellent debugging and troubleshooting skills.
Ability to define standard operating procedures for supported platform features.
Experience working with observability tools and practices (Prometheus, Grafana).
Experience in troubleshooting and resolving incidents.
Cloud experience in AWS (preferred) including hands-on experience with AWS-CLI.
Hands-on experience with container orchestration and containerisation (Kubernetes, containers).
Experience with CI/CD (e.g. Jenkins, ArgoCD).
Solid understanding of networking (firewall, connectivity, routing, iptables, subnet config, etc.).
Experience with Linux OS and Shell/Python Scripting.
Experience in programming with Python and Go.
Experience with API gateways such as Kong and Nginx-based systems.
Experience with security best practices and technologies.
BS degree in Computer Science or a related technical field involving coding, or equivalent practical experience.

Senior Site Reliability Engineer role in Hyderabad, India

Company: Zeta
Job Posted: 2 years ago
Job Type: Full-time
Work Mode: On-site
Experience Level: 3-7 Years

Related Jobs

Site Reliability Engineer II

Zeta

Hyderabad, Telangana, India

Posted: 2 years ago

We are looking for a skilled engineer to deploy and maintain container orchestration, service mesh, API gateways, CI/CD components, and observability stacks. Collaborate with developers to create abstractions for an optimal developer experience and ensure platform availability. Prior experience with kernel, networking, OS fundamentals, public and private cloud solutions, distributed systems, and micro-service architectures is required. Knowledge of CI/CD practices, deployment patterns, and tools is essential. Proficiency in Kubernetes, Envoy, API gateways, service mesh, infrastructure as code, and configuration management is necessary. Strong programming skills in Python, Go, or Ruby are preferred. Familiarity with cloud security and DevSecOps practices is a plus. Must be a lifelong learner and an effective engineer. Bonus if you have expertise in chaos and resilience engineering.

Lead Site Reliability Engineer

Zeta

Hyderabad, Telangana, India

Posted: 2 years ago

Establish an SRE site and build an effective, inclusive SRE team. Provide technical leadership and guidance to ensure availability and performance of mission critical services. Manage project priorities and deadlines. Lead Incident Management and drive MTTR as per the Incident SLA.

Senior Site Reliability Engineer

Thomson Reuters

Hyderabad, Telangana, India

Posted: a year ago

As a Senior Site Reliability Engineer at Thomson Reuters, you will manage cloud environments, troubleshoot application issues, conduct end-to-end application testing, maintain system documentation, and collaborate with cross-functional teams. This hybrid full-time role in Hyderabad requires 6+ years' experience in cloud with Windows systems, AWS DevOps expertise, ITIL knowledge, .Net application experience, scripting skills, database knowledge, and strong communication and analytical abilities.

Senior Site Reliability Engineer

Zeta

Bengaluru, Karnataka, India

Posted: 2 years ago

Build, deploy, and manage business applications on cloud platforms using container orchestration, service mesh, and API gateways. Collaborate with product managers, designers, and developers in self-sufficient teams and follow best DevOps practices. Own service availability.

Lead Site Reliability Engineer

JPMorgan Chase & Co.

Hyderabad, Telangana, India

Posted: 2 years ago

Lead Site Reliability Engineer at JPMorgan Chase within the Consumer and Community Banking of Infrastructure and Production Management team. Hold a leadership role, demonstrate strong knowledge across multiple technical domains, and advise others on technical and business issues. Lead resiliency design reviews, act as a technical lead, and provide mentoring. Champion site reliability culture and practices, improve reliability and stability, and identify and solve technology-related bottlenecks. Required qualifications include formal training or certification, deep proficiency in reliability and scalability, fluency in a programming language, proficiency in observability and CI/CD tools, and experience with containers and troubleshooting networking technologies.

Site Reliability Engineer III

JPMorgan Chase & Co.

Hyderabad, Telangana, India

Posted: 2 years ago

JOB DESCRIPTION

There’s nothing more exciting than being at the center of a rapidly growing field in technology and applying your skillsets to drive innovation and modernize the world's most complex and mission-critical systems.

As a Site Reliability Engineer III at JPMorgan Chase within the Consumer and Community Banking of Infrastructure and Production Management, you will solve complex and broad business problems with simple and straightforward solutions. Through code and cloud infrastructure, you will configure, maintain, monitor, and optimize applications and their associated infrastructure to independently decompose and iteratively improve on existing solutions. You are a significant contributor to your team by sharing your knowledge of end-to-end operations, availability, reliability, and scalability of your application or platform.

Job responsibilities

Guides and assists others in the areas of building appropriate level designs and gaining consensus from peers where appropriate
Collaborates with other software engineers and teams to design and implement deployment approaches using automated continuous integration and continuous delivery pipelines
Collaborates with other software engineers and teams to design, develop, test, and implement availability, reliability, and scalability solutions in their applications
Implements infrastructure, configuration, and network as code for the applications and platforms in your remit
Understands service level indicators and utilizes service level objectives to proactively resolve issues before they impact customers
Develop, test and debug automated tasks (Apps, Systems, Infrastructure)
Troubleshoot priority incidents, facilitate blameless post-mortems

Required qualifications, capabilities, and skills

Minimum 7 years of overall experience in IT industry
Formal training or certification on site reliability engineering concepts and 3+ years applied experience
Proficient in at least one programming language such as Python, Java/Spring Boot
Proficient in site reliability culture and principles and familiarity with how to implement site reliability within an application or platform
Proficient knowledge of software applications and technical processes within a given technical discipline (e.g., Cloud, artificial intelligence, Android, etc.)
Experience in observability such as white and black box monitoring, service level objective alerting, and telemetry collection using tools such as Grafana, Dynatrace, Prometheus, Datadog, Splunk, and others
Experience with continuous integration and continuous delivery tools like Jenkins, GitLab, or Terraform
Familiarity with container and container orchestration such as ECS, Kubernetes, and Docker

Preferred qualifications, capabilities, and skills

Proficiency in one or more technology domains, may be a cross-domain expert able to solve complex and mission critical problems within a business or across the firm
Adept in the development of automated tools, systems, and services in multiple technology domains
Working knowledge of infrastructure components (e.g. routers, load balancers, cloud products, container systems, compute, storage and networks)
Excellent debugging and troubleshooting skills

ABOUT US

JPMorgan Chase & Co., one of the oldest financial institutions, offers innovative financial solutions to millions of consumers, small businesses and many of the world’s most prominent corporate, institutional and government clients under the J.P. Morgan and Chase brands. Our history spans over 200 years and today we are a leader in investment banking, consumer and small business banking, commercial banking, financial transaction processing and asset management. We recognize that our people are our strength and the diverse talents they bring to our global workforce are directly linked to our success. We are an equal opportunity employer and place a high value on diversity and inclusion at our company. We do not discriminate on the basis of any protected attribute, including race, religion, color, national origin, gender, sexual orientation, gender identity, gender expression, age, marital or veteran status, pregnancy or disability, or any other basis protected under applicable law. In accordance with applicable law, we make reasonable accommodations for applicants’ and employees’ religious practices and beliefs, as well as any mental health or physical disability needs.

ABOUT THE TEAM

Our Consumer & Community Banking division serves our Chase customers through a range of financial services, including personal banking, credit cards, mortgages, auto financing, investment advice, small business loans and payment processing. We’re proud to lead the U.S. in credit card sales and deposit growth and have the most-used digital solutions, all while ranking first in customer satisfaction.