Site Reliability Engineer

Smart Summary (powered by Roshi)
As an SRE, you will work with our Application DevOps engineers to maintain and scale our cloud services. You will serve as front-line support, triaging issues to the platform, applications, or infrastructure. You will partner with multiple teams, including a second SRE team that supervises the GPU cloud infrastructure. In this role, you will monitor the application stack, onboard customers, and manage the customer lifecycle. A Bachelor's degree in Computer Science or a related field, or equivalent experience, is required, along with experience in system design, software design on Unix/Linux systems, and operating production systems. Hands-on experience with Kubernetes and multi-cloud environments is also expected. Excellent problem-solving and communication skills are essential.

As an SRE, you will work with our Application DevOps engineers to maintain and scale our ever-growing number of cloud-hosted services. You will serve as front-line support, triaging issues to the platform, the applications, or the underlying infrastructure. You will partner with multiple teams within and outside the Application Infrastructure team, including a second SRE team that supervises the GPU cloud infrastructure; this role focuses on monitoring the application stack. You will also onboard customers to our services and manage the customer lifecycle.

 

What you'll be doing: 

  • Build and integrate new software, tools, and analytics that improve the availability, scalability, latency, and efficiency of our cloud products and services
  • Handle upgrades and automated rollbacks across all clusters
  • Maintain Service Level Agreements (SLAs) based on measurable benchmarks, working hand in hand with developers of new services to define SLIs and design stable, secure services
  • Help guide the Change Advisory Board and RCCA processes
  • Work with engineering, DevOps, and product area leads across the GPU cloud services stack to guide product engineering toward fast, reliable, and durable production systems
  • Drive process changes to improve reliability and performance of our cloud services
  • Debug production issues across services and levels of the stack
  • Improve operational processes

 

What we need to see:

  • Bachelor's degree in Computer Science or a related field, or equivalent experience
  • 5+ years of experience in system design, complexity analysis, software design in Unix/Linux systems, performance, and application issues
  • 5+ years of experience authoring and debugging software written in C++ and Python
  • Hands-on experience with Kubernetes-based cloud environments
  • Multi-cloud experience
  • Experience working with partners across multiple teams 
  • Experience operating production systems

 

Ways to stand out from the crowd:

  • Background with SaaS offerings
  • Experience in application issues, algorithms, and data structures


 


Company: NVIDIA
Job Posted: a year ago
Job Type: Full-time
Work Mode: On-site
Experience Level: 3-7 Years
Category: Engineering
Location: Pune, Maharashtra, India
Qualification: Bachelor
