Site Reliability Engineer

Smart Summary
Monitor and troubleshoot system performance, availability, and security. Analyze metrics and trace data. Collaborate with development teams for scalability and reliability. Manage app releases and resolve production issues. Conduct root cause analysis. Optimize system performance and capacity planning. Utilize CI/CD tools.

Job Requirement

What you’ll do:

  • Monitor and troubleshoot issues related to system performance, availability, and security.
  • Define and implement Service Level Indicators (SLI), Service Level Objectives (SLO), and Error Budgets to measure and improve service reliability.
  • Analyze and report on Metrics and Trace data using Grafana.
  • Participate in on-call rotation to provide 24/7 support for critical production systems.
  • Collaborate with development teams to ensure new features and services are designed with scalability and reliability in mind.
  • Help roll out new security and infrastructure features as they are released.
  • Proactively identify and resolve issues before they impact customers.
  • Manage app releases by automating the deployment process, ensuring proper version control, and managing the rollout to minimize the impact on users.
  • Coordinate between developers and operations to ensure smooth software releases and timely resolution of production issues.
  • Conduct Root Cause Analysis (RCA) of production incidents and develop plans to prevent future occurrences.
  • Review and optimize system performance, identify bottlenecks, and implement capacity planning and recovery strategies.
  • Evaluate and automate manual and repetitive tasks to reduce toil and improve system efficiency.
  • Use development and CI/CD tools such as Git, Jira, and Jenkins to streamline the software development process.
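
For candidates unfamiliar with the SLI, SLO, and error-budget terms used in the list above, the arithmetic can be sketched as follows. This is a minimal illustration under assumed inputs, not a description of the company's actual tooling: the function name `error_budget`, the request-based availability SLI, and the sample numbers are all hypothetical.

```python
def error_budget(slo_target: float, total_requests: int, failed_requests: int) -> dict:
    """Compute a request-based availability SLI and the matching error budget.

    slo_target is the fraction of requests that must succeed (e.g. 0.999).
    All names and the request-based SLI definition are illustrative assumptions.
    """
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")
    if not 0 <= failed_requests <= total_requests:
        raise ValueError("failed_requests must be between 0 and total_requests")

    # SLI: the measured fraction of successful requests in the window.
    sli = (total_requests - failed_requests) / total_requests

    # Error budget: the number of failures the SLO tolerates in the window.
    allowed_failures = (1.0 - slo_target) * total_requests

    return {
        "sli": sli,
        "slo_met": sli >= slo_target,
        "allowed_failures": allowed_failures,
        "budget_remaining": allowed_failures - failed_requests,
        "budget_consumed_fraction": failed_requests / allowed_failures,
    }


if __name__ == "__main__":
    # Hypothetical window: 1,000,000 requests, 400 failures, 99.9% SLO.
    report = error_budget(slo_target=0.999, total_requests=1_000_000, failed_requests=400)
    for key, value in report.items():
        print(f"{key}: {value}")
```

In this sketch, a 99.9% SLO over one million requests tolerates roughly 1,000 failures, so 400 failures consume about 40% of the budget. A team might slow feature rollouts as the consumed fraction approaches 1.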

What We're Looking For:

  • 4-6 years of relevant work experience.
  • Bachelor's or Master's degree in Computer Science or a related field.
  • Strong understanding of Linux/Unix systems administration and networking.
  • Experience with cloud platforms such as GCP or AWS.
  • Strong programming skills in one or more languages such as Python, Java, or Go.
  • Experience with monitoring and alerting tools such as Grafana, Prometheus, or New Relic.
  • Experience with configuration management tools.
  • Strong problem-solving skills.
  • Strong communication and teamwork skills.
  • Experience with Kubernetes, Docker, and other containerization technologies is a plus.
Company

Groww

Job Posted

a year ago

Job Type

Full-time

Work Mode

Remote

Experience Level

3-7 years

Category

Software Engineering

Locations

Gurgaon, Haryana, India

Pune, Maharashtra, India

Bengaluru, Karnataka, India

Qualification

Bachelor
