Site Reliability Engineer (SRE)

Smart Summary (powered by Roshi)
IBM is seeking a Site Reliability Engineer (SRE) to ensure the seamless operation of our quantum computing systems. You will design, build, and maintain critical systems, contributing to high availability and efficient problem resolution. Responsibilities include ensuring the reliability, scalability, and high availability of quantum systems; designing and maintaining those systems; implementing and maintaining CI/CD pipelines; monitoring system performance; executing tests; implementing security measures; responding to system alerts; and documenting systems. Required skills include Python, Go (Golang), JavaScript, TypeScript, C++, Rust, Linux, DevOps tools, quantum computing systems, distributed systems, Red Hat, OpenShift, RHEL, Docker, Podman, Kubernetes, service mesh technologies, GitOps, infrastructure as code tools, monitoring/logging/tracing tools, Jinja2, and cloud platforms. Preferred skills include advanced knowledge of quantum computing, Helm, DevOps/Agile methodologies, automating manual processes, and database technologies.

Your Role and Responsibilities
IBM is seeking a Site Reliability Engineer (SRE) to be a key player in ensuring the seamless operation of our quantum computing systems. Working closely with our development teams, you will design, build, and maintain critical systems, creating software and systems to manage, monitor, and scale our quantum computing platforms. Your expertise will contribute to the high availability, optimal performance, and efficient problem resolution of our technology.

 

Responsibilities:

  • Ensure the reliability, scalability, and high availability of our quantum computing systems.
  • Collaborate with development teams to design, deploy, and maintain quantum systems.
  • Implement and maintain CI/CD pipelines using modern tools like Concourse, Tekton, and GitLab CI/CD.
  • Monitor system performance using Grafana, Sysdig, LogDNA, Datadog, and other tools; troubleshoot and resolve issues.
  • Develop and execute monitoring, load, and stress testing, ensuring system resilience.
  • Implement security measures to safeguard system integrity, leveraging tools like Vault.
  • Respond to system alerts using PagerDuty and similar tools to ensure swift issue resolution.
  • Create and maintain comprehensive system documentation, utilizing GitHub for version control and collaboration.


Required Technical and Professional Expertise

  • Bachelor’s degree in Computer Science, Engineering, or related field, or equivalent work experience.
  • Minimum of 5 years of proven experience as a Site Reliability Engineer or in a similar role in a software development setting.
  • Proficiency in Python, Go (Golang), JavaScript, TypeScript, C++, and Rust.
  • In-depth knowledge of at least one of these languages is required.
  • Strong Linux skills, including command-line tools, shell scripting, and system diagnostics.
  • Familiarity with fundamental DevOps tools like SSH, Git, and Makefiles.
  • Experience with quantum computing systems and Qiskit.
  • Knowledge of distributed systems and backend systems architecture.
  • Experience with Red Hat, OpenShift, RHEL, and container technologies like Docker and Podman.
  • Proficiency with Kubernetes and familiarity with service mesh technologies like Istio.
  • Experience with GitOps and infrastructure as code tools such as ArgoCD, Ansible, and Terraform.
  • Familiarity with “Pipelines as Code” principles and practices.
  • Experience with monitoring, logging, and tracing tools such as Grafana, Sysdig, LogDNA, Datadog, OpenTelemetry, and Prometheus.
  • Experience with templating languages like Jinja2.
  • Familiarity with cloud platforms like IBM Cloud, AWS, GCP, or Azure.


Preferred Technical and Professional Expertise

  • Master’s degree in Computer Science, Engineering, or a related field.
  • Experience in a quantum computing environment.
  • Advanced knowledge of Quantum Information Science principles and technologies.
  • Experience with Helm for managing Kubernetes applications.
  • Familiarity with the principles and practices of DevOps and Agile methodologies.
  • Experience automating manual processes and customizing and optimizing CI/CD pipelines.
  • Knowledge of database technologies such as PostgreSQL, MySQL, MongoDB, and InfluxDB.
  • Certifications related to Kubernetes, Red Hat, or other relevant technologies.

Company: IBM
Job Posted: a year ago
Job Type: Full-time
Work Mode: On-site
Experience Level: 3-7 Years
Category: Software Engineering
Locations: Gurgaon, Haryana, India; Bangalore Urban, Karnataka, India
Qualification: Master or Bachelor
