Site Reliability Engineer (SRE)

Smart Summary (powered by Roshi)

IBM is seeking a Site Reliability Engineer (SRE) to ensure the seamless operation of our quantum computing systems. You will design, build, and maintain critical systems, contributing to high availability and efficient problem resolution. Your responsibilities include ensuring the reliability, scalability, and high availability of quantum systems; designing and maintaining those systems; implementing and maintaining CI/CD pipelines; monitoring system performance; running tests; implementing security measures; responding to system alerts; and documenting systems. Required skills include Python, Go (Golang), JavaScript, TypeScript, C++, Rust, Linux, DevOps tools, quantum computing systems, distributed systems, Red Hat, OpenShift, RHEL, Docker, Podman, Kubernetes, service mesh technologies, GitOps, infrastructure as code tools, monitoring/logging/tracing tools, Jinja2, and cloud platforms. Preferred skills include advanced knowledge of quantum computing, Helm, DevOps/Agile methodologies, automating manual processes, and database technologies.

Your Role and Responsibilities
IBM is seeking a Site Reliability Engineer (SRE) to be a key player in ensuring the seamless operation of our quantum computing systems. Working closely with our development teams, you will design, build, and maintain critical systems, creating software and systems to manage, monitor, and scale our quantum computing platforms. Your expertise will contribute to the high availability, optimal performance, and efficient problem resolution of our technology.

Responsibilities:

Ensure the reliability, scalability, and high availability of our quantum computing systems.
Collaborate with development teams to design, deploy, and maintain quantum systems.
Implement and maintain CI/CD pipelines using modern tools like Concourse, Tekton, and GitLab CI/CD.
Monitor system performance using Grafana, Sysdig, LogDNA, Datadog, and other tools, and troubleshoot and resolve issues.
Develop and run monitoring, load, and stress tests to ensure system resilience.
Implement security measures to safeguard system integrity, leveraging tools like Vault.
Respond to system alerts using PagerDuty and similar tools to ensure swift issue resolution.
Create and maintain comprehensive system documentation, using GitHub for version control and collaboration.

Required Technical and Professional Expertise

Bachelor’s degree in Computer Science, Engineering, or related field, or equivalent work experience.
At least 5 years of proven experience as a Site Reliability Engineer or similar role in a software development setting.
Proficiency in Python, Go (Golang), JavaScript, TypeScript, C++, and Rust.
In-depth knowledge of at least one of these languages is required.
Strong Linux skills, including command-line tools, shell scripting, and system diagnostics.
Familiarity with fundamental DevOps tools like SSH, Git, and Makefiles.
Experience with quantum computing systems and Qiskit.
Knowledge of distributed systems and backend systems architecture.
Experience with Red Hat, OpenShift, RHEL, and container technologies like Docker and Podman.
Proficiency with Kubernetes and familiarity with service mesh technologies like Istio.
Experience with GitOps and infrastructure as code tools such as ArgoCD, Ansible, and Terraform.
Familiarity with “Pipelines as Code” principles and practices.
Experience with monitoring, logging, and tracing tools such as Grafana, Sysdig, LogDNA, Datadog, OpenTelemetry, and Prometheus.
Experience with templating languages like Jinja2.
Familiarity with cloud platforms like IBM Cloud, AWS, GCP, or Azure.

Preferred Technical and Professional Expertise

Master’s degree in Computer Science, Engineering, or a related field.
Experience in a quantum computing environment.
Advanced knowledge of Quantum Information Science principles and technologies.
Experience with Helm for managing Kubernetes applications.
Familiarity with the principles and practices of DevOps and Agile methodologies.
Experience automating manual processes and customizing and optimizing CI/CD pipelines.
Knowledge of database technologies such as PostgreSQL, MySQL, MongoDB, and InfluxDB.
Certifications related to Kubernetes, Red Hat, or other relevant technologies.

Site Reliability Engineer (SRE) role in Gurgaon, India or Bangalore Urban, India

Company

IBM

Job Posted

a year ago

Job Type

Full-time

Work Mode

On-site

Experience Level

3-7 Years

Related Jobs

Site Reliability Engineer

Groww

Gurgaon, Haryana, India

Posted: a year ago

Monitor and troubleshoot system performance, availability, and security. Analyze metrics and trace data. Collaborate with development teams for scalability and reliability. Manage app releases and resolve production issues. Conduct root cause analysis. Optimize system performance and capacity planning. Utilize CI/CD tools.

Senior Site Reliability Engineer

Microsoft

Bangalore Urban, Karnataka, India

Posted: a year ago

Join Microsoft as a Senior Site Reliability Engineer and work with a highly talented engineering team to deliver software improvements for Azure Cosmos DB. As a Senior SRE, you will ensure service stability, performance, and reliability through software development and system design. This is a full-time hybrid opportunity located in Bangalore Urban, Karnataka, India.

Site Reliability Engineer (SRE) – Automotive IT

Qualcomm

Hyderabad, Telangana, India

Posted: 7 months ago

As a Site Reliability Engineer (SRE) in Automotive IT at Qualcomm, you will collaborate with a diverse team to ensure stable, sustainable, and secure infrastructure and services. Your role involves modernizing applications, deploying new technologies, and optimizing systems in a high-trust culture. Join us in the Invention Age to transform 5G potential into groundbreaking products.

Senior Linux Site Reliability Engineer (Pacemaker)

SAP

Bangalore Urban, Karnataka, India

Posted: a year ago

Seeking a Senior Linux Site Reliability Engineer with expertise in Pacemaker. Troubleshoot complex Pacemaker software and Linux OS/infrastructure issues. Develop automation for stability and reliability. Standardize and simplify server operations using DevOps. Requires 7-12 years of related experience with an advanced technical background in Linux-based server operating systems. Strong knowledge of Linux HA clusters, networking, and IT security. Experience with script programming and server automation tools. Fluency in English and ability to work in global teams.

Site Reliability Engineer, APAC

Canonical

Gurgaon, Haryana, India

Posted: a year ago

As a Site Reliability Engineer at our company, you will be responsible for bringing Python software engineering skills and rigour to the operations domain. You will work with OpenStack, Kubernetes, and software-defined storage to enable DevSecOps for applications running on our infrastructure. You must be a software engineer fluent in Python and have a genuine interest in the full open source infrastructure stack. You should also have experience working in a high-pressure operations environment with mission-critical services. Join us and gain experience in a broad range of cloud technologies.

Site Engineer

Ericsson

Bangalore Urban, Karnataka, India

Posted: a year ago

Plan, conduct and validate on-site surveys using survey tools and devices. Design site drawings and maps. Compile and distribute audit reports to stakeholders. Create detailed design documentation and 3D site digital designs using BIM technology. Good knowledge of Ericsson Radio System, site installation, and intelligent site survey required. Proven experience in site engineering. BE/B.Tech or higher education.