Principal Site Reliability Engineer

What is the job like

  • The System Reliability Engineer is responsible for the 24/7 availability of Zeta’s cloud SaaS platform.
  • Build, deploy, and manage business applications on cloud platforms using container orchestration, service mesh, API gateways, CI/CD components, and observability stacks.
  • Collaborate with product managers, designers, and developers in self-sufficient teams to implement and follow SRE best practices.
  • Provide technical guidance to the team on managing the availability and performance of mission-critical services, on building automation to prevent problem recurrence, and on building automated responses for non-exceptional service conditions.
  • Lead incident management during incidents.
  • Participate in an on-call rotation and operate effectively in a global 24x7 environment.
  • Drive MTTR to meet the incident SLA.
  • Ensure 100% alert coverage across application, infrastructure, security, flows, etc.
  • Own the availability of one or more services.

 

What are we looking for?

  • 9+ years of relevant work experience.
  • Software engineers with a bent towards operations engineering (or vice versa), with strong kernel, networking, and OS fundamentals.
  • Experience with public and private cloud solutions (any of AWS, GCP, Azure, OpenShift, DIY clouds, etc.).
  • Expertise in distributed systems, storage systems, or databases; algorithms and data structures; and Unix/Linux systems internals (e.g., filesystems, system calls) and administration.
  • Experience with Continuous Integration and Deployment (CI/CD) and release orchestration: Jenkins, ArgoCD, AWS Pipeline, GitLab, GitHub Actions, etc.
  • Knowledge of Kubernetes (K8s), Envoy, API gateways (Kong, Ambassador, or Traefik), and service meshes (Istio, Consul mesh, or Linkerd/Conduit).
  • Experience with infrastructure as code and configuration management using tools like Terraform, Helm, and Ansible.
  • Observability practices and toolchains (monitoring, metrics, logging, alerts, and tracing) with tools like ELK/EFK, Prometheus, Grafana, Alertmanager, Sysdig, Datadog, New Relic, Zabbix, etc.
  • Experience in programming/scripting languages (e.g., Bash, Python, Go, Java).
  • Cloud security / DevSecOps Practices.
  • Experience designing, analyzing, and troubleshooting large-scale distributed systems.
  • Excellent communication skills and a sense of ownership, with a systematic problem-solving approach.
  • Most importantly, a learner for life and an effective engineer!
  • Bonus: chaos and resilience engineering concepts and experience.

Company: Zeta
Job Posted: a year ago
Job Type: Full-time
Work Mode: On-site
Experience Level: 8-12 Years
Category: Engineering
Locations: Bengaluru, Karnataka, India
Qualification: Bachelor

