Senior Site Reliability Engineer

What You’ll Do

  • Architect, design, and provide guidance on cloud-based services and applications, with a focus on AWS, Azure, and GCP, including Kubernetes integration.
  • Lead in deploying, monitoring, and scaling Kubernetes clusters across multiple cloud platforms, ensuring the stability and reliability of the production environment.
  • Collaborate with cross-functional teams to strategize and improve multi-tenant and multi-cloud architecture, demonstrating leadership in Infrastructure, CI/CD, Automation, and Observability.
  • Mentor junior team members and provide expert troubleshooting for infrastructure-related incidents in a multi-cloud environment, including managing on-call shifts.
  • Drive the optimization of system performance, security, and overall quality of service across various cloud platforms.
  • Implement and standardize best practices in multi-cloud infrastructure management using Infrastructure as Code (IaC) tools, such as Terraform.

What You’ll Need to Succeed
  • Extensive experience in systems/DevOps/CloudOps or Site Reliability Engineering related roles, particularly with AWS, Azure, and GCP, including Kubernetes.
  • Expertise in deploying, monitoring, and scaling Kubernetes clusters in high-traffic production environments across multiple clouds.
  • In-depth understanding of containers, containerization concepts, and multi-cloud strategies.
  • Proven experience in configuring, deploying, and maintaining cloud infrastructure using AWS, Azure, GCP services, and Infrastructure as Code (IaC) tools.
  • Mastery of CI/CD systems, such as GitLab or similar tools, in a multi-cloud environment.
  • Advanced understanding of Linux fundamentals.
  • Proficiency in one or more scripting languages, such as Bash or Python.
  • Demonstrated ability to analyze distributed systems across different clouds, debug, and solve complex problems.
  • Commitment to continuous learning and the application of best engineering practices for building high-performance, reliable, and scalable applications in a multi-cloud environment.
  • Ability to drive new technologies and optimize delivery timelines across various cloud platforms.
  • Excellent communication, collaboration, and leadership skills.
  • Completed M.S. or B.S. in Computer Science or equivalent experience.
It's Great If You Also Have:
  • Relevant advanced certifications in AWS, Azure, or GCP, such as AWS Certified Solutions Architect – Professional, Azure Solutions Architect Expert, or Google Professional Cloud Architect.
  • Certified Kubernetes Administrator (CKA), Certified Kubernetes Application Developer (CKAD), or Certified Kubernetes Security Specialist (CKS).
  • 5+ years of extensive hands-on experience in developing cloud-native applications, and leading their development, across different cloud platforms.


 

Company: Uniphore
Job Posted: a year ago
Job Type: Full-time
Work Mode: Hybrid
Experience Level: 3-7 Years
Category: Engineering
Locations: Chennai, Tamil Nadu, India; Bengaluru, Karnataka, India
Qualification: Bachelor

Related Jobs

Staff Engineer - Site Reliability

Freshworks

Chennai, Tamil Nadu, India

Posted: a year ago

Job Description

SRE at Freshworks

Site Reliability Engineering (SRE) combines software and systems engineering to build and run large-scale, massively distributed, fault-tolerant systems. SRE ensures that Freshworks' services, both our internally critical and our externally visible systems, have reliability and uptime appropriate to customers' needs and a fast rate of improvement. Additionally, SREs keep an ever-watchful eye on our systems' capacity and performance. Much of our SRE work focuses on optimizing existing systems, building infrastructure, and eliminating work through automation. On the SRE team, you’ll have the opportunity to manage the complex challenges of scale while using your expertise in coding, algorithms, complexity analysis, and large-scale system design.

Responsibilities:
  • Design, write, and deliver software to improve the availability, latency, and efficiency of Freshworks' Products & Platforms.
  • Manage availability, latency, and performance of mission-critical services and build automation to prevent problem recurrence.
  • Independently determine and develop architectural approaches and infrastructure solutions.
  • Define strategy, vision, and roadmap to develop CI/CD, application hosting, security, and compliance standards and guidelines across Freshworks.
  • Drive blameless postmortems for large-scale incidents.
  • Define and drive automation and orchestration strategies.
  • Strategize cost optimization across the Freshworks cloud environment.

Qualifications

Requirements:
  • 12+ years of software engineering and coding experience in C# / Python / JavaScript / Golang (one or more).
  • 12+ years of experience handling Linux and Windows systems at a very large scale.
  • 6+ years of hands-on experience with containers and container orchestration tools.
  • 10+ years of proven experience with designing, building, supporting, and observing large-scale distributed systems/services/infrastructure.
  • Strong experience in microservices architecture, service mesh implementation, and instrumenting XaaC (Infrastructure, Software, Network, Policy, Security) across global-scale systems.
  • Hands-on experience in defining and driving disaster recovery across Freshworks Products & Platforms.
  • Proficiency in implementing FinOps and cloud cost optimization strategies.
  • Experience and knowledge of incorporating testing, compliance, and security requirements within code release pipelines.
  • Proficiency in algorithms, data structures, complexity analysis, and software design.
  • Ability to do technical deep-dives into code, networking, operating systems, and storage, with the ability to participate in executive strategy discussions.
  • Data mining and data analytics experience using big data and/or relational database technologies.
  • Excellent experience in designing and architecting solutions using open-source software (OSS).
  • Intellectual curiosity, problem solving, and storytelling presentation skills.