Site Reliability Engineering Professional

Summary
We are looking for a dedicated Site Reliability Engineer (SRE) to ensure the reliability, availability, and performance of our critical systems and applications. You will be responsible for resolving application-related issues, driving innovation and automation, and collaborating with cross-functional teams. With your expertise in monitoring tools, security best practices, system architecture, and scalability strategies, you will contribute to the overall success of our operations. Join our team and make a significant impact on our system reliability and performance.

JOB DESCRIPTION

Responsibilities

  • Ensure the reliability, availability, and performance of critical systems and applications.
  • Define and execute the SRE strategy and roadmap in alignment with organizational goals.
  • Resolve application-related issues raised by users through ticketing tools.
  • Follow the Change Request (CR) process to implement production changes on a regular basis.
  • Foster a culture of collaboration, innovation, and continuous improvement within the team.
  • Drive initiatives to reduce incidents and improve system resilience.
  • Identify opportunities for automation to streamline operational tasks and reduce manual intervention.
  • Collaborate with development teams to integrate reliability practices into the software development lifecycle.
  • Conduct post-incident reviews to identify root causes and prevent recurrence.
  • Define scaling strategies and capacity thresholds to maintain system performance.
  • Collaborate with security teams to ensure system security and compliance with industry regulations.
  • Implement security best practices and incident response plans to address security incidents.
  • Maintain comprehensive documentation of system architecture, configurations, and processes.
  • Track and report on key performance metrics to measure the effectiveness of SRE efforts.
  • Make data-driven decisions to improve system reliability and performance.
  • Collaborate with product teams to align SRE efforts with business objectives.

Technical Skills

  • Understanding of continuous integration and continuous deployment (CI/CD) pipelines and associated tools such as Jenkins.
  • Good knowledge of Unix/Linux and database concepts.
  • Familiarity with DevOps principles and practices, emphasizing collaboration between development and operations teams.
  • Proficiency in monitoring tools such as Prometheus, Grafana, Nagios, or equivalent, and the ability to establish effective alerting systems.
  • Proficient in key software and project management areas including production support, customer service, project management, delivery management, and transition & migration.
  • Solid knowledge of essential tools and platforms including Linux/Unix, Jira, Confluence, Docker, GitLab, Git, Kubernetes, SonarQube, Dynatrace, Nexus IQ, and Nexus Repo.
  • Expertise in cost optimization across all levels, both on-premises and in the cloud, with a focus on efficiency and savings.

Soft Skills

  • Strong communication skills: ability to articulate thoughts clearly, listen to varying viewpoints, and influence stakeholders.
  • Business acumen: knowledge of business strategy and the drivers of organisational performance, including people drivers of performance and financial literacy (e.g. business KPIs, business cases).
  • Bold decision maker: solves problems to achieve business unit plans, providing expertise and insight to support bold decisions.
  • Inspiring communicator: creates compelling messages that inspire and influence others to engage and support.
  • Collaborative partner: builds internal and external relationships, collaborating both operationally and technically to deliver business results.
  • Commercial acumen: experience of thinking and acting commercially, with strong business acumen and commercial awareness to assess commercial, technical, and financial risk and develop solutions to address it.
  • Change management: able to lead and implement complex change successfully, particularly relating to software platforms and customer solutions, applying effective stakeholder management skills.
  • Problem solving: track record of using strong analytical skills and intuition to analyse data and interpret business insights and trends; uses data to support decision-making and develop the best solutions.

Site Reliability Engineering Professional role in Bengaluru, India

Company

BT Group

Job Posted

a year ago

Job Type

Full-time

Work Mode

On-site

Experience Level

3-7 Years

Category

Software Engineering

Locations

Bengaluru, Karnataka, India

Qualification

Bachelor

Related Jobs

Site Reliability Engineering Professional

BT Group

Bengaluru, Karnataka, India

Posted: a year ago

The Mobile Systems Development unit designs, builds, and maintains the UK voice and mobile communication and collaboration services. This role is responsible for ensuring system uptime, building automation solutions, and working alongside developers to add value. It also involves monitoring and resolving issues, supporting platform upgrades, and ensuring compliance. The role requires experience in deploying production systems, working with load balancers, SSL/TLS configuration, data streaming, infrastructure automation, containerization, and source control. Knowledge of incident and change management, communication skills, and familiarity with OpenStack and machine learning are desirable.