Lead Site Reliability Engineer
OpenText
Waterloo, Ontario, Canada
What You Are Great At
Applying a broad range of knowledge, skills, and experience within an area of expertise to assignments received in the form of objectives.
Determining how to use resources to meet schedules and goals.
Providing guidance to peers within the latitude of established company policy.
Using broad knowledge of the organization to influence strategy, policy, and process development as a technical authority and leader with a vision for positive business outcomes.
Leading multi-functional strategic and tactical efforts.
Providing leadership by assisting in triage for escalated production incidents.
Being a change agent able to develop, implement, and maintain policies and processes.
Collaborating with peer technology organizations, business, clients, and management to review application, systems, and infrastructure functionality and develop plans for improvement.
Leading development and implementation of strategies focused on greater efficiencies to deliver systems.
Identifying and implementing strategies to reduce platform Mean-Time-To-Resolution (MTTR) through Site Reliability Engineering (SRE) practices and automation principles.
Managing continuous improvement of service engineering, delivery, and operational practices.
Reducing expenses by eliminating unnecessary downtime and disruptions.
Understanding current business and technology trends to find opportunities for improving services and reducing risk.
Adopting and promoting an SLO mindset with disaster recovery best practices in mind.
Effectively navigating organizational structure and culture to achieve positive outcomes.
What It Takes
10+ years of related experience, or equivalent.
Intermediate and advanced level certifications that demonstrate knowledge of cloud and security concepts.
Extensive knowledge of CaaS technologies including Kubernetes, Google Anthos/Google Kubernetes Engine (GKE), Ingress, and PaaS technologies.
Knowledge of IaaS technologies including hypervisor (VMware ESX), routing (VMware NSX-T), and load balancing (F5, etc.).
Knowledge of monitoring and logging technologies including VMware Tanzu Observability/Wavefront, Dynatrace, and Splunk.
In-depth knowledge of network and infrastructure security best practices, including governance.
Experience in CI/CD pipeline implementation.
Automation of build, packaging, and release management activities (build automation, CI/CD, Git, Jenkins).
Experience with tools like Jira, Git/Bitbucket, Confluence, etc.
Ability to build self-healing and automated systems.
Ability to design and build systems to collect, visualize, and store service health indicators.
Demonstrated ability to achieve successful outcomes in difficult situations and to work with various customers and management levels.
Demonstrated experience working in Agile teams using Scrum and Kanban.
Ability to communicate effectively with technical and non-technical audiences.
A self-starter with the ability to work independently and in a collaborative team environment.