Title: Site Reliability Engineer
Who we are
Today’s business environment is more than challenging: it is a period of disruption driven by the pandemic, global business change, and internal process complexity. To focus on simplicity and the best customer experience, we need great talent and the right skill sets. This is now a mantra for the Cisco leadership team and for us.
Cisco is transforming its platforms to run the next generation of cloud-native and multi-cloud services. This role offers a superb opportunity to transform how infrastructure platforms are developed and managed: with full software automation, while remaining highly available through self-healing, full lifecycle monitoring, and management capabilities.
Who you will work with
Cloud Infrastructure Platform Services (CIPS), part of Hybrid Cloud Infrastructure and Operations, is responsible for the architecture, design, build, and operation of private cloud (OpenStack, VMware, and various PaaS platforms) and public cloud services (AWS and GCP), helping clients choose the right IaaS and PaaS offerings for their workloads. CIPS also provides technical consultation on architecture, deployment options, and managed services, supporting clients from onboarding to decommissioning through a GitOps operating model. The organization is currently focused on strengthening governance, security, and observability to ensure complete visibility, security, and manageability of client workloads so that they can be supported reliably.
You will be working alongside other Site Reliability Engineers who are passionate about working with cloud-based applications and about advancing the way we use multicloud platforms in our business. You will help build new microservices and infrastructure to improve the way we conduct our workflow. Our team works in a fast-paced, agile environment and is ready to learn new things in an instant. We make use of the best Cisco has to offer, integrating these products with a wide spectrum of third-party services to provide a best-in-class cloud service.
What you’ll do
You will be a member of a site reliability engineering team that uses tools and integrations for a portfolio of cloud infrastructure services to deploy and manage Cisco’s critical business services through GitOps. We are looking for an enthusiastic individual with extensive experience in DevOps and GitOps to join a dynamic, agile team of talented engineers who enable customers to move their workloads to a cloud-native hybrid cloud model using both private and public clouds.
Responsibilities:
· Write Terraform automation for customer infrastructure and application deployments in AWS and GCP
· Integrate the observability stack and manage the lifecycle and operations of hybrid cloud infrastructure
· Ensure the quality, performance, robustness, and scalability of the services that are implemented; fix bugs and triage issues
· Automate the development, testing, and deployment processes through CI/CD pipelines (GitHub, GitHub Actions, Jenkins, Helm, ArgoCD)
· Champion and drive the adoption of Infrastructure as Code (IaC) practices and mindset
· Work across the full software development lifecycle in Python, including design, development, testing, packaging, deployment, upgrade, and support
· Collaborate with other core services team members to define roadmaps, write clear user stories with well-defined acceptance criteria, design, and build solutions
· Apply global knowledge of IT infrastructure to develop standard solutions that can be leveraged across multiple areas; contribute to the development of new technical principles and concepts
· Evaluate new and emerging technologies and determine their applicability to the group
· Proactively engage or create cross-functional teams to solve problems or add business value
· Generate ideas and technical strategies and present them to peers for feedback
· Influence others to support and implement ideas and technical strategies through collaboration with managers and peers in the organization
· Create standards and policies and influence technology decisions beyond your own functional area or project; practice DevOps, supporting applications from development through the operations lifecycle
· Determine and set SLOs, and create adequate monitoring and logging for features so that SLOs can be measured successfully