You will be part of a diverse team at Oracle Cloud Infrastructure (OCI) and have the autonomy to do your best work. This role involves addressing exciting challenges in artificial intelligence and cutting-edge cloud infrastructure. You will be responsible for architecting and shipping high-performance AI/ML-enabled products and services. We are seeking candidates with experience in software engineering, cloud services development, and senior management. Strong technical understanding and knowledge of large-scale compute, network, and storage systems are required.
Job Description
At Oracle Cloud Infrastructure (OCI), we build the future of the cloud for Enterprises as a diverse team of fellow creators and inventors. We act with the speed and attitude of a start-up, with the scale and customer-focus of the leading enterprise software company in the world.
Values are OCI’s foundation and how we deliver excellence. We strive for equity, inclusion, and respect for all. We are committed to the greater good in our products and our actions. We are constantly learning and taking opportunities to grow our careers and ourselves. We challenge each other to stretch beyond our past to build our future.
You are the builder here. You will be part of a team of really smart, motivated, and diverse people and given the autonomy and support to do your best work. It is a dynamic and flexible workplace where you’ll belong and be encouraged.
This role is available in the OCI AI Data org. We are addressing exciting challenges at the intersection of artificial intelligence and cutting-edge cloud infrastructure. We are building a state-of-the-art healthcare data processing platform and an accompanying application framework that unlock healthcare data of different modalities for ML purposes. In addition, we build GPU clusters for large-scale AI training (including large language models) and efficient inferencing, as well as an integrated offering with the healthcare data processing platform. Finally, we are building LLM-based data generators that produce synthetic data in various modalities and domains.
Basic Qualifications
- Bachelor’s degree in computer science, engineering, or an equivalent highly technical field
- 10+ years of software engineering experience (2+ years in a cloud services development environment) and a proven track record of successfully architecting and shipping high-performance, low-latency AI/ML-enabled products and services
- 2+ years of senior management or lead experience with a solid track record of building and leading engineering teams
- Strong technical understanding of building complex, scalable, low-latency streaming/batch processing AI/ML cloud services
- Proven track record of running operations for a cloud service
- Deep knowledge of large-scale compute, network, and storage systems
- Experience working with distributed systems
Preferred Qualifications
- PhD or MS in Computer Science or a related technical field (Statistics, Mathematics, AI/ML, Operations Research, etc.)
- Experience designing scalable, distributed, cloud-native backend services
- Demonstrated knowledge of and experience with machine learning platforms from major providers (AWS, Microsoft Azure, and Google Cloud)
- Experience leading multiple geographically distributed teams
- Experience working with compliance frameworks and healthcare data
Additional Details
- Must have worked in at least one of the following areas: healthcare data processing, large-scale GPU infrastructure for ML training and inferencing, building pipelines from ML models, etc.
- Familiarity with recent ML/AI-based approaches for building and packaging LLMs and large ML models is highly preferred
Oracle is an Equal Employment Opportunity Employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, national origin, sexual orientation, gender identity, disability and protected veterans' status or any other characteristic protected by law.
Responsibilities
- Come in at ground level and build new built-on-OCI and OCI-native services from scratch
- Inspire a culture of 'Always On' service operations in the team
- Organize, anticipate, and plan 24x7x365 service operations for multiple services
- Present service escalations and statistics, along with corrective and preventive actions, to upper management weekly
- Interface with architects and technical leads to steer them toward continuous feature improvements
- Directly and indirectly manage a fast-growing, globally distributed engineering team
- Allocate resources, set priorities, and manage schedules for the team. Work across the platform organization to define and provide inputs to the technology strategy, infrastructure and architecture vision that supports the successful execution of the product roadmap and business strategy.
- Feed service operations requirements and challenges back to service development teams for continuous improvement
- Hire outstanding SDEs and SREs in a competitive environment. Proven ability to motivate, align, and manage high-performing, happy, and empowered developers.
Required Skills
- AI (Artificial Intelligence)
- Cloud Services
- Distributed Systems
- ML (Machine Learning)
- Python (Programming Language)
- Software Engineering