Senior Software CI System Architect

What you’ll be doing:

Work in a multifaceted agile software development team with very high production quality standards.

Participate in the full lifecycle of tool development, test, and deployment.

Design and improve systems to schedule and utilize resources, improve performance, increase reliability, and provide better throughput.

Work closely with other team members and users to understand their build and test processes and needs.

Craft and develop reliable, easy-to-use environments for hundreds of engineers around the world.

Contribute directly to the overall quality of, and improve the time to market for, NVIDIA's chips and deep learning software stacks.

 

What we need to see:

BS in Computer Science (or equivalent experience) or MS (preferred)

5+ years of experience

Experience developing and deploying automated CI systems using Jenkins, GitLab CI, etc.

Strong software engineering process skills

Experience with Linux development and programming tools

Background with SCM tools such as Perforce, Git, Subversion, ClearCase, etc.

Strong object-oriented programming skills

Strong interpreted language application skills, Python preferred

Excellent planning and communication skills

Flexibility/adaptability working in a dynamic environment with changing requirements

 

Ways to stand out from the crowd:

Experience with chip design workflows

Deep understanding of SCM processes and tools for large, multi-site development, including branching, integration, and release strategies

Company

NVIDIA

Job Posted

a year ago

Job Type

Full-time

Work Mode

On-site

Experience Level

3-7 years

Locations

Westford, Massachusetts, United States

Durham, North Carolina, United States

Santa Clara, California, United States

Qualification

Bachelor

Related Jobs

Senior Solutions Architect, Machine Learning

NVIDIA

Durham, North Carolina, United States

Posted: a year ago

What You’ll Be Doing:

A considerable part of the day-to-day job is staying up to date on pioneering Deep Learning and Machine Learning ecosystems.

You'll be called on to help architect and scale high-performance, distributed AI deployments on-prem or in the cloud built with the latest NVIDIA GPU supercomputers.

Document what you know and teach others. This can vary from building targeted training for partners and other Solutions Architects to writing whitepapers, blogs, and wiki articles, to working through challenging problems with a partner on a whiteboard.

Answer questions and provide mentorship.

Work with Partner Business Managers to assist partners and customers on their critical projects. You will help them build their GPU and DPU-enabled Accelerated Compute datacenters or cloud services to get the most out of their investment.

Lead and develop proofs-of-concept (PoCs) for solutions applied to enterprise and industrial applications such as LLM, NLP/NLU, recommender systems, image recognition, video analytics, and DPU applications.

Support the business development team through the sales process for GPU/DPU/Network hardware/software products.

Responsible for technical relationships and enabling customers to build innovative NVIDIA technology solutions.

Partner with NVIDIA Engineering, Product, and Sales teams to secure design wins for customers.

Enable development and growth of NVIDIA product features through customer feedback and PoC evaluations.

What We Need To See:

BS, MS, or PhD in Computer Science, Electrical/Computer Engineering, Physics, Mathematics, or other Engineering fields or equivalent experience.

5+ years of work-related experience in Deep Learning and Machine Learning, including deep learning frameworks TensorFlow or PyTorch, GPU, and CUDA experience extremely helpful.

Experience working with DevOps on-prem or in cloud environments, including but not limited to Docker/Containers, Kubernetes, cloud APIs, IaaS and Data Center deployments; additional experience with DPUs applications development will be beneficial.

Deep understanding of dense data center design, including computing, storage, networking, cloud APIs, and IaaS.

Ability to multitask efficiently in a dynamic environment.

Strong analytical and problem-solving skills.

Clear written and oral communication skills with the ability to effectively collaborate and coordinate across cross-functional teams in engineering, sales, marketing, product, and program management.

Comfortable working in a customer-facing environment.

C/C++ and Python programming skills.

Ways To Stand Out From The Crowd:

Excellent customer-facing skills and background.

Skilled in deploying ML/DL models at scale on cloud computing clusters in production.

Development experience with NVIDIA software libraries and GPUs or DPUs.

Knowledge of LLM, MLOps, DevOps, and Cloud-oriented workflows using Docker/containers, Kubernetes, cloud APIs, data center deployments, etc.

Able to think creatively to debug and solve complex problems.

Principal Software Architect - Data Center

NVIDIA

Santa Clara, California, United States

Posted: a year ago

What you’ll be doing:

Drive the system architecture for a complex server platform in a multi-functional environment.

Work directly with major customers to understand their requirements and work to align their roadmap with NVIDIA’s roadmap.

Work with business partners and vendors to shape their products to meet NVIDIA’s needs.

Develop a roadmap of new technologies and protocols and drive their design and adoption.

Mentor architects and engineering teams to grow them into future leaders.

Make key technical decisions even when faced with ambiguity, and mitigate execution risks by following left shift strategy.

What we need to see:

Deep experience in designing architecture for scalable and performant server systems, particularly at the SW/HW interface.

Expertise in Out of Band and Inband management architectures.

Knowledge of device management protocols such as MCTP, PLDM and RDE.

Knowledge of system management protocols such as Redfish and IPMI.

Experience working with platform security experts to define tradeoffs between security and ease of use.

Demonstrable experience in implementing left shift strategy to de-risk program execution.

Excellent written and verbal communication skills.

BS or MS degree in Computer Engineering, Computer Science, or related degree or equivalent experience.

15+ years in the area of System architecture and design.

Ways to stand out from the crowd:

Knowledge of cloud and cluster level deployment and management systems.

Participation and contributions in standards bodies such as OCP and DMTF.

Familiarity with CXL architectures.

Knowledge in storage and networking technologies.

Senior HPC Scheduler Engineer

NVIDIA

Santa Clara, California, United States

Posted: a year ago

What you’ll be doing:

Provide engineering solutions and prototypes to enable efficient resource management and job scheduling for large scale clusters, ensure technical relationships with internal and external engineering teams, and assist system architects and machine learning/deep learning engineers in building creative solutions based on NVIDIA technology.

Be an internal reference for scheduling and resource management concepts and methodologies among the NVIDIA technical community.

Test, evaluate, and benchmark new technologies and products and work with vendors, partners and peers to improve functionality and optimize performance.

What we need to see:

5+ years of experience designing and running scheduling and resource management systems in large datacenter/AI/HPC solutions.

Knowledge and experience with resource management / scheduling code bases: SLURM preferred, other implementations (LSF, SGE, Torque...).

Proven understanding of performance clusters, infrastructure and workload patterns.

Experience using and installing Linux-based server platforms.

C/Python/Bash/Lua programming/scripting experience.

Experience working with engineering or academic research community supporting HPC or deep learning.

Strong teamwork and both verbal and written communication skills.

Ability to multitask efficiently in a very dynamic environment!

Action driven with strong analytical and troubleshooting skills.

Desire to be involved in multiple diverse and innovative projects.

BS in Engineering, Mathematics, Physics, or Computer Science or equivalent experience. MS or PhD desirable.

Ways to stand out from the crowd:

Experience with HPC cluster administration for AI.

Experience deploying containerized services.

Experience with orchestrators (e.g. Kubernetes).

Demonstrated work with Open-Source software: building, debugging, patching and contributing code.

Experience tuning memory, storage, and networking settings for performance on Linux systems.

Exposure to monitoring and telemetry systems.