Closed

No longer accepting applications.

HPC System R&D Engineer

Job Description

KLA’s AI Advanced Computing Labs is looking for an extraordinary HPC System R&D Engineer to join its team to develop system-level HPC technologies that would form the foundation of next-generation clusters used in KLA tools. The technologies would be developed and demonstrated on on-prem clusters that serve as testbeds for next-generation KLA tools.

Your Day-to-day Roles

Expose the limitations of existing solutions, based on clusters of CPUs & GPUs, for deploying AI-based solutions at scale on on-prem & cloud infrastructures.

Develop system-level solutions that enable scaling out image processing & AI loads from single GPU to multi-node clusters with multiple GPUs.

Install, benchmark, and evaluate pre-release hardware for early-stage evaluation and prototyping by identifying (or developing) relevant workloads.

Explore modern HPC systems software (such as new distributions of Linux) for adoption into KLA’s tools.

Minimum Qualifications

Master's or PhD in Computer Science or a related field; bachelor's degree holders with relevant experience and an extraordinary track record will also be considered.

Deep understanding of operating systems, computer networks, and high-performance applications.

Good mental model of the architecture of a modern distributed system composed of CPUs, GPUs, and accelerators.

Experience with deployments of deep-learning frameworks based on TensorFlow and PyTorch on large-scale on-prem or cloud infrastructures.

Solid understanding of container infrastructure such as Docker or Singularity, and Kubernetes.

Strong scripting skills in Bash, Python, or similar.

Good communication skills.

Company

KLA

Job Posted

2 years ago

Job Type

Full-time

Work Mode

On-site

Experience Level

3-7 Years
