Deep Learning Performance Architect, Infrastructure
NVIDIA
Shanghai, Shanghai, China
What you'll be doing:
- Designing and developing software for testing and analysis of our codebases
- Building scalable automation for build, test, integration, and release processes for publicly distributed deep learning libraries
- Developing throughout the software stack, from the user experience down to the cluster and database layers
- Configuring, maintaining, and building upon deployments of industry-standard tools (e.g. Kubernetes, Jenkins, Docker, CMake, GitLab, Jira, etc.)
- Advancing the state of the art in those industry-standard tools and upstreaming contributions to the open source community

What we need to see:
- BS or higher degree in Computer Science or Computer Engineering, or equivalent experience
- 3+ years of relevant experience
- Strong programming skills in Python (or similar) and familiarity with C/C++ development
- Experience setting up, maintaining, and automating continuous integration systems
- Fluency in SCM (e.g. Git, Perforce) and build systems (e.g. Make, CMake, Bazel)
- A pragmatic approach to problem solving and collaboration
- Passion for “it just works” automation and enabling team members

Ways to stand out from the crowd:
- Experience designing and developing automation in Jenkins with Groovy (or similar)
- Background with distributed systems and cluster/cloud computing, especially with Kubernetes
- Experience designing and developing unit and integration test frameworks
- Hands-on experience with code coverage and static code analysis tools
- Knowledge of GPU computing systems