Deep Learning Examples: A Comprehensive Guide
Introduction
The NVIDIA Deep Learning Examples for Tensor Cores repository is a collection of state-of-the-art deep learning models that are easy to train and deploy. These models achieve reproducible accuracy and high performance with the NVIDIA CUDA-X software stack running on NVIDIA Volta, Turing, and Ampere GPUs.
NVIDIA GPU Cloud (NGC) Container Registry
NVIDIA simplifies setup by packaging these models in Docker containers, available from the NVIDIA GPU Cloud (NGC) container registry. Updated monthly, these containers include:
- The latest examples from the NVIDIA repository.
- The latest NVIDIA contributions shared upstream to the respective frameworks.
- Updated versions of NVIDIA Deep Learning software libraries, such as cuDNN, NCCL, and cuBLAS, which undergo rigorous monthly quality checks to ensure peak performance.
- Monthly release notes for each optimized container, providing a clear log of updates.
Computer Vision
NVIDIA's computer vision offerings within this repository include models such as EfficientNet and ResNet. These models are available for multiple deep learning frameworks, including PyTorch and TensorFlow, and support features such as Automatic Mixed Precision (AMP) and TensorRT optimizations. They can leverage multi-GPU setups, and some support ONNX export and NVIDIA Triton Inference Server for scalable deployment.
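Exact training recipes live in each model's README; purely as a rough illustration of what AMP looks like in PyTorch, the sketch below runs a few mixed-precision training steps of a torchvision ResNet-50 on synthetic data. The batch size, optimizer settings, and step count are placeholder assumptions, not the repository's published configuration.

```python
# A minimal mixed-precision training loop in PyTorch (sketch, not the
# repository's script); requires a CUDA-capable GPU.
import torch
import torchvision

model = torchvision.models.resnet50().cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
criterion = torch.nn.CrossEntropyLoss()
scaler = torch.cuda.amp.GradScaler()            # scales the loss to avoid FP16 underflow

images = torch.randn(8, 3, 224, 224, device="cuda")   # synthetic stand-in data
labels = torch.randint(0, 1000, (8,), device="cuda")

for step in range(10):
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():             # run the forward pass in mixed precision
        loss = criterion(model(images), labels)
    scaler.scale(loss).backward()               # backward pass on the scaled loss
    scaler.step(optimizer)                      # unscale gradients, then update weights
    scaler.update()                             # adapt the loss scale for the next step
```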
Natural Language Processing
The repository offers extensive tools for natural language processing (NLP), with models such as BERT and GNMT. Available in PyTorch and TensorFlow versions, these models run efficiently with AMP and support multi-GPU and multi-node training, shortening training time while preserving accuracy.
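As a hedged sketch of what multi-GPU data parallelism involves, the example below wraps a placeholder linear model in PyTorch's DistributedDataParallel; the repository's BERT and GNMT scripts build the same pattern into their own data pipelines and launchers. It assumes a launch via torchrun, which sets the LOCAL_RANK environment variable for each process.

```python
# Minimal DistributedDataParallel sketch; launch with e.g.
#   torchrun --nproc_per_node=8 train.py
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    dist.init_process_group(backend="nccl")        # NCCL handles GPU-to-GPU communication
    local_rank = int(os.environ["LOCAL_RANK"])     # set by torchrun per process
    torch.cuda.set_device(local_rank)

    model = torch.nn.Linear(1024, 1024).cuda()     # placeholder for a real model such as BERT
    model = DDP(model, device_ids=[local_rank])    # gradients are all-reduced across GPUs
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

    x = torch.randn(32, 1024, device="cuda")       # synthetic per-GPU batch
    for _ in range(10):
        optimizer.zero_grad()
        loss = model(x).pow(2).mean()
        loss.backward()                            # DDP overlaps the all-reduce with backward
        optimizer.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```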
Recommender Systems
In the recommender system domain, models such as DLRM and Wide&Deep are provided. These implementations are optimized for PyTorch and TensorFlow, use AMP for higher throughput, and scale across multi-GPU configurations.
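To make the wide-and-deep idea concrete, here is a minimal architectural sketch in PyTorch: a linear "wide" component that memorizes sparse feature co-occurrences plus an embedding-and-MLP "deep" component that generalizes. The TinyWideAndDeep class, vocabulary size, and dimensions are illustrative assumptions, not the repository's implementation.

```python
# Schematic wide-and-deep model over categorical feature indices (sketch only).
import torch

class TinyWideAndDeep(torch.nn.Module):
    def __init__(self, num_features=1000, embed_dim=16):
        super().__init__()
        self.wide = torch.nn.EmbeddingBag(num_features, 1, mode="sum")        # "wide" memorization path
        self.embed = torch.nn.EmbeddingBag(num_features, embed_dim, mode="sum")
        self.deep = torch.nn.Sequential(                                       # "deep" generalization path
            torch.nn.Linear(embed_dim, 64),
            torch.nn.ReLU(),
            torch.nn.Linear(64, 1),
        )

    def forward(self, feature_ids):
        # feature_ids: (batch, active_features) tensor of categorical indices
        logit = self.wide(feature_ids) + self.deep(self.embed(feature_ids))
        return torch.sigmoid(logit).squeeze(1)                                 # predicted click probability

model = TinyWideAndDeep()
batch = torch.randint(0, 1000, (4, 8))    # 4 samples, 8 active features each
print(model(batch))                        # 4 probabilities in [0, 1]
```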
Speech to Text and Text to Speech
For those working on speech-related AI, NVIDIA provides models such as Jasper for speech recognition and Tacotron 2 for speech synthesis. Both support AMP for training and TensorRT for low-latency inference, making them well suited to high-performance speech applications.
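Deployment details differ per model, but a common first step toward TensorRT or Triton inference is exporting the trained PyTorch network to ONNX. The sketch below does this for a small stand-in convolutional module; TinyAcousticModel is a hypothetical placeholder, not Jasper or Tacotron 2, and the actual engine-building workflow is documented in each model's inference guide.

```python
# Export a placeholder PyTorch module to ONNX as a starting point for
# TensorRT deployment (sketch under assumed shapes and names).
import torch

class TinyAcousticModel(torch.nn.Module):          # hypothetical stand-in network
    def __init__(self):
        super().__init__()
        self.conv = torch.nn.Conv1d(64, 128, kernel_size=11, padding=5)

    def forward(self, features):
        return torch.relu(self.conv(features))

model = TinyAcousticModel().eval()
dummy = torch.randn(1, 64, 200)                     # (batch, feature_dim, time)

torch.onnx.export(
    model,
    dummy,
    "acoustic_model.onnx",
    input_names=["features"],
    output_names=["activations"],
    dynamic_axes={"features": {0: "batch", 2: "time"}},  # allow variable batch and length
)
# The resulting ONNX file can then be compiled into a TensorRT engine, for
# example with the trtexec tool shipped in NGC containers.
```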
Graph Neural Networks and Time-Series Forecasting
NVIDIA rounds out its offerings with models for graph neural networks and time-series forecasting. Examples include the SE(3)-Transformer, used in drug discovery, and the Temporal Fusion Transformer for forecasting tasks, both optimized for parallel computation on GPUs.
NVIDIA Support and Additional Features
Each network README in the repository outlines the level of support available, ranging from ongoing updates to notable one-time releases.
The repository also introduces several advanced concepts:
- Multinode Training: Supported on Slurm clusters with Pyxis/Enroot.
- Deep Learning Compiler (DLC): TensorFlow XLA and PyTorch JIT/TorchScript for compiling models into optimized kernels (see the sketch after this list).
- Automatic Mixed Precision (AMP): Automatically running eligible operations in lower precision on Tensor Cores while keeping numerically sensitive operations in FP32.
- TensorFloat-32 (TF32): A math mode that provides significant speedups for matrix operations on NVIDIA A100 GPUs.
- Jupyter Notebooks: Interactive environments for exploring the models and visualizing results.
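As a brief, hedged illustration of two items above, the sketch below enables TF32 math in PyTorch and compiles a small placeholder module with TorchScript (the PyTorch JIT). The backend flags are standard PyTorch settings; SmallMLP is a made-up example module, not code from the repository.

```python
# Enable TF32 matrix math (effective on Ampere GPUs) and JIT-compile a module.
import torch

torch.backends.cuda.matmul.allow_tf32 = True   # allow TF32 for matrix multiplications
torch.backends.cudnn.allow_tf32 = True         # allow TF32 for cuDNN convolutions

class SmallMLP(torch.nn.Module):                # hypothetical example module
    def __init__(self):
        super().__init__()
        self.fc1 = torch.nn.Linear(256, 256)
        self.fc2 = torch.nn.Linear(256, 10)

    def forward(self, x):
        return self.fc2(torch.relu(self.fc1(x)))

scripted = torch.jit.script(SmallMLP())         # compile with TorchScript
print(scripted.graph)                           # inspect the compiled graph
```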
Feedback / Contributions
Recognizing the value of community interaction, NVIDIA encourages users to contribute to the repository. Feedback, enhancements, and issue fixes are central to this collaborative process, and contributions are welcomed through GitHub Issues and pull requests.
Known Issues
Each model's network README lists any known issues, giving users a clear picture of current limitations and an opening for community-driven fixes.