Tiny CUDA Neural Networks
Tiny CUDA Neural Networks is a compact, self-contained framework for training and running neural networks, optimized for speed and efficiency. It is best known for its extremely fast "fully fused" multi-layer perceptron (MLP) and a versatile multiresolution hash encoding, which are provided alongside a variety of other input encodings, loss functions, and optimizers.
Performance
The key feature of this framework is its speed. Its fully fused MLP significantly outperforms equivalent implementations in general-purpose libraries such as TensorFlow on GPUs like the NVIDIA RTX 3090. This edge comes from implementations tailored to the GPU: the fully fused network is evaluated in a single kernel that keeps intermediate data in fast on-chip memory and makes heavy use of tensor cores.
Usage
The framework offers a straightforward C++/CUDA API that allows users to configure, train, and use neural network models with ease.
- Model Configuration: The model's architecture, loss function, and optimizer are defined through a single JSON configuration.
- Training: The API operates on batches of GPU-resident data, repeatedly applying training steps to make efficient use of GPU resources.
- Inference: Once trained, the model can run inference on new inputs. (A usage sketch follows this list.)
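The sketch below illustrates this workflow end to end, under the assumption that the C++ API follows the pattern of a `create_from_config` factory, a `trainer` object with a `training_step` method, and a `network` object with an `inference` method operating on GPU-resident matrices. These identifiers, along with the JSON keys, are illustrative assumptions rather than a verbatim copy of the library's headers; consult the repository for the exact interface.

```cpp
// Minimal sketch of the configure/train/infer workflow. The identifiers and
// JSON keys below are assumptions about the API, not verified declarations.
#include <tiny-cuda-nn/config.h>  // assumed umbrella header

#include <cstdint>

using namespace tcnn;

int main() {
    // 1. Model configuration: loss, optimizer, encoding, and network in one JSON object.
    nlohmann::json config = {
        {"loss",      {{"otype", "L2"}}},
        {"optimizer", {{"otype", "Adam"}, {"learning_rate", 1e-3}}},
        {"encoding",  {{"otype", "HashGrid"}}},
        {"network",   {{"otype", "FullyFusedMLP"}, {"activation", "ReLU"},
                       {"output_activation", "None"}, {"n_neurons", 64}, {"n_hidden_layers", 2}}},
    };

    const uint32_t n_input_dims = 2;   // e.g. 2D pixel coordinates
    const uint32_t n_output_dims = 3;  // e.g. RGB color
    const uint32_t batch_size = 1 << 16;

    auto model = create_from_config(n_input_dims, n_output_dims, config);

    // 2. Training: batches of inputs and targets reside in GPU memory.
    GPUMatrix<float> training_inputs(n_input_dims, batch_size);
    GPUMatrix<float> training_targets(n_output_dims, batch_size);

    for (int step = 0; step < 1000; ++step) {
        // Application code fills the current batch here (omitted in this sketch).
        model.trainer->training_step(training_inputs, training_targets);
    }

    // 3. Inference: evaluate the trained network on new inputs.
    GPUMatrix<float> inference_inputs(n_input_dims, batch_size);
    GPUMatrix<float> inference_outputs(n_output_dims, batch_size);
    model.network->inference(inference_inputs, inference_outputs);
}
```

Because the batches live in GPU-resident matrices, the training loop in this sketch never copies data through the host between steps, which is what allows the GPU to stay fully utilized.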
Example Application: Learning a 2D Image
Tiny CUDA Neural Networks ships with sample applications, including one that learns a 2D image from its pixel data. Running the provided example shows the loss decreasing over training steps and demonstrates the rapid convergence attainable on high-end GPUs such as the RTX 4090.
Requirements
To make the most of this framework, users will need:
- An NVIDIA GPU, ideally with tensor cores, to fully leverage speed enhancements.
- A C++14-capable compiler, such as Visual Studio on Windows or GCC on Linux.
- A recent version of the NVIDIA CUDA toolkit for GPU computation.
- CMake for building the project files.
Compilation
On both Windows and Linux, the project is built by cloning the repository and running the standard CMake configure and build steps. This sets the framework up for development or deployment on the local machine.
PyTorch Extension
For users who prefer Python, Tiny CUDA Neural Networks offers a PyTorch extension that makes the fast MLPs and encodings available inside PyTorch-based applications. Note, however, that at small batch sizes the Python and PyTorch overhead can mask much of the framework's speed advantage.
Components Overview
The framework supports:
- Networks: Offers different MLP implementations: fully fused for maximum speed, or CUTLASS-based for larger networks.
- Input Encodings: Includes multiple encoding strategies, such as composite, frequency, grid, and identity, catering to varied neural network requirements.
- Losses: Provides a range of loss functions, from basic L1 and L2 to more specialized ones like cross entropy and variance.
- Optimizers: Encompasses well-known optimizers like Adam, SGD, and Shampoo, along with composable wrappers such as exponential learning-rate decay and lookahead. (A configuration sketch follows this list.)
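Because every component is selected by name in the JSON configuration, swapping one out is typically a matter of editing the config rather than the code. The fragment below sketches such a composition; the specific `otype` strings and parameters (for example an `ExponentialDecay` wrapper with a `nested` Adam optimizer) are assumptions about the configuration schema and should be verified against the repository's documentation.

```cpp
// Hypothetical configuration fragment (names and parameters are assumptions):
// a wider CUTLASS-based MLP with a frequency encoding, relative-L2 loss, and
// Adam wrapped in an exponential learning-rate decay schedule.
nlohmann::json config = {
    {"encoding", {{"otype", "Frequency"}, {"n_frequencies", 12}}},
    {"network",  {{"otype", "CutlassMLP"}, {"activation", "ReLU"},
                  {"output_activation", "None"}, {"n_neurons", 256}, {"n_hidden_layers", 4}}},
    {"loss",     {{"otype", "RelativeL2"}}},
    {"optimizer", {
        {"otype", "ExponentialDecay"},
        {"decay_base", 0.5},
        {"decay_interval", 10000},
        {"nested", {{"otype", "Adam"}, {"learning_rate", 1e-2}}},
    }},
};
```

A CUTLASS-based network of this kind trades some of the fully fused kernel's raw speed for support of larger layer widths, which is the trade-off described in the components list above.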
License and Citation
Tiny CUDA Neural Networks is licensed under the BSD 3-clause license. Users are encouraged to cite the framework in any research that benefits from its capabilities, giving proper acknowledgment to the developers.
Publications & Software
The framework has been pivotal in several publications and software projects, underscoring its contribution to advancing neural rendering and graphics. Notable projects powered by Tiny CUDA Neural Networks include Instant Neural Graphics Primitives and Real-time Neural Radiance Caching for Path Tracing, among others.
This concise yet powerful framework is ideal for developers and researchers focused on high-efficiency neural network training and inference tasks, especially those leveraging the CUDA capabilities of NVIDIA GPUs.