Introduction to NanoDL
NanoDL is a library for AI and machine learning practitioners who want to create and train transformer models efficiently and from the ground up. It is built on Jax, a flexible yet powerful framework known for speeding up neural network development and for supporting distributed training. NanoDL mitigates some of the challenges of developing transformer models in Jax by offering a host of features that make the process more accessible and effective.
Key Features
- Diverse Model Support: NanoDL offers an extensive selection of pre-designed models, including Gemma, LlaMa3, Mistral, GPT3, GPT4, T5, Whisper, and CLIP, covering a variety of natural language processing and computer vision tasks.
- Comprehensive Components: The library includes a variety of blocks and layers for building customized transformer models from scratch, providing flexibility in model creation.
- Distributed Training: Data-parallel distributed trainers enable model training on multiple GPUs or TPUs without hand-written training loops.
- Advanced Algorithms: Utilities such as Gaussian blur for image processing and BLEU score computation for evaluating generated text support both NLP and computer vision workflows.
- Unique Features: The library provides layers not found in Flax/Jax, including RoPE, GQA, MQA, and Swin attention, enabling more sophisticated model architectures (see the sketch after this list).
- GPU/TPU Acceleration: Classical machine learning models such as PCA, KMeans, Regression, and Gaussian Processes are accelerated on GPUs and TPUs.
- Pedagogic Code Implementation: The code is written in a teaching style, with each model self-contained in a single file, ensuring ease of understanding and experimentation.
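To illustrate how these building blocks compose, here is a minimal sketch of a custom Flax module wrapping one of NanoDL's attention layers. The import path, constructor signature, and call convention of GroupedQueryAttention below are assumptions for illustration only; consult the NanoDL source for the exact API.

```python
import jax
import jax.numpy as jnp
import flax.linen as nn
from nanodl import GroupedQueryAttention  # assumed name and import path

class TinyBlock(nn.Module):
    """A single pre-norm transformer-style block built around GQA."""
    hidden_dim: int = 256
    num_heads: int = 4
    num_groups: int = 2  # fewer key/value groups than query heads: the GQA idea

    @nn.compact
    def __call__(self, x):
        # Self-attention with a residual connection; the (inputs, context)
        # call convention here is an assumption to verify against the library.
        attn = GroupedQueryAttention(hidden_dim=self.hidden_dim,
                                     num_heads=self.num_heads,
                                     num_groups=self.num_groups)
        h = nn.LayerNorm()(x)
        return x + attn(h, h)

x = jnp.ones((1, 10, 256))
params = TinyBlock().init(jax.random.PRNGKey(0), x)
y = TinyBlock().apply(params, x)  # shape (1, 10, 256)
```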
Getting Started
NanoDL requires Python 3.9 or later, with JAX, Flax, and Optax installed. The packages can be installed using the Python package manager pip:

```
pip install --upgrade pip
pip install jax flax optax
pip install nanodl
```
This quick setup allows you to start designing and training your models efficiently.
Example Usages
- Language Model: Sets up a GPT4-style language model with a dataset and data loader, configures the model hyperparameters, and runs training (first sketch after this list).
- Vision Model: A diffusion model example showing how to process image data and train the model (second sketch below).
- Audio Model: A Whisper example demonstrating how to handle and train on audio data (third sketch below).
- Reward Model for RLHF: A Mistral-based reward model, useful for Reinforcement Learning from Human Feedback (RLHF) (fourth sketch below).
- Classical ML: PCA for dimensionality reduction in datasets (final sketch below).
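First, a minimal sketch of the language-model workflow, modeled on the usage pattern in NanoDL's documentation. The class names (GPT4, ArrayDataset, DataLoader, GPTDataParallelTrainer) and the hyperparameter keys follow that pattern but may differ between releases, so verify them against your installed version.

```python
import jax.numpy as jnp
from nanodl import ArrayDataset, DataLoader, GPT4, GPTDataParallelTrainer

# Dummy tokenised data: replace with real token IDs
batch_size, max_length = 8, 10
data = jnp.ones((101, max_length + 1), dtype=jnp.int32)
inputs, targets = data[:, :-1], data[:, 1:]  # shift for next-token prediction

dataset = ArrayDataset(inputs, targets)
dataloader = DataLoader(dataset, batch_size=batch_size,
                        shuffle=True, drop_last=False)

# Small hyperparameters so the example runs quickly
hyperparams = {
    'num_layers': 1,
    'hidden_dim': 256,
    'num_heads': 2,
    'feedforward_dim': 256,
    'dropout': 0.1,
    'vocab_size': 1000,
    'embed_dim': 256,
    'max_length': max_length,
    'start_token': 0,
    'end_token': 50,
}

model = GPT4(**hyperparams)

# The data-parallel trainer owns the training loop, device replication,
# and checkpointing to the given weights file
trainer = GPTDataParallelTrainer(model, inputs.shape, 'params.pkl')
trainer.train(train_loader=dataloader, num_epochs=2, val_loader=dataloader)
```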
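The vision example follows the same shape: random images stand in for a real dataset, and a data-parallel trainer drives the loop. DiffusionModel and DiffusionDataParallelTrainer, and their argument orders, are likewise assumptions to check against the library.

```python
import jax
from nanodl import (ArrayDataset, DataLoader,
                    DiffusionModel, DiffusionDataParallelTrainer)

image_size, block_depth, batch_size = 32, 2, 8
widths = [32, 64, 128]  # channel widths of the denoiser stages

# Random images stand in for a real image dataset
key = jax.random.PRNGKey(0)
images = jax.random.normal(key, (101, image_size, image_size, 3))

dataset = ArrayDataset(images)
dataloader = DataLoader(dataset, batch_size=batch_size,
                        shuffle=True, drop_last=False)

diffusion_model = DiffusionModel(image_size, widths, block_depth)

trainer = DiffusionDataParallelTrainer(diffusion_model,
                                       input_shape=images.shape,
                                       weights_filename='params.pkl',
                                       learning_rate=1e-4)
trainer.train(dataloader, 10, dataloader)
```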
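For audio, a Whisper-style model maps continuous speech features to token IDs. As before, the class names and hyperparameter keys are assumptions based on the documented pattern.

```python
import jax.numpy as jnp
from nanodl import (ArrayDataset, DataLoader,
                    Whisper, WhisperDataParallelTrainer)

batch_size, max_length, embed_dim = 8, 50, 256

# Dummy data: continuous speech features in, token IDs out
dummy_inputs = jnp.ones((101, max_length, embed_dim))
dummy_targets = jnp.ones((101, max_length), dtype=jnp.int32)

dataset = ArrayDataset(dummy_inputs, dummy_targets)
dataloader = DataLoader(dataset, batch_size=batch_size,
                        shuffle=True, drop_last=False)

hyperparams = {
    'num_layers': 1,
    'hidden_dim': 256,
    'num_heads': 2,
    'feedforward_dim': 256,
    'dropout': 0.1,
    'vocab_size': 1000,
    'embed_dim': embed_dim,
    'max_length': max_length,
    'speech_embed_dim': embed_dim,
    'max_speech_length': max_length,
}

model = Whisper(**hyperparams)
trainer = WhisperDataParallelTrainer(model, dummy_inputs.shape,
                                     dummy_targets.shape, 'params.pkl')
trainer.train(dataloader, 2, dataloader)
```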
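The RLHF example wraps a Mistral backbone with a scalar reward head and trains it on pairs of chosen and rejected responses. RewardModel and RewardDataParallelTrainer, and the extra Mistral keys (num_groups for GQA, window_size for sliding-window attention), are assumptions from the same pattern.

```python
import jax.numpy as jnp
from nanodl import (ArrayDataset, DataLoader,
                    Mistral, RewardModel, RewardDataParallelTrainer)

batch_size, max_length = 8, 10

# Dummy preference pairs: replace with tokenised chosen/rejected responses
dummy_chosen = jnp.ones((101, max_length), dtype=jnp.int32)
dummy_rejected = jnp.zeros((101, max_length), dtype=jnp.int32)

dataset = ArrayDataset(dummy_chosen, dummy_rejected)
dataloader = DataLoader(dataset, batch_size=batch_size,
                        shuffle=True, drop_last=False)

# Hyperparameter keys are assumptions and may vary between versions
hyperparams = {
    'num_layers': 1,
    'hidden_dim': 256,
    'num_heads': 2,
    'feedforward_dim': 256,
    'dropout': 0.1,
    'vocab_size': 1000,
    'embed_dim': 256,
    'max_length': max_length,
    'start_token': 0,
    'end_token': 50,
    'num_groups': 2,
    'window_size': 5,
}

# Wrap the Mistral backbone with a reward head that outputs a scalar score
model = Mistral(**hyperparams)
reward_model = RewardModel(model, dim=hyperparams['hidden_dim'], dropout=0.1)

trainer = RewardDataParallelTrainer(reward_model, dummy_chosen.shape,
                                    'reward_model.pkl')
trainer.train(dataloader, 5, dataloader)
```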
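Finally, the classical-ML example: PCA with a scikit-learn-like fit/transform interface, accelerated by Jax. The exact method set is an assumption to verify against the installed release.

```python
import jax
from nanodl import PCA

# 1000 samples with 10 features each
data = jax.random.normal(jax.random.PRNGKey(0), (1000, 10))

pca = PCA(n_components=2)
pca.fit(data)

reduced = pca.transform(data)              # (1000, 2) projection
restored = pca.inverse_transform(reduced)  # map back to the original space
```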
Contributions and Community
NanoDL warmly welcomes contributions from the community. Users can help by writing documentation, contributing code improvements, reporting bugs, and providing real-world usage examples. Collaboration is coordinated through the project's Discord server.
Vision for the Future
The primary mission of NanoDL, reflected in its name ("Nano Deep Learning"), is to develop smaller but highly efficient versions of existing models that can compete with their larger counterparts. By keeping model sizes under 1 billion parameters, NanoDL aims to make these models accessible to a wider audience, including those with limited computational resources.
Whether you are interested in contributing to the development of small, highly efficient models, or looking to implement state-of-the-art transformers in your projects, NanoDL is an excellent resource worth exploring.