Introduction to Levanter
Levanter is a framework for training large language models (LLMs) and other foundation models. It is built around three goals: legibility, scalability, and reproducibility.
Key Features
- Legible: Levanter uses the Haliax named-tensor library to keep deep learning code easy to read and compose while maintaining high performance.
- Scalable: Levanter scales to large models and runs on a range of hardware, including GPUs and TPUs.
- Reproducible: The same configuration yields the same results every time, with bitwise determinism preserved even across preemption and resumption during training.
The framework is built on JAX, Equinox, and Haliax.
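To give a flavor of the legibility Haliax enables, here is a minimal, hypothetical sketch of named-axis tensor code. The axis names and sizes are made up for illustration, and API details may vary between Haliax versions:

```python
import jax.random as jrandom
import haliax as hax

# Illustrative axes: the names and sizes here are arbitrary.
Batch = hax.Axis("batch", 32)
Embed = hax.Axis("embed", 64)
Hidden = hax.Axis("hidden", 128)

key_x, key_w = jrandom.split(jrandom.PRNGKey(0))
x = hax.random.normal(key_x, (Batch, Embed))
w = hax.random.normal(key_w, (Embed, Hidden))

# Contract over the named "embed" axis; no positional bookkeeping needed.
y = hax.dot(x, w, axis=Embed)  # y has axes (Batch, Hidden)
```

Because axes are identified by name rather than position, code in this style stays readable as models grow, which is the heart of Levanter's legibility goal.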
Features
- Distributed Training: Supports distributed training across many accelerators, on both TPUs and GPUs, using techniques such as FSDP and tensor parallelism.
- Compatibility: Integrates with the Hugging Face ecosystem, including import and export of models, tokenizers, and datasets (see the example after this list).
- Advanced Performance: Training throughput rivals that of other major frameworks, such as those from MosaicML and Google.
- Cached On-Demand Data Preprocessing: Preprocesses data on the fly during the first run and caches the results, so subsequent runs and epochs start faster.
- Modern Optimization Techniques: Includes the Sophia optimizer, which can be up to 2x faster than Adam, and also supports optimizers from Optax (see the sketch after this list).
- Versatile Logging Options: Works with multiple logging backends, such as WandB and TensorBoard, and can even log metrics from inside JAX's jit-compiled functions.
- Distributed Checkpointing: Robust checkpointing backed by TensorStore, so training can be resumed on a different number of hosts.
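As an illustration of the Hugging Face interoperability mentioned above, a checkpoint exported in Hugging Face format can be loaded back with the standard transformers API. The path below is a placeholder, not a real Levanter artifact:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder path: point this at a directory containing an exported checkpoint.
checkpoint_dir = "path/to/exported_checkpoint"

model = AutoModelForCausalLM.from_pretrained(checkpoint_dir)
tokenizer = AutoTokenizer.from_pretrained(checkpoint_dir)
```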
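And to make the Optax support concrete, here is the generic Optax update pattern that such optimizer integration builds on. This is a standalone sketch with a toy loss and toy parameters, not Levanter's trainer code:

```python
import jax
import jax.numpy as jnp
import optax

def loss_fn(params, x, y):
    # Toy least-squares loss, purely for illustration.
    pred = x @ params["w"] + params["b"]
    return jnp.mean((pred - y) ** 2)

# Toy parameters and data.
params = {"w": jnp.zeros((3,)), "b": jnp.zeros(())}
x, y = jnp.ones((8, 3)), jnp.zeros((8,))

optimizer = optax.adamw(learning_rate=1e-3, weight_decay=0.1)
opt_state = optimizer.init(params)

# One training step: compute gradients, transform them, apply the update.
grads = jax.grad(loss_fn)(params, x, y)
updates, opt_state = optimizer.update(grads, opt_state, params)
params = optax.apply_updates(params, updates)
```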
Getting Started
For those new to Levanter, the documentation provides resources to get started, including installation instructions and sample configurations. Installation is a simple `pip install levanter`, and detailed setup guides are available for both TPUs and GPUs.
Training Examples
- Training a GPT2-nano: As a first run, train a small model like GPT2-nano on WikiText-103, e.g. with `python -m levanter.main.train_lm --config_path config/gpt2_nano.yaml`.
- Training with Custom Data: Configuration files let you train on your own datasets or on existing datasets from platforms like Hugging Face.
Supported Architectures
Levanter currently supports architectures such as GPT-2, Llama 1 and 2, Backpacks, and MosaicML's MPT, with plans to support more in the future.
Distributed and Cloud Training
Levanter supports cloud-based training on TPU VMs, with a comprehensive guide available for seamless setup and operation. There's also support for CUDA-based environments for those using GPUs.
Contribution and Licensing
The project welcomes contributions from the community, and detailed contribution guidelines are available. Levanter is released under the Apache License, Version 2.0.
Whether you're a researcher or developer, Levanter provides the tools needed to train cutting-edge language models with efficiency and ease.