LiBai Project Introduction
Overview
LiBai is an open-source toolbox for large-scale model training, built on OneFlow. It is designed for efficient, scalable training and provides features for both computer vision (CV) and natural language processing (NLP) tasks. The current version of LiBai supports OneFlow 0.7.0.
Key Highlights
Parallel Training Components
LiBai stands out for its comprehensive support for parallel training. It offers Data Parallelism, Tensor Parallelism, and Pipeline Parallelism, and its extensible design allows new forms of parallelism to be incorporated, making it a versatile tool for researchers and developers.
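These strategies can be combined in a single job. As a minimal conceptual sketch (plain Python, not LiBai's actual API; the parameter names here simply mirror common 3D-parallelism convention), the total number of devices required is the product of the three parallel degrees:

```python
# Illustrative sketch: how 3D-parallel degrees compose into a device count.
# Each pipeline stage holds tensor-parallel shards of the model, and the
# resulting model replica is duplicated data_parallel_size times.

def world_size(data_parallel_size: int,
               tensor_parallel_size: int,
               pipeline_parallel_size: int) -> int:
    """Number of devices needed for the given parallel configuration."""
    return data_parallel_size * tensor_parallel_size * pipeline_parallel_size

# Example: 2-way data, 4-way tensor, 2-way pipeline parallelism.
n_devices = world_size(2, 4, 2)
print(n_devices)  # 16
```

In practice, a framework validates that the configured degrees divide the available device count evenly before launching.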
Diverse Training Techniques
The toolbox ships with numerous training techniques that work out of the box: Distributed Training, which leverages multiple computing nodes; Mixed Precision Training, which uses reduced-precision computation for higher throughput; Activation Checkpointing and Recomputation, which trade extra compute for lower memory consumption; Gradient Accumulation, which simulates large batch sizes under limited memory; and the Zero Redundancy Optimizer (ZeRO), which partitions optimizer state to reduce memory usage when training large models.
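To make one of these concrete, gradient accumulation can be sketched in a few lines of framework-free Python (a toy illustration, not LiBai's implementation): gradients from several micro-batches are averaged, and the optimizer steps only once per accumulation window, giving the effect of a larger batch.

```python
# Toy gradient accumulation: step every `accum_steps` micro-batches.

def train_step(params, micro_batches, grad_fn, lr=0.1, accum_steps=4):
    """Update `params` using gradients averaged over accum_steps micro-batches.

    grad_fn(params, batch) returns one gradient per parameter.
    """
    accum = [0.0] * len(params)
    for i, batch in enumerate(micro_batches, start=1):
        grads = grad_fn(params, batch)
        # Divide each micro-batch gradient by accum_steps so the
        # accumulated value is the average over the window.
        accum = [a + g / accum_steps for a, g in zip(accum, grads)]
        if i % accum_steps == 0:
            params = [p - lr * a for p, a in zip(params, accum)]
            accum = [0.0] * len(params)
    return params

# Usage: a constant gradient of 1.0 over 4 micro-batches averages to 1.0,
# so a single optimizer step moves the parameter by -lr.
new_params = train_step([0.0], range(4), lambda p, b: [1.0])
print(new_params)  # [-0.1]
```

Real frameworks accumulate gradients in the parameters' grad buffers rather than a separate list, but the scheduling logic is the same.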
Support for CV and NLP Tasks
LiBai makes data preparation convenient by providing predefined processing pipelines for popular datasets in both the computer vision and natural language processing domains, including CIFAR and ImageNet for vision tasks and the BERT dataset for NLP tasks, simplifying the workflow of data scientists and engineers.
User-Friendly Design
The modular design of LiBai emphasizes ease of use. Features such as the LazyConfig system offer a flexible configuration syntax free of rigid predefined structures. This is complemented by a user-friendly trainer and engine, which support both reproducing existing research and building new projects on top of LiBai's infrastructure.
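The core idea behind a lazy configuration system can be illustrated with a small mock (a conceptual toy modeled on the detectron2-style LazyCall pattern, not LiBai's actual class): a config records a callable and its arguments without instantiating anything, so users can edit it in plain Python before building the object.

```python
# Toy LazyCall: record a constructor and kwargs now, instantiate later.

class LazyCall:
    def __init__(self, target):
        self.target = target   # the callable to instantiate later
        self.kwargs = {}

    def __call__(self, **kwargs):
        self.kwargs = dict(kwargs)
        return self

    def build(self):
        """Actually construct the target with the recorded arguments."""
        return self.target(**self.kwargs)

class Linear:
    def __init__(self, in_features, out_features):
        self.in_features = in_features
        self.out_features = out_features

# Declare the config without creating the layer yet.
cfg = LazyCall(Linear)(in_features=768, out_features=3072)
# Configs stay editable as ordinary Python objects until build time.
cfg.kwargs["out_features"] = 1024
layer = cfg.build()
print(layer.out_features)  # 1024
```

Because nothing is constructed until `build()`, configs compose and override cleanly, which is what makes this style of configuration flexible.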
High Efficiency
Efficiency is a core focus of LiBai: built on OneFlow and combined with the parallelism and memory-saving techniques above, model training is optimized for both speed and resource utilization.
Installation and Usage
Users can quickly get started with LiBai by following the installation instructions provided. A dedicated guide on getting started is also available for basic usage scenarios.
Documentation
Comprehensive documentation is available, providing full API references and tutorials for users at all levels. This can be accessed through LiBai's documentation portal.
Recent Updates
The latest release, Beta 0.3.0, came out on March 11, 2024. This update adds support for mock transformers and model evaluation tools such as lm-evaluation-harness, along with general user-experience improvements. Newly supported models include BLOOM, ChatGLM, Couplets, DALLE2, Llama2, MAE, and Stable_Diffusion, each available under various parallel training configurations.
Community and Contributions
The LiBai project welcomes contributions from the community. Interested individuals can refer to the project's CONTRIBUTING guide for more information on how to get involved.
Licensing and Citation
LiBai is released under the Apache 2.0 license. Researchers utilizing LiBai in their work are encouraged to cite the project using the provided BibTeX entry.
Join the community and explore the full potential of large-scale model training with LiBai!