Introduction to Low-bit Optimizers
Low-bit Optimizers is a project aimed at creating more memory-efficient neural network optimizers by reducing the bitwidth of optimizer states from the conventional 32-bit floating point down to just 4 bits. The approach is grounded in a detailed analysis of the first and second moments that adaptive optimizers maintain during neural network training.
The Need for Low-bit Optimizers
Training large neural networks requires substantial memory, a large share of which goes to the optimizer states stored during training. Reducing these states to lower bitwidths has already shown promise for cutting memory consumption, with 8-bit being the lowest bitwidth achieved before this project. By pushing the reduction to 4-bit, this project enables larger models to be trained within limited memory resources without compromising performance.
Technical Insights
Low-bit Optimizers employs techniques tailored to the complex outlier patterns found in the moments. Traditional block-wise quantization falls short here, so the project uses smaller block sizes and takes both row-wise and column-wise information into account for a more accurate quantization. In addition, the zero-point problem encountered when quantizing the second moment is resolved with a linear quantizer that excludes the zero point.
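To make the role of block-wise quantization concrete, here is a minimal, illustrative Python sketch of absmax-scaled block quantization. It is not the project's actual kernel or its exact quantization mapping (which also incorporates the row-wise and column-wise handling described above); it only shows the basic block-wise technique and why smaller blocks confine the influence of outliers:

import torch

def blockwise_absmax_quantize(x: torch.Tensor, block_size: int = 128, bits: int = 4):
    # Toy block-wise quantization: scale each block by its absolute maximum,
    # round to signed integer levels, and dequantize for comparison.
    levels = 2 ** (bits - 1) - 1                      # e.g. 7 for 4-bit signed codes
    flat = x.flatten()
    pad = (-flat.numel()) % block_size
    flat = torch.cat([flat, flat.new_zeros(pad)])     # pad to a whole number of blocks
    blocks = flat.view(-1, block_size)
    scales = blocks.abs().amax(dim=1, keepdim=True).clamp_min(1e-12)
    codes = torch.round(blocks / scales * levels)     # integer codes in [-levels, levels]
    dequant = codes / levels * scales                 # dequantized approximation
    return codes.to(torch.int8), scales, dequant.flatten()[: x.numel()].view_as(x)

# Smaller blocks track outliers more tightly, at the cost of storing more per-block scales.
m = torch.randn(1024, 1024)
_, _, approx = blockwise_absmax_quantize(m, block_size=128)
print((m - approx).abs().max())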
Performance and Evaluation
The 4-bit optimizers are rigorously evaluated across a broad range of benchmarks, including natural language understanding, machine translation, image classification, and instruction tuning. The results consistently show that the 4-bit optimizers perform on par with their full-precision counterparts while requiring significantly less memory.
Getting Started with Low-bit Optimizers
Installation
To work with Low-bit Optimizers, ensure you have Python 3.7 or newer, CUDA 11.0 or newer, and torch 1.13.0 or newer. Installation is straightforward:
git clone https://github.com/thu-ml/low-bit-optimizers.git
cd low-bit-optimizers
pip install -v -e .
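After installation, a quick optional sanity check is to import the package from Python:

import lpmm  # should succeed without errors once the editable install has completed
print("lpmm imported successfully")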
Usage
Using 4-bit Optimizers
Adopting 4-bit optimizers in your projects involves replacing existing optimizers with one of the 4-bit variants—4-bit AdamW, 4-bit Factor, or 4-bit AdamW (fused). Here’s a basic example:
import lpmm
# Replace the standard AdamW optimizer
optimizer = lpmm.optim.AdamW(model.parameters(), lr=1e-3, betas=(0.9, 0.999))
At present, the supported optimizers are Adam (AdamW) and SGD.
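As a minimal sketch of how the swap fits into an existing training loop (model, dataloader, and loss_fn are placeholders for objects already defined in your own code):

import torch
import lpmm

# Before: optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3, betas=(0.9, 0.999))
optimizer = lpmm.optim.AdamW(model.parameters(), lr=1e-3, betas=(0.9, 0.999))

for inputs, targets in dataloader:
    optimizer.zero_grad()
    loss = loss_fn(model(inputs), targets)
    loss.backward()
    optimizer.step()

The rest of the training loop is unchanged; only the optimizer construction differs. A 4-bit SGD variant is provided by the library as well.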
Customizing Quantization Settings
For non-fused optimizers, you can alter the quantization configuration by creating a new configuration file and passing its path to the optimizer. Example configuration files are provided in the lpmm/configs directory, with the default settings in lpmm/configs/default.yml. The configuration of the fused optimizers is fixed.
To apply a custom configuration:
config_path = "configs/default.yml" # your configuration file path
optimizer = lpmm.optim.AdamW(model.parameters(), lr=1e-3, betas=(0.9, 0.999), qconfig=config_path)
Commonly adjusted parameters include SCALE_TYPE, QUANT_TYPE, and BITS (the recommended values for BITS are 4 or 8).
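One simple workflow, sketched below under the assumption that you keep your configuration files in a local configs/ directory, is to copy the shipped default file, edit the fields above in the copy, and pass its path via qconfig (configs/my_4bit.yml is just an illustrative name):

import shutil
import lpmm

# Start from the default settings and adjust SCALE_TYPE / QUANT_TYPE / BITS in the copy.
shutil.copy("lpmm/configs/default.yml", "configs/my_4bit.yml")

optimizer = lpmm.optim.AdamW(model.parameters(), lr=1e-3, betas=(0.9, 0.999),
                             qconfig="configs/my_4bit.yml")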
Overriding Quantization for Certain Parameters
For specific parameters that need to remain in full 32-bit precision rather than being quantized, the optimizer provides an override_quantize_enable method:
optimizer = lpmm.optim.AdamW(model.parameters(), lr=1e-3, betas=(0.9, 0.999))
optimizer.override_quantize_enable(module, param_name, enable=False)
This allows tailored optimization approaches where specific parameters remain at full precision if necessary.
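For example, to keep the optimizer states of an embedding layer in 32-bit (model.embedding is a placeholder for whichever nn.Embedding module you want to exclude):

optimizer.override_quantize_enable(model.embedding, "weight", enable=False)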
In conclusion, Low-bit Optimizers offers a groundbreaking way to train neural networks more efficiently without sacrificing performance, expanding the boundaries of what can be achieved within existing memory limits.