InternEvo: An Overview
InternEvo is an open-source training framework designed for efficient model pre-training and fine-tuning. It requires minimal dependencies, runs in a wide range of training environments, and scales from a single GPU to large GPU clusters while delivering strong performance optimizations.
Latest Updates
- August 2024: InternEvo now supports streaming datasets in the Huggingface format, accompanied by extensive data flow instructions.
- April 2024: Training models on the NPU-910B cluster is now supported.
- January 2024: For a deeper exploration of models within the InternLM series, visit the InternLM site.
Key Features
Pre-training Efficiency: InternEvo reaches nearly 90% acceleration efficiency when training on 1024 GPUs, and its unified codebase supports the same training workflow from a single GPU up to large clusters.
Model Excellence: The framework powers large language models such as the InternLM-7B and InternLM-20B series, which surpass other prominent open-source language models such as LLaMA.
Installation Guide
To begin using InternEvo, you need to install specific versions of torch, torchvision, torchaudio, and torch-scatter:
pip install --extra-index-url https://download.pytorch.org/whl/cu118 torch==2.1.0+cu118 torchvision==0.16.0+cu118 torchaudio==2.1.0+cu118
pip install torch-scatter -f https://data.pyg.org/whl/torch-2.1.0+cu118.html
Then, install InternEvo:
pip install InternEvo
If needed, install flash-attention v2.2.1 to enhance training speed:
pip install flash-attn==2.2.1
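Before moving on, it can be worth confirming that the CUDA build of torch and the optional flash-attn package import correctly. The short check below is an illustrative snippet, not part of the InternEvo documentation:
import torch

print("torch:", torch.__version__)                   # expected: 2.1.0+cu118
print("CUDA available:", torch.cuda.is_available())  # expected: True on a GPU node

try:
    import flash_attn
    print("flash-attn:", flash_attn.__version__)     # expected: 2.2.1
except ImportError:
    print("flash-attn not installed (optional; used only to speed up training)")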
For more comprehensive installation instructions, visit the Install Tutorial.
Getting Started
Training Script: Start from a ready-to-use training script such as train.py. Detailed instructions are available in the Training Tutorial.
Data Preparation: Collect your datasets, as shown with the Huggingface roneneldan/TinyStories dataset. Setup involves downloading datasets and tokenizer files locally.
Example:
huggingface-cli download --repo-type dataset --resume-download "roneneldan/TinyStories" --local-dir "/mnt/petrelfs/hf-TinyStories"
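The command above fetches the dataset; the tokenizer files referenced by tokenizer_path below can be downloaded the same way. The sketch below uses the huggingface_hub Python API, and the internlm/internlm2-7b repo id is an assumption, so substitute the repository that ships the tokenizer you actually intend to use:
from huggingface_hub import snapshot_download

# Download only the tokenizer-related files into the directory used as
# tokenizer_path in the config below. The repo id is an assumption.
snapshot_download(
    repo_id="internlm/internlm2-7b",
    allow_patterns=["tokenizer*", "*.model", "*.json"],
    local_dir="/mnt/petrelfs/hf-internlm2-tokenizer",
)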
Configure your environment:
TRAIN_FOLDER = "/mnt/petrelfs/hf-TinyStories"
data = dict(
    type="streaming",
    train_folder=TRAIN_FOLDER,  # point the loader at the downloaded TinyStories files
    tokenizer_path="/mnt/petrelfs/hf-internlm2-tokenizer",
)
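Conceptually, the streaming setting tells InternEvo to iterate over the raw Huggingface samples and tokenize them on the fly rather than pre-tokenizing the whole corpus. The sketch below illustrates that data flow using the datasets and transformers libraries directly; it is not InternEvo's internal loader and, for simplicity, it streams straight from the Hub:
from datasets import load_dataset
from transformers import AutoTokenizer

# Tokenizer downloaded earlier; trust_remote_code is required for InternLM2 tokenizers.
tokenizer = AutoTokenizer.from_pretrained(
    "/mnt/petrelfs/hf-internlm2-tokenizer", trust_remote_code=True
)

# Stream samples one by one instead of materializing the dataset in memory.
stream = load_dataset("roneneldan/TinyStories", split="train", streaming=True)

for sample in stream.take(2):  # peek at the first two samples
    ids = tokenizer(sample["text"])["input_ids"]
    print(len(ids), "tokens:", sample["text"][:60])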
Initiate Training:
- On Slurm: Using 2 nodes and 16 GPUs:
$ srun -p internllm -N 2 -n 16 --ntasks-per-node=8 --gpus-per-task=1 python train.py --config ./configs/7B_sft.py
- With Torch: Using 1 node and 8 GPUs:
$ torchrun --nnodes=1 --nproc_per_node=8 train.py --config ./configs/7B_sft.py --launcher "torch"
Features and Tools
Data Options: Tokenized and streaming datasets are supported.
Model Variety: Ships with internal models such as the InternLM series as well as models from well-known architectures such as LLaMA2.
Parallel Processing: Supports parallelism schemes such as ZeRO and pipeline parallelism to scale computation efficiently; a configuration sketch follows below.
Tool Sets: Includes conversion tools to and from Huggingface formats, as well as data tokenizer utilities.
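As a rough illustration of how the parallel schemes above are selected, recent configs such as configs/7B_sft.py expose them through a parallel dict. The exact keys below are an assumption and may differ between InternEvo versions:
# Illustrative parallelism settings; treat the exact schema as an assumption
# rather than a definitive reference for your InternEvo version.
parallel = dict(
    zero1=dict(size=8),                               # ZeRO-1 optimizer-state sharding group size
    tensor=dict(size=1),                              # tensor parallel degree
    pipeline=dict(size=1, interleaved_overlap=True),  # pipeline parallel stages
)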
Support and Community
InternEvo is a collaborative effort by Shanghai AI Laboratory and a range of institutions and companies. Contributions and community engagement are highly encouraged, and the project appreciates all community feedback and open-source support.
For a deeper technical dive, see the System Architecture document or Contribution Guidelines.
Acknowledgements
The InternEvo project builds on the work of many contributors, in particular open-source projects such as flash-attention and ColossalAI.
With InternEvo, developers can efficiently test and refine their machine learning models, propelling both innovation and community-driven advancements in language modeling.