Project Overview: LLamaTuner
Introduction
LLamaTuner is an efficient, flexible, and full-featured toolkit for fine-tuning Large Language Models (LLMs). It supports a wide range of models, including Llama 3, Phi-3, Qwen, and Mistral, making it well suited for work with large-scale language models.
Key Features
Efficiency
- LLamaTuner is compatible with almost all GPUs and supports both pre-training and fine-tuning of LLMs and VLMs (vision-language models).
- It can fine-tune models as large as 70 billion parameters across multi-node setups.
- The toolkit optimizes training throughput using high-performance operators like FlashAttention and Triton kernels.
- It integrates seamlessly with DeepSpeed, employing various ZeRO optimization techniques to improve performance.
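As an illustration of the DeepSpeed integration, the sketch below passes a ZeRO stage-2 configuration to a Hugging Face TrainingArguments object. This is not LLamaTuner's shipped configuration; the specific values are assumptions chosen for illustration.

```python
# Minimal sketch: a DeepSpeed ZeRO-2 configuration handed to the Hugging Face
# Trainer via TrainingArguments. The concrete values are illustrative, not
# LLamaTuner's defaults.
from transformers import TrainingArguments

ds_config = {
    "zero_optimization": {
        "stage": 2,                                # shard optimizer states and gradients
        "offload_optimizer": {"device": "cpu"},    # optional CPU offload to save GPU memory
        "overlap_comm": True,
        "contiguous_gradients": True,
    },
    "bf16": {"enabled": "auto"},                   # "auto" values are filled in by the HF integration
    "gradient_accumulation_steps": "auto",
    "train_micro_batch_size_per_gpu": "auto",
}

training_args = TrainingArguments(
    output_dir="outputs",
    per_device_train_batch_size=4,
    bf16=True,
    deepspeed=ds_config,   # also accepts a path to a JSON config file
)
```

ZeRO-3 additionally shards the model parameters themselves, which is what makes multi-node fine-tuning of very large models practical.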
Flexibility
- Compatible with a wide range of LLMs, including Llama 3, Llama 2, Mixtral, ChatGLM, Qwen, and Baichuan.
- Supports Vision Language Models such as LLaVA.
- Provides a well-designed data pipeline that accommodates diverse datasets, whether open-source or custom.
- Supports various training algorithms like QLoRA, LoRA, and full-parameter fine-tuning, allowing users to choose the best fit for their particular needs.
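To make the LoRA and QLoRA options concrete, here is a minimal sketch using the Hugging Face peft and bitsandbytes libraries; the model name, rank, and target modules are illustrative assumptions rather than LLamaTuner's exact defaults.

```python
# Minimal QLoRA sketch: load a 4-bit quantized base model and attach LoRA adapters.
# Drop quantization_config for plain LoRA; hyperparameters here are illustrative.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3-8B",          # example base model
    quantization_config=bnb_config,
    device_map="auto",
)

lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # attention projections
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the low-rank adapters are trainable
```

Full-parameter fine-tuning skips the adapter step entirely and updates every weight, at a much higher memory cost.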
Full-featured Capabilities
- Supports continuous pre-training, instruction fine-tuning, and agent fine-tuning.
- Facilitates interactive chat with trained models through pre-defined conversation templates.
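The snippet below shows the general idea of conversation templating using transformers' chat-template support; LLamaTuner ships its own template definitions, so this is only an analogous sketch and the model name is an example.

```python
# Sketch of conversation templating with transformers; LLamaTuner defines its own
# templates, so this only illustrates the concept.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2-7B-Instruct")  # example model

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Summarize what LoRA fine-tuning does."},
]
# Render the conversation into the prompt format the model was trained on.
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
print(prompt)
```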
Supported Models
LLamaTuner supports a range of models with different sizes and configurations. These include:
- Baichuan (7B/13B)
- LLaMA (7B/13B/33B/65B)
- Falcon (7B/11B/40B/180B)
- Various other models such as BLOOM, ChatGLM3, Mistral, and Qwen
Training Approaches Supported
LLamaTuner offers several methodologies for training including:
- Pre-Training and Supervised Fine-Tuning
- Reward Modeling
- PPO, DPO, KTO, and ORPO Training
All these approaches are supported with full-tuning, freeze-tuning, LoRA, and QLoRA options.
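As one example of the preference-based methods listed above, the following sketch outlines DPO training with the TRL library. The model, dataset, and hyperparameters are placeholders, and TRL's argument names vary slightly between versions, so treat this as an outline rather than LLamaTuner's actual script.

```python
# Minimal DPO sketch using TRL; names and hyperparameters are illustrative.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

model_name = "Qwen/Qwen2-0.5B-Instruct"          # small model for illustration
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# A preference dataset provides "prompt", "chosen", and "rejected" fields.
dataset = load_dataset("trl-lib/ultrafeedback_binarized", split="train[:1%]")

args = DPOConfig(output_dir="dpo-out", per_device_train_batch_size=2, beta=0.1)
trainer = DPOTrainer(
    model=model,
    args=args,
    train_dataset=dataset,
    processing_class=tokenizer,   # older TRL releases call this argument `tokenizer`
)
trainer.train()
```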
Datasets and Data Preprocessing
The toolkit supports numerous datasets hosted on the Hugging Face Hub, including Stanford Alpaca, BELLE, and Databricks Dolly. It also includes data preprocessing tools to help refine datasets for specific applications.
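For example, an Alpaca-style instruction dataset can be loaded and normalized into prompt/response pairs with the datasets library; the column names below follow the Alpaca convention, and the mapping function is a hypothetical helper, not part of LLamaTuner.

```python
# Sketch of loading and normalizing an Alpaca-style dataset into prompt/response pairs.
from datasets import load_dataset

dataset = load_dataset("tatsu-lab/alpaca", split="train")

def to_prompt(example):
    # Fold the optional "input" field into a single instruction-style prompt.
    instruction = example["instruction"]
    if example.get("input"):
        instruction = f"{instruction}\n\n{example['input']}"
    return {"prompt": instruction, "response": example["output"]}

dataset = dataset.map(to_prompt, remove_columns=dataset.column_names)
print(dataset[0])
```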
Model Zoo
LLamaTuner offers a Model Zoo where various models, trained using QLoRA, are available for inference and further fine-tuning. These models are accessible through the Hugging Face model hub.
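Adapters published this way can typically be loaded on top of their base model with peft; the adapter repository name below is a placeholder, not an actual LLamaTuner checkpoint.

```python
# Sketch of loading a published QLoRA/LoRA adapter for inference with peft.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_name = "meta-llama/Meta-Llama-3-8B"                      # example base model
base = AutoModelForCausalLM.from_pretrained(base_name, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(base_name)

# Attach the fine-tuned adapter weights from the Hugging Face hub (placeholder repo).
model = PeftModel.from_pretrained(base, "your-username/llamatuner-qlora-adapter")

inputs = tokenizer("Explain QLoRA in one sentence.", return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```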
Requirements
Hardware requirements vary based on the training method chosen. Software requirements include Python 3.8 or later, PyTorch 1.13.1 or later, and other optional components like CUDA, DeepSpeed, and Flash-Attn for optimized performance.
Getting Started
To start using LLamaTuner, clone the repository and follow the provided training scripts. Instructions for full fine-tuning as well as LoRA and QLoRA are included in the toolkit.
In summary, LLamaTuner is a versatile and robust toolkit for training and deploying large language models efficiently and effectively, with extensive support for a variety of models, datasets, and training techniques.