Mistral-finetune Project Overview
Mistral-finetune is a streamlined, memory-efficient solution for fine-tuning Mistral's models. It leverages Low-Rank Adaptation (LoRA), a training technique that freezes most of the model's weights and trains only a small set of additional low-rank weights (typically 1-2% of the total). This approach achieves strong performance while keeping memory consumption low.
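Concretely, the idea behind LoRA can be sketched as follows (notation is illustrative and not taken from this project's code): a frozen weight matrix is augmented with a trainable low-rank product, so only the two small factor matrices receive gradient updates.

```latex
% LoRA forward pass: W_0 stays frozen, only A and B are trained
h = W_0 x + \frac{\alpha}{r} B A x,
\qquad W_0 \in \mathbb{R}^{d \times k},\;
B \in \mathbb{R}^{d \times r},\;
A \in \mathbb{R}^{r \times k},\; r \ll \min(d, k)
```

Per layer this trains $r(d + k)$ parameters instead of $dk$, which is where the "small percentage of additional weights" figure comes from.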
Key Highlights
- Hardware Optimization: The code is optimized for use on powerful GPUs, particularly A100 or H100, and works best in settings where multiple GPUs are used on a single node. However, for smaller models, such as the 7B, a single GPU can handle the task.
- Usability Focus: The project’s primary aim is to offer a straightforward, guided introduction for fine-tuning Mistral models. It focuses on a particular style of data formatting, keeping the process simple and leaving advanced customizations to other projects.
Recent Updates
- Compatibility Enhancements: Recently, both Mistral Large v2 and Mistral Nemo have been made compatible with mistral-finetune. This allows for the fine-tuning of these larger models, albeit with adjustments like increased memory requirements and learning rate modifications.
Getting Started
Installation: Begin by cloning the Mistral-finetune repository and installing the necessary dependencies. This sets the stage for downloading and subsequently fine-tuning various official Mistral models.
cd $HOME && git clone https://github.com/mistralai/mistral-finetune.git
cd mistral-finetune
pip install -r requirements.txt
Model Download: Users can choose from a variety of Mistral models to fine-tune, each coming with a specific download link and checksum value to verify integrity. For example, the 7B Base V3 model can be obtained and prepared for use with a few simple commands.
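As a sketch of the download step (the CDN URL and archive name below follow the pattern used in the repo's README; take the actual checksum value from the repo rather than from here):

```shell
# Download and unpack the 7B Base V3 weights (URL illustrative; see the repo README)
mkdir -p ${HOME}/mistral_models/7B-v0.3
cd ${HOME}
wget https://models.mistralcdn.com/mistral-7b-v0-3/mistral-7B-v0.3.tar
md5sum mistral-7B-v0.3.tar   # compare against the checksum published in the repo
tar -xf mistral-7B-v0.3.tar -C mistral_models/7B-v0.3
```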
Data Preparation
For effective model training, proper data formatting is crucial. Datasets must be in JSONL format and can be prepared as either pretrain or instruct data, with specific requirements outlined for each. Instruction following and function calling data are supported, each with its unique format.
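For illustration, the two flavors look roughly like this (field names follow the conventions described in the repo's README and should be treated as a sketch, not a schema guarantee). Pretrain data is one JSON object with a text field per line, while instruct data carries a list of chat messages:

```jsonl
{"text": "Raw pretraining text goes here, one document per line."}
{"messages": [{"role": "user", "content": "What is LoRA?"}, {"role": "assistant", "content": "A parameter-efficient fine-tuning method."}]}
```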
Validating and Starting Training
Before training, it's important to verify the dataset’s formatting using a provided script. This step ensures a smooth training run and also yields an estimate of the expected training time. Once the data is verified, users can start training, adjusting parameters such as the number of training steps for optimal results.
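As a sketch of that workflow (script and config paths follow the repo's README; adjust them to your checkout and config file):

```shell
# Validate the dataset referenced by an example training config,
# then launch training across 8 GPUs on a single node.
cd ${HOME}/mistral-finetune
python -m utils.validate_data --train_yaml example/7B.yaml
torchrun --nproc-per-node 8 -m train example/7B.yaml
```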
Training on smaller datasets such as UltraChat might take around 30 minutes on high-end hardware like an 8xH100 GPU setup, while more complex function calling datasets might take slightly longer.
Customization and Advancement
The mistral-finetune project is built to be adaptable, with example configurations providing reasonable defaults for learning rate and other training parameters. Users are encouraged to tweak these settings based on their specific needs or hardware capabilities to get the best possible outcomes.
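The kinds of knobs involved can be sketched as a YAML fragment (field names and values here are illustrative of a typical LoRA fine-tuning config, not copied verbatim from the repo):

```yaml
# Illustrative training config: adjust paths, rank, and lr to your setup
data:
  instruct_data: "/path/to/train.jsonl"
  eval_instruct_data: "/path/to/eval.jsonl"
model_id_or_path: "/path/to/mistral_models/7B-v0.3"
lora:
  rank: 64            # low-rank dimension; higher means more trainable params
seq_len: 8192
batch_size: 1
max_steps: 300
optim:
  lr: 6.0e-5          # starting point; larger models generally want a lower lr
  weight_decay: 0.1
run_dir: "/path/to/run_dir"
```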
In summary, Mistral-finetune offers a robust ecosystem for fine-tuning neural language models, balancing ease of use with advanced capabilities. This project serves as a vital tool for researchers and developers looking to enhance Mistral models efficiently.