Project Introduction: Mergekit
mergekit is a toolkit for merging pre-trained language models. It uses an out-of-core approach, which makes it practical when resources are limited: merges can run entirely on a CPU or be accelerated with as little as 8 GB of VRAM. It supports a wide range of merging algorithms, with new ones integrated as they become available.
Key Features
- Versatile Model Support: mergekit can handle a variety of models such as Llama, Mistral, GPT-NeoX, and StableLM, among others.
- Extensive Merge Methods: Offers a substantial selection of merge methods to accommodate different needs.
- Scalable Execution: Allows for execution on GPU or CPU, catering to different hardware configurations.
- Efficient Memory Usage: Utilizes lazy loading of tensors to minimize memory consumption.
- Interpolated Gradients: Implements parameter value interpolation inspired by Gryphe's BlockMerge_Gradient script.
- Frankenmerging: Allows for the piecewise assembly of language models from their layers (see the example after this list).
- Mixture of Experts Merging: Supports this approach to combine multiple expert models.
- LoRA Extraction: Extracts PEFT-compatible low-rank adapters (LoRAs) that approximate the difference between a fine-tuned model and its base model.
- Evolutionary Merge Methods: Uses evolutionary optimization to automatically search for good merge recipes and hyperparameters.
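As a preview of the configuration format covered later, here is a sketch of a passthrough ("frankenmerge") configuration that stacks layer ranges from two models into a single new model. The model names and layer ranges are illustrative placeholders, not recommendations:

```yaml
# Passthrough "frankenmerge": stack layer slices from two models.
# Model names and layer ranges below are placeholders.
slices:
  - sources:
      - model: org-a/finetune-one
        layer_range: [0, 24]
  - sources:
      - model: org-b/finetune-two
        layer_range: [8, 32]
merge_method: passthrough
dtype: float16
```

The resulting model has 48 layers: the first 24 from one model followed by layers 8 through 31 of the other.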
Enhanced User Experience
mergekit recently introduced a graphical user interface (GUI), hosted on the Arcee platform. The GPU-backed GUI is designed to simplify the merging process and make it accessible to a wider audience. You can try it in the Arcee App, or in the Hugging Face Space, which offers more limited GPU resources.
Getting Started
To start using mergekit, clone the repository and install the package using pip:
```
git clone https://github.com/arcee-ai/mergekit.git
cd mergekit
pip install -e .
```
If you encounter errors related to `setup.py` or `setup.cfg`, ensure your pip version is greater than 21.3.
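If your pip is too old, the standard upgrade command (not mergekit-specific) is:

```
python -m pip install --upgrade pip
```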
Using Mergekit
The primary entry point is the `mergekit-yaml` script, which takes a YAML configuration file and an output directory:

```
mergekit-yaml path/to/your/config.yml ./output-model-directory [--cuda] [--lazy-unpickle] [--allow-crimes] [... other options]
```
A basic `README.md` is generated alongside the merged model for uploading to the Hugging Face Hub; you can modify it or use it as-is.
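As a sketch of the upload step, using the Hugging Face CLI (the repository name is a placeholder; use your own):

```
# Upload the merged model directory to a Hub repository you own.
huggingface-cli upload your-username/my-merged-model ./output-model-directory .
```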
Merge Configuration
Merge configurations dictate how models are merged. They are specified in a YAML document and include elements such as the following (a worked example appears after the list):
- Merge Method: The strategy used to combine models.
- Slices and Models: Whether to merge layer slices or entire models.
- Base Model: The foundational model that some methods require.
- Parameters: Weights, densities, and other method-specific values, which can be set globally or for specific parts of the configuration.
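As a concrete sketch, here is a minimal linear merge in mergekit's YAML format; the model names are placeholders for real Hugging Face IDs or local paths:

```yaml
# Minimal linear merge: weighted average of two fine-tunes
# that share the same architecture. Model names are placeholders.
models:
  - model: org-a/finetune-one
    parameters:
      weight: 0.6
  - model: org-b/finetune-two
    parameters:
      weight: 0.4
merge_method: linear
dtype: float16
```

Saved as `config.yml`, this runs with `mergekit-yaml config.yml ./output-model-directory`.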
Tokenizer Source
Choosing a tokenizer source is key to merging models correctly. The `tokenizer_source` field selects which tokenizer the merged model uses: the base model's tokenizer (`base`), the union of all input models' vocabularies (`union`), or the tokenizer of one specific input model.
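In a configuration, this is a single top-level line; for example, using the `union` option:

```yaml
tokenizer_source: union  # or: base, or: model:org-a/finetune-one
```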
Merge Methods
mergekit supports numerous merge methods, each with its own strengths. These range from classic linear interpolation to more advanced approaches such as SLERP, TIES, DARE, Model Stock, and DELLA.
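To show how method-specific parameters appear in practice, here is a sketch of a TIES merge; as before, the model names are placeholders:

```yaml
# TIES merge: sparsify each model's delta from the base model,
# resolve sign conflicts, then combine. Model names are placeholders.
models:
  - model: org-a/finetune-one
    parameters:
      density: 0.5  # keep the top 50% of delta parameters
      weight: 0.5
  - model: org-b/finetune-two
    parameters:
      density: 0.5
      weight: 0.3
merge_method: ties
base_model: org-c/base-model
parameters:
  normalize: true  # rescale weights so they sum to 1
dtype: float16
```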
Additional Features
- LoRA Extraction: Extract a PEFT-compatible LoRA that approximates the difference between a fine-tuned model and its base model (see the command sketch after this list).
- Cloud Merging: Merge models in the cloud using Arcee’s infrastructure, facilitating large-scale operations.
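A sketch of the extraction command follows. The paths and rank are placeholders, and the exact flags may vary between mergekit versions, so consult `mergekit-extract-lora --help` for the interface in your install:

```
# Approximate (finetuned - base) as a rank-32 LoRA adapter.
mergekit-extract-lora path/to/finetuned-model path/to/base-model ./extracted-lora --rank=32
```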
Conclusion
mergekit is an exceptional toolkit for anyone needing robust, flexible merging capabilities across various language models. Its user-friendly design, coupled with a rich set of features, makes it an essential tool for both researchers and developers in the machine learning community.