MatMul-Free Language Model
MatMul-Free LM is a language model architecture that eliminates matrix multiplication (MatMul) operations. This project provides an implementation of MatMul-Free LM that is compatible with the 🤗 Transformers library, making it accessible and easy for developers to use.
Introduction
The MatMul-Free LM project is designed to reduce computational requirements by removing the necessity for matrix multiplications, a fundamental operation in traditional machine learning models. This novel approach can lead to more efficient models that maintain high performance.
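One way to see how multiplications can be removed, sketched below for illustration only (this is not the project's implementation), is to constrain weights to the ternary set {-1, 0, +1}, so that every dot product reduces to additions and subtractions:

```python
# Illustrative sketch, not the project's code: with weights restricted to
# {-1, 0, +1}, a dense layer's dot products need no multiplications at all.
def ternary_matvec(weights, x):
    """Compute y_i = sum_j W_ij * x_j using only additions and subtractions."""
    out = []
    for row in weights:
        acc = 0.0
        for w, v in zip(row, x):
            if w == 1:
                acc += v      # +1 weight: add the input
            elif w == -1:
                acc -= v      # -1 weight: subtract the input
            # 0 weight: contributes nothing, skip
        out.append(acc)
    return out

W = [[1, 0, -1], [-1, 1, 0]]  # a tiny ternary weight matrix
x = [2.0, 3.0, 4.0]
print(ternary_matvec(W, x))   # -> [-2.0, 1.0]
```

Real hardware and the project's kernels exploit this at scale; the sketch only shows why no multiply instruction is needed per weight.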
Scaling Law
The project examines how scaling laws apply at three model sizes (370 million, 1.3 billion, and 2.7 billion parameters), comparing the MatMul-Free architecture against a Transformer++ baseline. A notable outcome is the steeper scaling projection of the MatMul-Free LM, indicating that it converts additional computational resources into performance gains more efficiently.
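To make "steeper scaling projection" concrete, here is a purely illustrative comparison of two power-law loss curves; all coefficients are invented for demonstration and are not fitted to the paper's data:

```python
# Illustrative only: a scaling law models loss as L(C) = a * C**(-b).
# A larger exponent b ("steeper" on a log-log plot) means each extra unit
# of compute C buys a bigger loss reduction.
def power_law_loss(compute, a, b):
    return a * compute ** (-b)

for c in (1e18, 1e21, 1e24):
    shallow = power_law_loss(c, a=1e3, b=0.05)  # hypothetical baseline curve
    steep = power_law_loss(c, a=5e3, b=0.08)    # hypothetical steeper curve
    print(f"C={c:.0e}  shallow={shallow:.1f}  steep={steep:.1f}")
```

With these made-up coefficients, the steeper curve starts above the baseline at low compute but crosses below it at high compute, which is the qualitative behavior a steeper scaling projection implies.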
Installation
To work with MatMul-Free LM, certain software prerequisites must be satisfied:
- PyTorch >= 2.0
- Triton >= 2.2
- einops
Installation can be easily completed with the following command:
pip install -U git+https://github.com/ridgerchu/matmulfreellm
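Before installing, it can be useful to confirm the version prerequisites are met. A minimal sketch of the comparison (it checks only the major.minor numbers, not build variants; the helper name is ours, not part of the project):

```python
# Compare dotted version strings numerically, e.g. is "2.1.0" >= "2.0"?
# Only the first two components (major.minor) are considered, so local
# build tags like "2.2.0+cu121" are handled as well.
def meets_minimum(installed: str, required: str) -> bool:
    to_pair = lambda v: tuple(int(p) for p in v.split(".")[:2])
    return to_pair(installed) >= to_pair(required)

# The prerequisites above: PyTorch >= 2.0, Triton >= 2.2
print(meets_minimum("2.1.0", "2.0"))       # True
print(meets_minimum("2.1", "2.2"))         # False
print(meets_minimum("2.2.0+cu121", "2.2")) # True
```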
Usage
Pre-trained Model Zoo
The project offers several pre-trained models spanning a range of sizes, layer counts, hidden dimensions, and training-token budgets.
Model Implementation
For developers, the implementation of these models is designed to be compatible with the 🤗 Transformers library. Here is how to initialize a model:
from mmfreelm.models import HGRNBitConfig  # configuration class for the MatMul-free model
from transformers import AutoModel

config = HGRNBitConfig()
model = AutoModel.from_config(config)  # builds a randomly initialized model from the config
This compatibility with the Hugging Face library ensures a streamlined experience for initializing and using the models.
Text Generation
Once a model is pre-trained, it can be used for text generation via the 🤗 text generation APIs. Here's an example to generate text:
import os
os.environ["TOKENIZERS_PARALLELISM"] = "false"
import mmfreelm  # registers the MatMul-free model classes with 🤗 Transformers
from transformers import AutoModelForCausalLM, AutoTokenizer
name = '' # Specify the model name
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name).cuda().half()
input_prompt = "In a shocking finding, scientist discovered a herd of unicorns living in a remote, "
input_ids = tokenizer(input_prompt, return_tensors="pt").input_ids.cuda()
outputs = model.generate(input_ids, max_length=32, do_sample=True, top_p=0.4, temperature=0.6)
print(tokenizer.batch_decode(outputs, skip_special_tokens=True)[0])
This snippet shows how a pre-trained MatMul-Free LM drops into the standard 🤗 text generation workflow.
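The generate call above uses nucleus (top-p) sampling via top_p=0.4. As a rough sketch of what that filter does (simplified from the tensor-based implementation; the real sampler also renormalizes the kept probabilities before drawing a token):

```python
import math

def top_p_filter(logits, top_p):
    """Keep the smallest set of tokens whose cumulative probability reaches top_p."""
    probs = [math.exp(l) for l in logits]
    total = sum(probs)
    probs = [p / total for p in probs]            # softmax over the logits
    order = sorted(range(len(probs)), key=lambda i: -probs[i])
    kept, cum = [], 0.0
    for i in order:                               # take tokens from most likely down
        kept.append(i)
        cum += probs[i]
        if cum >= top_p:                          # stop once the nucleus is covered
            break
    return sorted(kept)

# With a low top_p like 0.4, only the most likely token may survive the filter.
print(top_p_filter([2.0, 1.0, 0.5, 0.1], 0.4))  # -> [0]
print(top_p_filter([2.0, 1.0, 0.5, 0.1], 0.9))  # -> [0, 1, 2]
```

A lower top_p (and temperature) makes generation more conservative; raising them admits more candidate tokens and more varied output.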
Citation
If you use this repository in your research or projects, please cite the preprint:
@article{zhu2024scalable,
  title={Scalable MatMul-free Language Modeling},
  author={Zhu, Rui-Jie and Zhang, Yu and Sifferman, Ethan and Sheaves, Tyler and Wang, Yiqiao and Richmond, Dustin and Zhou, Peng and Eshraghian, Jason K},
  journal={arXiv preprint arXiv:2406.02528},
  year={2024}
}
This project represents a step forward for language models, promising gains in both computational efficiency and modeling performance.