MatMul-Free Language Model
MatMul-Free LM is a language model architecture that eliminates matrix multiplication (MatMul) operations. This project provides an implementation of MatMul-Free LM that is compatible with the 🤗 Transformers library, making it accessible and easy for developers to use.
Introduction
The MatMul-Free LM project is designed to reduce computational requirements by removing the necessity for matrix multiplications, a fundamental operation in traditional machine learning models. This novel approach can lead to more efficient models that maintain high performance.
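One way to see how multiplications can be removed, sketched below for illustration only (this is not the project's implementation), is to constrain weights to the ternary set {-1, 0, +1}, so that every dot product reduces to additions and subtractions:

```python
# Illustrative sketch, not the project's code: with weights restricted to
# {-1, 0, +1}, a dense layer's dot products need no multiplications at all.
def ternary_matvec(weights, x):
    """Compute y_i = sum_j W_ij * x_j using only additions and subtractions."""
    out = []
    for row in weights:
        acc = 0.0
        for w, v in zip(row, x):
            if w == 1:
                acc += v      # +1 weight: add the input
            elif w == -1:
                acc -= v      # -1 weight: subtract the input
            # 0 weight: contributes nothing, skip
        out.append(acc)
    return out

W = [[1, 0, -1], [-1, 1, 0]]  # a tiny ternary weight matrix
x = [2.0, 3.0, 4.0]
print(ternary_matvec(W, x))   # -> [-2.0, 1.0]
```

Real hardware and the project's kernels exploit this at scale; the sketch only shows why no multiply instruction is needed per weight.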
Scaling Law
The project examines how scaling laws apply at three model sizes (370 million, 1.3 billion, and 2.7 billion parameters), comparing the MatMul-Free architecture against a Transformer++ baseline. A notable outcome is the steeper scaling projection of the MatMul-Free LM, indicating that it converts additional computational resources into performance gains more efficiently.
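To make "steeper scaling projection" concrete, here is a purely illustrative comparison of two power-law loss curves; all coefficients are invented for demonstration and are not fitted to the paper's data:

```python
# Illustrative only: a scaling law models loss as L(C) = a * C**(-b).
# A larger exponent b ("steeper" on a log-log plot) means each extra unit
# of compute C buys a bigger loss reduction.
def power_law_loss(compute, a, b):
    return a * compute ** (-b)

for c in (1e18, 1e21, 1e24):
    shallow = power_law_loss(c, a=1e3, b=0.05)  # hypothetical baseline curve
    steep = power_law_loss(c, a=5e3, b=0.08)    # hypothetical steeper curve
    print(f"C={c:.0e}  shallow={shallow:.1f}  steep={steep:.1f}")
```

With these made-up coefficients, the steeper curve starts above the baseline at low compute but crosses below it at high compute, which is the qualitative behavior a steeper scaling projection implies.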
Installation
To work with MatMul-Free LM, certain software prerequisites must be satisfied:
- PyTorch >= 2.0
- Triton >= 2.2
- einops
Installation can be easily completed with the following command:
pip install -U git+https://github.com/ridgerchu/matmulfreellm
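Before installing, it can be useful to confirm the version prerequisites are met. A minimal sketch of the comparison (it checks only the major.minor numbers, not build variants; the helper name is ours, not part of the project):

```python
# Compare dotted version strings numerically, e.g. is "2.1.0" >= "2.0"?
# Only the first two components (major.minor) are considered, so local
# build tags like "2.2.0+cu121" are handled as well.
def meets_minimum(installed: str, required: str) -> bool:
    to_pair = lambda v: tuple(int(p) for p in v.split(".")[:2])
    return to_pair(installed) >= to_pair(required)

# The prerequisites above: PyTorch >= 2.0, Triton >= 2.2
print(meets_minimum("2.1.0", "2.0"))       # True
print(meets_minimum("2.1", "2.2"))         # False
print(meets_minimum("2.2.0+cu121", "2.2")) # True
```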
Usage
Pre-trained Model Zoo
The project offers several pre-trained models spanning a range of sizes, layer counts, hidden dimensions, and training-token budgets.
Model Implementation
For developers, the implementation of these models is designed to be compatible with the 🤗 Transformers library. Here is how to initialize a model:
from mmfreelm.models import HGRNBitConfig  # configuration class for the MatMul-free model
from transformers import AutoModel

config = HGRNBitConfig()
model = AutoModel.from_config(config)  # builds a randomly initialized model from the config
This compatibility with the Hugging Face library ensures a streamlined experience for initializing and using the models.
Text Generation
Once a model is pre-trained, it can be used for text generation via the 🤗 text generation APIs. Here's an example to generate text:
import os
os.environ["TOKENIZERS_PARALLELISM"] = "false"
import mmfreelm  # registers the MatMul-free model classes with 🤗 Transformers
from transformers import AutoModelForCausalLM, AutoTokenizer
name = '' # Specify the model name
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name).cuda().half()
input_prompt = "In a shocking finding, scientist discovered a herd of unicorns living in a remote, "
input_ids = tokenizer(input_prompt, return_tensors="pt").input_ids.cuda()
outputs = model.generate(input_ids, max_length=32, do_sample=True, top_p=0.4, temperature=0.6)
print(tokenizer.batch_decode(outputs, skip_special_tokens=True)[0])
This snippet shows how a pre-trained MatMul-Free LM drops into the standard 🤗 text generation workflow.
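The generate call above uses nucleus (top-p) sampling via top_p=0.4. As a rough sketch of what that filter does (simplified from the tensor-based implementation; the real sampler also renormalizes the kept probabilities before drawing a token):

```python
import math

def top_p_filter(logits, top_p):
    """Keep the smallest set of tokens whose cumulative probability reaches top_p."""
    probs = [math.exp(l) for l in logits]
    total = sum(probs)
    probs = [p / total for p in probs]            # softmax over the logits
    order = sorted(range(len(probs)), key=lambda i: -probs[i])
    kept, cum = [], 0.0
    for i in order:                               # take tokens from most likely down
        kept.append(i)
        cum += probs[i]
        if cum >= top_p:                          # stop once the nucleus is covered
            break
    return sorted(kept)

# With a low top_p like 0.4, only the most likely token may survive the filter.
print(top_p_filter([2.0, 1.0, 0.5, 0.1], 0.4))  # -> [0]
print(top_p_filter([2.0, 1.0, 0.5, 0.1], 0.9))  # -> [0, 1, 2]
```

A lower top_p (and temperature) makes generation more conservative; raising them admits more candidate tokens and more varied output.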
Citation
If you use this repository in your research or projects, please cite the preprint:
@article{zhu2024scalable,
  title={Scalable MatMul-free Language Modeling},
  author={Zhu, Rui-Jie and Zhang, Yu and Sifferman, Ethan and Sheaves, Tyler and Wang, Yiqiao and Richmond, Dustin and Zhou, Peng and Eshraghian, Jason K},
  journal={arXiv preprint arXiv:2406.02528},
  year={2024}
}
This project represents a step forward for language models, promising gains in both computational efficiency and modeling performance.