UltraFastBERT Project Overview
Introduction
UltraFastBERT is the repository accompanying the research paper "Exponentially Faster Language Modelling" (arXiv:2311.10770). The project offers a new approach to language modelling, building on the BERT architecture with feedforward layers redesigned so that only a small fraction of their neurons is engaged for any single inference.
Project Structure
The project is organized into several key directories, each playing a crucial role in the development and benchmarking of the UltraFastBERT model:
- Training Directory: This folder contains a modified copy of the crammedBERT repository from October 2023. The modifications replace the traditional feedforward layers with fast feedforward networks (FFFs), which selectively engage only a handful of neurons per inference while playing the same role as the feedforward layers they replace; this selective execution is the source of the speedup (see the sketch after this list).
- Benchmark CPU Directory: This folder contains C++ code, built against Intel MKL 2023.2.0, for accelerated CPU implementations of FFF inference, alongside baseline implementations of traditional feedforward (FF) layers for comparison.
- Benchmark PyTorch Directory: This folder contains PyTorch implementations of FF and FFF inference in both the "Native fused" and the "PyTorch Batch Matrix Multiplication (BMM)" setups.
- Benchmark CUDA Directory: This folder offers C++/CUDA kernel code for naive CUDA implementations of FF and FFF, showcasing performance on GPU hardware.
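
To make the FFF idea concrete, here is a minimal single-sample sketch of conditional tree inference, written from the paper's description rather than taken from the repository; the function and weight names (`fff_forward`, `w_in`, `w_out`) are illustrative:

```python
import torch
import torch.nn.functional as F

def fff_forward(x, w_in, w_out, depth):
    """Illustrative FFF inference for one input vector x of size d_model.
    w_in, w_out: (2**(depth+1) - 1, d_model), one row per tree node.
    Only depth + 1 of the nodes are evaluated for any given x.
    """
    node = 0
    y = torch.zeros_like(x)
    for _ in range(depth + 1):
        act = torch.dot(w_in[node], x)            # this node's pre-activation
        y = y + F.gelu(act) * w_out[node]         # accumulate its contribution
        node = 2 * node + (1 if act > 0 else 2)   # descend left/right on sign
    return y

# Tiny usage example; for UltraFastBERT-1x11-long, depth = 11, meaning
# 12 neurons evaluated out of 4095 candidates per layer inference.
d_model, depth = 8, 3
w_in = torch.randn(2 ** (depth + 1) - 1, d_model)
w_out = torch.randn(2 ** (depth + 1) - 1, d_model)
print(fff_forward(torch.randn(d_model), w_in, w_out, depth))
```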
Utilizing UltraFastBERT
UltraFastBERT provides pre-trained configurations and weights, specifically for the UltraFastBERT-1x11-long version, readily available on HuggingFace. These can be seamlessly integrated into applications using `transformers` library tools such as `AutoTokenizer` and `AutoModelForMaskedLM`.
Quickstart Guide
- First, establish a new Python/conda environment to avoid any conflicts with existing crammedBERT projects, and ensure that you are using the `training` folder provided by the UltraFastBERT repository.
- Navigate to the `training` directory and install the necessary packages by running `pip install .`.
- Create a new Python script called `minimal_example.py`.
- Add the following code to your script to load and use the UltraFastBERT model:

```python
import cramming  # makes the custom crammedBERT model classes available to transformers
from transformers import AutoModelForMaskedLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("pbelcak/UltraFastBERT-1x11-long")
model = AutoModelForMaskedLM.from_pretrained("pbelcak/UltraFastBERT-1x11-long")

text = "Replace me by any text you'd like."
encoded_input = tokenizer(text, return_tensors='pt')
output = model(**encoded_input)
```
- Execute the script with `python minimal_example.py`.
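
As a quick sanity check, you can inspect what the forward pass returns; a minimal sketch, assuming the usual transformers convention of a `.logits` attribute (the custom cramming classes may structure their output differently, so verify the output type first):

```python
# Append to minimal_example.py. The .logits attribute is the standard
# transformers masked-LM convention; if the cramming model returns a
# different structure, inspect `output` directly instead.
print(type(output))
if hasattr(output, "logits"):
    print(output.logits.shape)  # expected: (batch, sequence_length, vocab_size)
```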
Reproducing Results
Training and Finetuning:
To replicate the project's training and finetuning results, follow the detailed instructions in the `training` folder's README file.
CPU Benchmarking:
To assess CPU performance, navigate to `benchmark_cpu`. Compilation and execution are straightforward, particularly on Windows using Visual Studio 2022 with the Intel oneAPI extension.
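
The gap this benchmark measures follows directly from the neuron counts: a dense FF layer evaluates every neuron per token, while an FFF of depth d evaluates only d + 1 of its 2^(d+1) - 1 neurons. A back-of-the-envelope check for the 1x11-long configuration, matching the paper's "12 out of 4095 neurons" figure:

```python
# Per-token work comparison for a depth-11 FFF versus its dense equivalent.
depth = 11
fff_neurons = depth + 1            # 12 neurons evaluated per inference
ff_neurons = 2 ** (depth + 1) - 1  # 4095 neurons in the dense counterpart
print(f"{fff_neurons} of {ff_neurons} neurons -> "
      f"{ff_neurons / fff_neurons:.0f}x fewer dot products per token")
```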
PyTorch Benchmarking:
For PyTorch results, execute `python main.py` within the `benchmark_pytorch` directory. The output is conveniently stored in a SQLite database for analysis.
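
A quick way to explore the results is to open the database and list its tables before querying; a hedged sketch, where the database filename is an assumption (check `benchmark_pytorch` for the file that `main.py` actually writes):

```python
import sqlite3

# "results.db" is an illustrative filename, not taken from the repository.
conn = sqlite3.connect("results.db")
tables = conn.execute(
    "SELECT name FROM sqlite_master WHERE type='table'"
).fetchall()
print(tables)  # discover the schema before querying benchmark rows
conn.close()
```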
CUDA Setup:
Using CUDA requires the installation of the CUDA Toolkit. Once it is set up, navigate to `benchmark_cuda` and run `python setup.py install` to compile the necessary CUDA extensions.
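
For context, PyTorch CUDA extensions of this kind are typically declared with `torch.utils.cpp_extension`; the sketch below shows the general shape of such a `setup.py`, with module and source file names that are illustrative rather than taken from the repository:

```python
from setuptools import setup
from torch.utils.cpp_extension import BuildExtension, CUDAExtension

# Module and source names are placeholders; see benchmark_cuda/setup.py
# for the actual extension definition.
setup(
    name="fff_cuda",
    ext_modules=[
        CUDAExtension("fff_cuda", ["fff_cuda.cpp", "fff_cuda_kernel.cu"]),
    ],
    cmdclass={"build_ext": BuildExtension},
)
```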
Conclusion
UltraFastBERT introduces efficient inference to BERT-style language modelling: by extending crammedBERT with fast feedforward networks and providing CPU, PyTorch, and CUDA benchmarking tools, the project demonstrates that a language model can engage only a small fraction of its neurons per inference. Researchers and developers can explore its structure, reproduce its results, and build on its techniques in their own applications.