UltraFastBERT Project Overview
Introduction
UltraFastBERT is the repository accompanying the research paper "Exponentially Faster Language Modelling" (arXiv:2311.10770). The project offers a new approach to language modelling, building on the BERT architecture with feedforward layers redesigned so that only a small fraction of their neurons is engaged for any single inference.
Project Structure
The project is organized into several key directories, each playing a crucial role in the development and benchmarking of the UltraFastBERT model:
- Training Directory: This folder contains a modified copy of the crammedBERT repository from October 2023. The modifications replace the traditional feedforward layers with fast feedforward networks (FFFs), which selectively engage only a handful of neurons per inference while playing the same role as the feedforward layers they replace; this selective execution is the source of the speedup (see the sketch after this list).
- Benchmark CPU Directory: This folder contains C++ code, built against Intel MKL 2023.2.0, for accelerated CPU implementations of FFF inference, alongside baseline implementations of traditional feedforward (FF) layers for comparison.
- Benchmark PyTorch Directory: This folder contains PyTorch implementations of FF and FFF inference in both the "Native fused" and the "PyTorch Batch Matrix Multiplication (BMM)" setups.
- Benchmark CUDA Directory: This folder offers C++/CUDA kernel code for naive CUDA implementations of FF and FFF, showcasing performance on GPU hardware.
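
To make the FFF idea concrete, here is a minimal single-sample sketch of conditional tree inference, written from the paper's description rather than taken from the repository; the function and weight names (`fff_forward`, `w_in`, `w_out`) are illustrative:

```python
import torch
import torch.nn.functional as F

def fff_forward(x, w_in, w_out, depth):
    """Illustrative FFF inference for one input vector x of size d_model.
    w_in, w_out: (2**(depth+1) - 1, d_model), one row per tree node.
    Only depth + 1 of the nodes are evaluated for any given x.
    """
    node = 0
    y = torch.zeros_like(x)
    for _ in range(depth + 1):
        act = torch.dot(w_in[node], x)            # this node's pre-activation
        y = y + F.gelu(act) * w_out[node]         # accumulate its contribution
        node = 2 * node + (1 if act > 0 else 2)   # descend left/right on sign
    return y

# Tiny usage example; for UltraFastBERT-1x11-long, depth = 11, meaning
# 12 neurons evaluated out of 4095 candidates per layer inference.
d_model, depth = 8, 3
w_in = torch.randn(2 ** (depth + 1) - 1, d_model)
w_out = torch.randn(2 ** (depth + 1) - 1, d_model)
print(fff_forward(torch.randn(d_model), w_in, w_out, depth))
```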
Utilizing UltraFastBERT
UltraFastBERT provides pre-trained configurations and weights, specifically for the UltraFastBERT-1x11-long version, readily available on HuggingFace. These can be seamlessly integrated into applications using `transformers` library tools such as `AutoTokenizer` and `AutoModelForMaskedLM`.
Quickstart Guide
- First, establish a new Python/conda environment to avoid any conflicts with existing crammedBERT projects, and ensure that you are using the `training` folder provided by the UltraFastBERT repository.
- Navigate to the `training` directory and install the necessary packages by running `pip install .`.
- Create a new Python script called `minimal_example.py`.
- Add the following code to your script to load and use the UltraFastBERT model:

```python
import cramming  # makes the custom crammedBERT model classes available to transformers
from transformers import AutoModelForMaskedLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("pbelcak/UltraFastBERT-1x11-long")
model = AutoModelForMaskedLM.from_pretrained("pbelcak/UltraFastBERT-1x11-long")

text = "Replace me by any text you'd like."
encoded_input = tokenizer(text, return_tensors='pt')
output = model(**encoded_input)
```
- Execute the script with `python minimal_example.py`.
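
As a quick sanity check, you can inspect what the forward pass returns; a minimal sketch, assuming the usual transformers convention of a `.logits` attribute (the custom cramming classes may structure their output differently, so verify the output type first):

```python
# Append to minimal_example.py. The .logits attribute is the standard
# transformers masked-LM convention; if the cramming model returns a
# different structure, inspect `output` directly instead.
print(type(output))
if hasattr(output, "logits"):
    print(output.logits.shape)  # expected: (batch, sequence_length, vocab_size)
```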
Reproducing Results
Training and Finetuning:
To replicate the project's training and finetuning results, follow the detailed instructions in the `training` folder's README file.
CPU Benchmarking:
To assess CPU performance, navigate to `benchmark_cpu`. Compilation and execution are straightforward, particularly on Windows using Visual Studio 2022 with the Intel oneAPI extension.
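
The gap this benchmark measures follows directly from the neuron counts: a dense FF layer evaluates every neuron per token, while an FFF of depth d evaluates only d + 1 of its 2^(d+1) - 1 neurons. A back-of-the-envelope check for the 1x11-long configuration, matching the paper's "12 out of 4095 neurons" figure:

```python
# Per-token work comparison for a depth-11 FFF versus its dense equivalent.
depth = 11
fff_neurons = depth + 1            # 12 neurons evaluated per inference
ff_neurons = 2 ** (depth + 1) - 1  # 4095 neurons in the dense counterpart
print(f"{fff_neurons} of {ff_neurons} neurons -> "
      f"{ff_neurons / fff_neurons:.0f}x fewer dot products per token")
```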
PyTorch Benchmarking:
For PyTorch results, execute `python main.py` within the `benchmark_pytorch` directory. The output is conveniently stored in a SQLite database for analysis.
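
A quick way to explore the results is to open the database and list its tables before querying; a hedged sketch, where the database filename is an assumption (check `benchmark_pytorch` for the file that `main.py` actually writes):

```python
import sqlite3

# "results.db" is an illustrative filename, not taken from the repository.
conn = sqlite3.connect("results.db")
tables = conn.execute(
    "SELECT name FROM sqlite_master WHERE type='table'"
).fetchall()
print(tables)  # discover the schema before querying benchmark rows
conn.close()
```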
CUDA Setup:
Using CUDA requires the installation of the CUDA Toolkit. Once it is set up, navigate to `benchmark_cuda` and run `python setup.py install` to compile the necessary CUDA extensions.
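
For context, PyTorch CUDA extensions of this kind are typically declared with `torch.utils.cpp_extension`; the sketch below shows the general shape of such a `setup.py`, with module and source file names that are illustrative rather than taken from the repository:

```python
from setuptools import setup
from torch.utils.cpp_extension import BuildExtension, CUDAExtension

# Module and source names are placeholders; see benchmark_cuda/setup.py
# for the actual extension definition.
setup(
    name="fff_cuda",
    ext_modules=[
        CUDAExtension("fff_cuda", ["fff_cuda.cpp", "fff_cuda_kernel.cu"]),
    ],
    cmdclass={"build_ext": BuildExtension},
)
```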
Conclusion
UltraFastBERT introduces efficient inference to BERT-style language modelling: by extending crammedBERT with fast feedforward networks and providing CPU, PyTorch, and CUDA benchmarking tools, the project demonstrates that a language model can engage only a small fraction of its neurons per inference. Researchers and developers can explore its structure, reproduce its results, and build on its techniques in their own applications.