Introduction to mamba-minimal
mamba-minimal is a minimal, single-file PyTorch implementation of Mamba. It is designed to be simple yet effective, offering the essential functionality for sequence modeling without the complexity of more extensive implementations.
Key Features
mamba-minimal boasts several notable features:
- Consistency with the official implementation: it produces numerical outputs equivalent to the official Mamba implementation for both the forward and backward passes, so users can rely on it for accurate results (see the sketch after this list).
- Simplicity and readability: the code is simplified and thoroughly annotated, making it easy to read and a manageable entry point for anyone who wants to learn about or build on the Mamba architecture.
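Both passes can be exercised directly. The following is a minimal smoke-test sketch, assuming (as in the repo's model.py) that calling the model on a batch of token ids returns logits of shape (batch, seq_len, vocab_size); verifying equivalence would then amount to comparing these logits and gradients against the official implementation's outputs on the same inputs:

```python
import torch
from model import Mamba
from transformers import AutoTokenizer

model = Mamba.from_pretrained('state-spaces/mamba-370m')
tokenizer = AutoTokenizer.from_pretrained('EleutherAI/gpt-neox-20b')

# Forward pass: (1, seq_len) token ids -> (1, seq_len, vocab_size) logits (assumed shape)
input_ids = tokenizer('Mamba is the', return_tensors='pt').input_ids
logits = model(input_ids)

# Backward pass: next-token cross-entropy, predicting token t+1 from positions <= t
loss = torch.nn.functional.cross_entropy(
    logits[:, :-1].reshape(-1, logits.size(-1)),
    input_ids[:, 1:].reshape(-1),
)
loss.backward()
print(f'loss = {loss.item():.4f}')
```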
What It Doesn't Include
The simplicity of mamba-minimal comes with certain exclusions:
- No speed optimization: unlike the official Mamba implementation, which is heavily optimized for performance, mamba-minimal forgoes those optimizations in favor of clarity and ease of understanding. This makes it less suitable for performance-critical applications.
- No parameter initialization: proper parameter initialization is not included, though it could be added without significantly hurting the readability of the code (a hypothetical sketch follows this list).
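For illustration only, here is a hypothetical sketch of where such initialization could live. The init_weights function below is a generic scheme invented for this example, not the initialization used by the official Mamba implementation:

```python
import torch.nn as nn

def init_weights(module):
    # Generic, illustrative initialization; not the official Mamba scheme.
    if isinstance(module, nn.Linear):
        nn.init.normal_(module.weight, mean=0.0, std=0.02)
        if module.bias is not None:
            nn.init.zeros_(module.bias)
    elif isinstance(module, nn.Embedding):
        nn.init.normal_(module.weight, mean=0.0, std=0.02)

# Applied once after the model is constructed:
# model.apply(init_weights)
```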
Demonstration
For those interested in seeing mamba-minimal in action, there is a demonstration available in the demo.ipynb file. Here's a brief snippet showing how to use the Mamba model with a tokenizer:
```python
from model import Mamba
from transformers import AutoTokenizer

model = Mamba.from_pretrained('state-spaces/mamba-370m')
tokenizer = AutoTokenizer.from_pretrained('EleutherAI/gpt-neox-20b')

# generate() is the prompt-completion helper defined in demo.ipynb
generate(model, tokenizer, 'Mamba is the')
```
This generates a continuation of the prompt "Mamba is the", showcasing the model's sequence-modeling capabilities.
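The generate helper itself is defined in demo.ipynb. As a rough idea of what it does, here is a minimal sampling-loop sketch written for this description; it assumes the model returns per-position logits, and the notebook's actual implementation may differ:

```python
import torch

def generate(model, tokenizer, prompt, n_tokens_to_gen=50, temperature=1.0, top_k=40):
    # Hypothetical sampling loop; the real helper in demo.ipynb may differ.
    input_ids = tokenizer(prompt, return_tensors='pt').input_ids
    for _ in range(n_tokens_to_gen):
        with torch.no_grad():
            logits = model(input_ids)[:, -1]                    # logits for the last position
        logits = logits / temperature
        top_vals, top_idx = torch.topk(logits, k=top_k, dim=-1) # keep the top-k candidates
        probs = torch.softmax(top_vals, dim=-1)
        sampled = torch.multinomial(probs, num_samples=1)       # index into the top-k set
        next_id = top_idx.gather(-1, sampled)                   # map back to vocab ids
        input_ids = torch.cat([input_ids, next_id], dim=1)
    return tokenizer.decode(input_ids[0])
```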
References
mamba-minimal is based on the Mamba architecture, described in the paper "Mamba: Linear-Time Sequence Modeling with Selective State Spaces" by Albert Gu and Tri Dao. More information on the architecture, including its fully optimized official implementation, can be found in the official Mamba GitHub repository.
In summary, mamba-minimal provides an accessible entry point into Mamba's sequence modeling by focusing on clarity and functionality over performance optimization, making it an excellent educational tool or starting point for further development.