📖 Overview of Attention Gym
Attention Gym is a comprehensive collection of tools and examples designed to help users work with FlexAttention in PyTorch. The project serves as a playground for experimenting with different attention mechanisms. It aims to assist researchers and developers in exploring and optimizing attention models by providing various implementations, performance comparisons, and utility functions.
🎯 Features
- Various Attention Mechanisms: Attention Gym includes implementations of multiple attention models using the FlexAttention API.
- Utility Functions: It offers tools to create and combine attention masks, which are crucial for model performance (a sketch of combining masks follows this list).
- Real-world Examples: Practical examples demonstrate how FlexAttention can be applied in real-world scenarios.
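To give a flavor of the mask utilities, here is a minimal sketch of combining two mask_mods with and_masks from torch.nn.attention.flex_attention. The inline causal_mask is written here purely for illustration; generate_sliding_window is one of the shipped generators (it appears again under Usage):

from torch.nn.attention.flex_attention import and_masks, create_block_mask
from attn_gym.masks import generate_sliding_window

# Inline causal constraint: queries attend only to current and past keys.
def causal_mask(b, h, q_idx, kv_idx):
    return q_idx >= kv_idx

# Intersect the causal and sliding-window constraints into a single mask_mod,
# then compile it into a block mask for a length-2048 sequence.
combined_mask_mod = and_masks(causal_mask, generate_sliding_window(window_size=256))
block_mask = create_block_mask(combined_mask_mod, B=1, H=1, Q_LEN=2048, KV_LEN=2048, device="cpu")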
🚀 Getting Started
Prerequisites
To work with Attention Gym, ensure that you have PyTorch version 2.5 or higher installed.
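A quick way to confirm which version is installed:
python -c "import torch; print(torch.__version__)"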
Installation
To get started with Attention Gym, clone the repository and install the package:
git clone https://github.com/pytorch-labs/attention-gym.git
cd attention-gym
pip install .
💻 Usage
There are two main ways to use Attention Gym in projects:
- Running Example Scripts: Users can execute files in the project directly to see how they function. For example, the following command runs the document-mask demo:
python attn_gym/masks/document_mask.py
These scripts often include visualizations to help users understand different attention mechanisms.
- Importing into Your Projects: Attention Gym components can be integrated into personal projects by importing them. Here is a simple usage example (the tensor shapes and device are illustrative):

import torch
from torch.nn.attention.flex_attention import flex_attention, create_block_mask
from attn_gym.masks import generate_sliding_window

# Illustrative inputs: (batch, heads, sequence length, head dim)
B, H, S, D = 1, 8, 2048, 64
device = "cuda"  # assumes a GPU is available
query, key, value = (torch.randn(B, H, S, D, device=device) for _ in range(3))

# Build a sliding-window mask_mod and compile it into a block mask
sliding_window_mask_mod = generate_sliding_window(window_size=1024)
block_mask = create_block_mask(sliding_window_mask_mod, 1, 1, S, S, device=device)
out = flex_attention(query, key, value, block_mask=block_mask)
For more detailed examples of how FlexAttention can be used in practical situations, the project includes an examples/ directory showcasing complete implementations.
Note on Development
Attention Gym is actively evolving, which means that backward compatibility isn't guaranteed. It's advised that users pin to a specific version and thoroughly review any changes when upgrading.
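For instance, assuming an install straight from GitHub, pinning can be done by referencing a specific tag or commit (placeholder shown):
pip install git+https://github.com/pytorch-labs/attention-gym.git@<tag-or-commit>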
📁 Structure
Attention Gym is well-organized to facilitate easy exploration:
- attn_gym.masks: Examples for creating block masks.
- attn_gym.mods: Examples for creating score modifications (a sketch of a score_mod follows this list).
- examples/: End-to-end implementations using FlexAttention.
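To illustrate what a score modification looks like, here is a hand-rolled ALiBi-style score_mod written against the FlexAttention signature (score, batch, head, q_idx, kv_idx). It is a sketch for illustration, not necessarily identical to the mods shipped in attn_gym.mods:

import torch
from torch.nn.attention.flex_attention import flex_attention

# ALiBi-style bias: penalize scores by query/key distance, with a
# per-head slope. FlexAttention passes tensor indices, so this stays
# vectorized and compilable.
def alibi_score_mod(score, b, h, q_idx, kv_idx):
    slope = torch.exp2(-(h + 1).to(torch.float32))
    return score - slope * (q_idx - kv_idx)

query = key = value = torch.randn(1, 8, 512, 64)
out = flex_attention(query, key, value, score_mod=alibi_score_mod)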
🛠️ Development
To set up the development environment, install the necessary requirements with:
pip install -e ".[dev]"
Additionally, pre-commit hooks can be installed using:
pre-commit install
🤝 Contributing
Contributions to Attention Gym are welcome! The project encourages the addition of new masks or score modifications. Here's how contributors can get involved:
- Develop a new file in attn_gym/masks/ for mask modifications or attn_gym/mods/ for score modifications.
- Implement the function and include a simple main function to demonstrate its usage (a skeleton follows this list).
- Update the attn_gym/*/__init__.py files to include the new function.
- Optionally, add an example in the examples/ directory showcasing the new function.
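A minimal skeleton for such a file might look like the following; generate_dilated_mask and its stride parameter are hypothetical, used only to show the expected shape (a generator returning a mask_mod, plus a main that demonstrates it):

from torch.nn.attention.flex_attention import create_block_mask

def generate_dilated_mask(stride: int):
    """Hypothetical mask: attend only to past positions whose distance
    from the query is a multiple of `stride` (distance 0, the query
    position itself, is included)."""
    def dilated_mask(b, h, q_idx, kv_idx):
        return (q_idx >= kv_idx) & ((q_idx - kv_idx) % stride == 0)
    return dilated_mask

def main():
    # Demo: compile the mask_mod into a block mask and print its sparsity.
    block_mask = create_block_mask(
        generate_dilated_mask(stride=4), B=1, H=1, Q_LEN=512, KV_LEN=512, device="cpu"
    )
    print(block_mask)

if __name__ == "__main__":
    main()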
For more detailed contribution guidelines, refer to the CONTRIBUTING.md file in the repository.
⚖️ License
Attention Gym is available under the BSD 3-Clause License. This ensures the project remains open and accessible to the community, encouraging ongoing collaboration and development.