📖 Overview of Attention Gym
Attention Gym is a comprehensive collection of tools and examples designed to help users work with FlexAttention in PyTorch. The project serves as a playground for experimenting with different attention mechanisms. It aims to assist researchers and developers in exploring and optimizing attention models by providing various implementations, performance comparisons, and utility functions.
🎯 Features
- Various Attention Mechanisms: Attention Gym includes implementations of multiple attention models using the FlexAttention API.
- Utility Functions: It offers tools to create and combine attention masks, which are crucial for model performance (a sketch of combining masks follows this list).
- Real-world Examples: Practical examples demonstrate how FlexAttention can be applied in real-world scenarios.
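To give a flavor of the mask utilities, here is a minimal sketch of combining two mask_mods with and_masks from torch.nn.attention.flex_attention. The inline causal_mask is written here purely for illustration; generate_sliding_window is one of the shipped generators (it appears again under Usage):

from torch.nn.attention.flex_attention import and_masks, create_block_mask
from attn_gym.masks import generate_sliding_window

# Inline causal constraint: queries attend only to current and past keys.
def causal_mask(b, h, q_idx, kv_idx):
    return q_idx >= kv_idx

# Intersect the causal and sliding-window constraints into a single mask_mod,
# then compile it into a block mask for a length-2048 sequence.
combined_mask_mod = and_masks(causal_mask, generate_sliding_window(window_size=256))
block_mask = create_block_mask(combined_mask_mod, B=1, H=1, Q_LEN=2048, KV_LEN=2048, device="cpu")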
🚀 Getting Started
Prerequisites
To work with Attention Gym, ensure that you have PyTorch version 2.5 or higher installed.
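A quick way to confirm which version is installed:
python -c "import torch; print(torch.__version__)"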
Installation
To get started with Attention Gym, clone the repository and install the package:
git clone https://github.com/pytorch-labs/attention-gym.git
cd attention-gym
pip install .
💻 Usage
There are two main ways to use Attention Gym in projects:
- Running Example Scripts: Users can execute files in the project directly to see how they function. For example, the following command runs the document-mask demo:
python attn_gym/masks/document_mask.py
These scripts often include visualizations to help users understand different attention mechanisms.
- Importing into Your Projects: Attention Gym components can be integrated into personal projects by importing them. Here is a simple usage example (the tensor shapes and device are illustrative):

import torch
from torch.nn.attention.flex_attention import flex_attention, create_block_mask
from attn_gym.masks import generate_sliding_window

# Illustrative inputs: (batch, heads, sequence length, head dim)
B, H, S, D = 1, 8, 2048, 64
device = "cuda"  # assumes a GPU is available
query, key, value = (torch.randn(B, H, S, D, device=device) for _ in range(3))

# Build a sliding-window mask_mod and compile it into a block mask
sliding_window_mask_mod = generate_sliding_window(window_size=1024)
block_mask = create_block_mask(sliding_window_mask_mod, 1, 1, S, S, device=device)
out = flex_attention(query, key, value, block_mask=block_mask)
For more detailed examples of how FlexAttention can be used in practical situations, the project includes an examples/ directory showcasing complete implementations.
Note on Development
Attention Gym is actively evolving, which means that backward compatibility isn't guaranteed. It's advised that users pin to a specific version and thoroughly review any changes when upgrading.
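For instance, assuming an install straight from GitHub, pinning can be done by referencing a specific tag or commit (placeholder shown):
pip install git+https://github.com/pytorch-labs/attention-gym.git@<tag-or-commit>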
📁 Structure
Attention Gym is well-organized to facilitate easy exploration:
- attn_gym.masks: Examples for creating block masks.
- attn_gym.mods: Examples for creating score modifications (a sketch of a score_mod follows this list).
- examples/: End-to-end implementations using FlexAttention.
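To illustrate what a score modification looks like, here is a hand-rolled ALiBi-style score_mod written against the FlexAttention signature (score, batch, head, q_idx, kv_idx). It is a sketch for illustration, not necessarily identical to the mods shipped in attn_gym.mods:

import torch
from torch.nn.attention.flex_attention import flex_attention

# ALiBi-style bias: penalize scores by query/key distance, with a
# per-head slope. FlexAttention passes tensor indices, so this stays
# vectorized and compilable.
def alibi_score_mod(score, b, h, q_idx, kv_idx):
    slope = torch.exp2(-(h + 1).to(torch.float32))
    return score - slope * (q_idx - kv_idx)

query = key = value = torch.randn(1, 8, 512, 64)
out = flex_attention(query, key, value, score_mod=alibi_score_mod)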
🛠️ Development
To set up the development environment, install the necessary requirements with:
pip install -e ".[dev]"
Additionally, pre-commit hooks can be installed using:
pre-commit install
🤝 Contributing
Contributions to Attention Gym are welcome! The project encourages the addition of new masks or score modifications. Here's how contributors can get involved:
- Develop a new file in attn_gym/masks/ for mask modifications or attn_gym/mods/ for score modifications.
- Implement the function and include a simple main function to demonstrate its usage (a skeleton follows this list).
- Update the attn_gym/*/__init__.py files to include the new function.
- Optionally, add an example in the examples/ directory showcasing the new function.
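A minimal skeleton for such a file might look like the following; generate_dilated_mask and its stride parameter are hypothetical, used only to show the expected shape (a generator returning a mask_mod, plus a main that demonstrates it):

from torch.nn.attention.flex_attention import create_block_mask

def generate_dilated_mask(stride: int):
    """Hypothetical mask: attend only to past positions whose distance
    from the query is a multiple of `stride` (distance 0, the query
    position itself, is included)."""
    def dilated_mask(b, h, q_idx, kv_idx):
        return (q_idx >= kv_idx) & ((q_idx - kv_idx) % stride == 0)
    return dilated_mask

def main():
    # Demo: compile the mask_mod into a block mask and print its sparsity.
    block_mask = create_block_mask(
        generate_dilated_mask(stride=4), B=1, H=1, Q_LEN=512, KV_LEN=512, device="cpu"
    )
    print(block_mask)

if __name__ == "__main__":
    main()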
For more detailed contribution guidelines, refer to the CONTRIBUTING.md file in the repository.
⚖️ License
Attention Gym is available under the BSD 3-Clause License. This ensures the project remains open and accessible to the community, encouraging ongoing collaboration and development.