EasyContext
Overview
EasyContext is a project that explores extending language models' context length to 1 million tokens using minimal hardware. It combines a range of existing techniques to reach that goal, making long-context training accessible to those outside large companies who have limited computing power. The project demystifies the process of scaling context length and provides practical training recipes for language models.
Recent Updates
- As of June 25th, EasyContext has become a part of LongVA, a project focused on long-context vision language models.
- In May, several updates were made, including the addition of Ulysses and new features in the NIAH evaluation script.
Goals
EasyContext aims to show that training language models on long contexts is achievable without vast computational resources. It does so with full finetuning, full attention, and the full sequence length, while keeping training efficient.
Featured Techniques
EasyContext leverages several advanced strategies to optimize training and memory usage:
- Sequence Parallelism: Splits a single long sequence across multiple GPUs so that no one device has to hold the activations for the full context.
- DeepSpeed ZeRO-3 Offload: Reduces GPU memory consumption by partitioning model states across devices and offloading them to CPU memory.
- Flash Attention and fused kernels: Speed up the attention computation and reduce its memory traffic.
- USP (Unified Sequence Parallelism): Adds a Ulysses-style sequence-parallel implementation alongside the ring-based approach.
- Activation Checkpointing: Saves memory during training by recomputing activations in the backward pass rather than storing them all (a short sketch follows this list).
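As a point of reference for the Flash Attention and activation-checkpointing items above, here is a minimal, self-contained PyTorch sketch (not EasyContext's actual code): a toy transformer block that uses torch.nn.functional.scaled_dot_product_attention, which dispatches to a FlashAttention-style fused kernel when one is available, wrapped with activation checkpointing. The Block module, its dimensions, and the toy input are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.checkpoint import checkpoint

class Block(nn.Module):
    """One transformer block using PyTorch's fused scaled-dot-product attention."""

    def __init__(self, dim: int, heads: int):
        super().__init__()
        self.heads = heads
        self.qkv = nn.Linear(dim, 3 * dim)
        self.proj = nn.Linear(dim, dim)
        self.mlp = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))

    def forward(self, x):
        b, n, d = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        q, k, v = (t.view(b, n, self.heads, d // self.heads).transpose(1, 2) for t in (q, k, v))
        # Dispatches to a FlashAttention-style fused kernel when one is available.
        attn = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        x = x + self.proj(attn.transpose(1, 2).reshape(b, n, d))
        return x + self.mlp(x)

block = Block(dim=512, heads=8)
x = torch.randn(1, 4096, 512, requires_grad=True)

# Activation checkpointing: recompute this block's activations during the backward
# pass instead of keeping them in memory, trading compute for a smaller footprint.
y = checkpoint(block, x, use_reentrant=False)
y.sum().backward()
```

The trade-off is extra recomputation in the backward pass in exchange for a much smaller activation footprint, which is what makes very long sequences fit in GPU memory at all.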
Training and Performance
The training script is under 200 lines of code, keeping the setup easy to follow. Testing has shown that models can be trained to a context length of 700K tokens on 8 A100 GPUs, or 1M tokens on 16 A100 GPUs. Supported sequence-parallel strategies include Ring Attention and Dist Flash Attention.
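The following is a simplified sketch of the sequence-sharding idea behind these strategies, assuming a plain contiguous split. The actual ring-attention implementations use more careful layouts to balance the causal-attention workload across ranks; the helper below is purely illustrative.

```python
import torch

def shard_sequence(input_ids: torch.Tensor, rank: int, world_size: int) -> torch.Tensor:
    """Give each rank a contiguous slice of a (batch, seq_len) tensor."""
    seq_len = input_ids.shape[1]
    assert seq_len % world_size == 0, "sequence length must divide evenly across ranks"
    chunk = seq_len // world_size
    return input_ids[:, rank * chunk : (rank + 1) * chunk]

# Toy illustration: a 16-token sequence split across 4 hypothetical GPUs.
full = torch.arange(16).view(1, 16)
print([shard_sequence(full, rank, world_size=4).tolist() for rank in range(4)])

# At 700K tokens on 8 GPUs, each rank would hold a ~87.5K-token shard, so per-GPU
# activation memory scales with the shard rather than with the full context.
```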
Results
EasyContext has shown promising results on needle-in-a-haystack (NIAH) evaluations and perplexity testing, with the models performing well on long documents at context lengths of up to 600K tokens.
Installation and Usage
To install EasyContext, users need Python 3.10.0 and PyTorch 2.4.0. The repository includes straightforward instructions for setting up the environment. Users can employ EasyContext in their projects by integrating the sequence parallel methods into their existing codebases.
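As a rough illustration, such an integration might look like the sketch below. The imported names, argument order, and returned keys are assumptions about EasyContext's API rather than a verified usage example; consult the repository for the exact functions and signatures.

```python
import torch
from transformers import AutoModelForCausalLM
from easy_context import (  # assumed import path and helper names
    apply_seq_parallel_monkey_patch,
    prepare_seq_parallel_inputs,
)

# Patch the model's attention to a sequence-parallel backend before loading it.
apply_seq_parallel_monkey_patch("zigzag_ring_attn", "llama")

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf", torch_dtype=torch.bfloat16
).cuda()

input_ids = torch.randint(0, 32000, (1, 32768), device="cuda")   # toy long batch
position_ids = torch.arange(32768, device="cuda").unsqueeze(0)

# Assumed argument order: (method, input_ids, position_ids, target_ids, rank,
# world_size, device); assumed return keys "local_*" holding this rank's shard.
prepared = prepare_seq_parallel_inputs(
    "zigzag_ring_attn", input_ids, position_ids, input_ids, 0, 8, "cuda"
)
loss = model(
    input_ids=prepared["local_input_ids"],
    position_ids=prepared["local_position_ids"],
    labels=prepared["local_target_ids"],
).loss
```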
Training and Speed
Switching from data parallelism to ring attention causes only a minor reduction in throughput. Throughput does drop as the sequence length increases, but this reflects the quadratic cost of self-attention itself rather than the parallelization scheme, which underscores the value of keeping the rest of the training pipeline simple and efficient.
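A back-of-the-envelope calculation illustrates the quadratic effect; the model dimensions below are illustrative assumptions, not measurements from EasyContext.

```python
def attn_flops(seq_len: int, hidden: int = 4096, layers: int = 32) -> int:
    # Two big matmuls per layer (QK^T and scores @ V), each ~2 * seq_len^2 * hidden FLOPs.
    return layers * 2 * 2 * seq_len ** 2 * hidden

for n in (8_000, 100_000, 700_000):
    print(f"{n:>9} tokens -> {attn_flops(n):.2e} attention FLOPs")

# Going from 100K to 700K tokens (7x longer) multiplies attention cost by ~49x,
# while the linear/MLP layers only grow ~7x; hence the throughput drop at long context.
```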
Roadmap and Future Plans
Planned updates include instruction tuning and additional model configurations, contingent on available resources. The project welcomes community contributions to help advance these goals.
Implications Beyond Language Models
While primarily focused on language models, EasyContext's ability to handle very long contexts also has implications for video generation models, potentially allowing up to 1,500 frames to be processed in a single pass.
Acknowledgements
This project draws from various renowned papers and contributions from researchers and developers in the field, indicating broad collaboration and community support.
Conclusion
EasyContext showcases a powerful yet accessible approach to scaling language model contexts, making it a valuable resource for researchers and developers keen on exploring longer context lengths without access to extensive computational hardware.