InfiniTransformer
InfiniTransformer is an unofficial PyTorch and Hugging Face Transformers implementation of Infini-attention ("Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention"), extending models such as Llama3 and Gemma to very long, effectively unbounded context lengths. It offers two implementation styles:

- Model-wise: overrides the full model and requires a custom training loop, in exchange for drastically reduced memory use at long sequence lengths.
- Attention-layer-wise: swaps in only the Infini-attention layer and remains compatible with the standard Hugging Face Trainer.

The repository also includes practical guidelines and example training scripts for the MiniPile and WikiText datasets, making it a starting point for scalable, context-rich transformer models.
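For context on what the layer does, the mechanism comes from the Infini-attention paper: each attention layer keeps a fixed-size compressive memory that is read before, and updated after, local attention on every segment, and the two paths are blended with a learned gate. Below is a minimal single-head PyTorch sketch of that mechanism; the class name `InfiniAttentionSketch`, the segment-based `forward` signature, and the dimensions are illustrative assumptions, not the repository's actual classes.

```python
import torch
import torch.nn.functional as F
from torch import nn


class InfiniAttentionSketch(nn.Module):
    """Single-head sketch of Infini-attention's compressive memory.

    Illustrative only: projections, head splitting, and masking are
    simplified relative to a real Llama3 / Gemma attention layer.
    """

    def __init__(self, dim: int):
        super().__init__()
        self.q_proj = nn.Linear(dim, dim, bias=False)
        self.k_proj = nn.Linear(dim, dim, bias=False)
        self.v_proj = nn.Linear(dim, dim, bias=False)
        self.gate = nn.Parameter(torch.zeros(1))  # learned scalar gate (beta)

    def forward(self, segments):
        """`segments`: iterable of (batch, seg_len, dim) chunks of one long sequence."""
        dim = self.q_proj.in_features
        memory = None  # compressive memory M, shape (batch, dim, dim)
        norm = None    # normalization term z, shape (batch, dim, 1)
        outputs = []
        for x in segments:
            q, k, v = self.q_proj(x), self.k_proj(x), self.v_proj(x)
            sq, sk = F.elu(q) + 1, F.elu(k) + 1  # sigma(.) = ELU + 1
            if memory is None:
                memory = x.new_zeros(x.size(0), dim, dim)
                norm = x.new_zeros(x.size(0), dim, 1)
            # Retrieve from memory: A_mem = sigma(Q) M / (sigma(Q) z)
            a_mem = (sq @ memory) / (sq @ norm + 1e-6)
            # Standard causal dot-product attention within the segment
            a_local = F.scaled_dot_product_attention(q, k, v, is_causal=True)
            # Update memory with the current segment (linear update rule)
            memory = memory + sk.transpose(1, 2) @ v
            norm = norm + sk.sum(dim=1, keepdim=True).transpose(1, 2)
            # Blend long-term (memory) and local attention via the learned gate
            g = torch.sigmoid(self.gate)
            outputs.append(g * a_mem + (1 - g) * a_local)
        return torch.cat(outputs, dim=1)
```

Because the memory is a fixed-size matrix per head, its footprint stays constant no matter how many segments are processed, which is what makes the reduced memory use at large sequence lengths possible.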