EasyContext
This project demonstrates how existing techniques can be combined to extend language models' context length to 1 million tokens, using memory-efficient strategies such as sequence parallelism, DeepSpeed ZeRO-3 offload, and Flash Attention. It provides complete training scripts, supports multiple sequence-parallelism methods, and reports clear improvements in both perplexity and "needle-in-a-haystack" evaluations for Llama-2 models.
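To make the building blocks concrete, the sketch below shows one generic way to enable Flash Attention 2 and a DeepSpeed ZeRO-3 CPU-offload configuration when loading a Llama-2 model with Hugging Face Transformers. This is an illustration of the underlying libraries only, not the project's actual training code; the checkpoint name and hyperparameters are placeholders.

```python
# Illustrative sketch: enabling FlashAttention-2 and a DeepSpeed ZeRO-3
# offload config with standard Hugging Face / DeepSpeed APIs.
# The model name and batch size below are placeholders, not EasyContext's setup.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Llama-2-7b-hf"  # placeholder checkpoint

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",  # use FlashAttention-2 kernels
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Minimal DeepSpeed ZeRO-3 config with optimizer and parameter offload to CPU,
# which trades host memory and PCIe bandwidth for reduced GPU memory.
# In practice this dict (or an equivalent JSON file) is passed to the trainer
# or to deepspeed.initialize().
ds_config = {
    "train_micro_batch_size_per_gpu": 1,
    "bf16": {"enabled": True},
    "zero_optimization": {
        "stage": 3,
        "offload_optimizer": {"device": "cpu"},
        "offload_param": {"device": "cpu"},
    },
}
```

Combining these with sequence parallelism (sharding each long sequence across GPUs) is what allows training at context lengths far beyond what a single device's memory would permit.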