
gpt-neox

Scalable Training of Large Language Models with Enhanced Techniques

Product Description

This repository provides a robust platform for training large-scale autoregressive language models with advanced optimizations and broad system compatibility. Built on NVIDIA's Megatron-LM and Microsoft's DeepSpeed, it supports distributed training via ZeRO and 3D parallelism across a variety of hardware environments, including AWS and ORNL Summit. Widely adopted in academia and industry, it ships predefined configurations for popular model architectures and integrates with the open-source ecosystem, including Hugging Face libraries and Weights & Biases (WandB). Recent updates add support for AMD GPUs, preference learning, and improved Flash Attention, supporting continued advances in large-scale model research.
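
As a brief illustration of the Hugging Face integration mentioned above, the minimal sketch below loads a GPT-NeoX-family checkpoint through the transformers library's GPTNeoXForCausalLM class; the specific checkpoint name, prompt, and generation settings are illustrative assumptions rather than anything prescribed by this project.

# Minimal sketch: loading a GPT-NeoX-style checkpoint via Hugging Face transformers.
# Checkpoint name, prompt, and generation settings are illustrative assumptions.
from transformers import AutoTokenizer, GPTNeoXForCausalLM

model_name = "EleutherAI/gpt-neox-20b"  # assumed example checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = GPTNeoXForCausalLM.from_pretrained(model_name)

inputs = tokenizer("Large language models are", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))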
Project Details