# DeepSpeed

## xtuner
XTuner is a versatile toolkit for efficiently fine-tuning both language and vision models, compatible with a variety of GPU platforms. It offers support for models like InternLM, Mixtral, Llama, and VLMs such as LLaVA, ensuring flexibility and scalability. With features such as FlashAttention and Triton kernels, XTuner optimizes training processes and integrates seamlessly with DeepSpeed. It supports several training algorithms, including QLoRA and LoRA, and provides a structured data pipeline that accommodates diverse dataset formats. XTuner models are ready for deployment through systems like LMDeploy and can be evaluated with tools such as OpenCompass. Recent updates include support enhancements and installation guidance.
## CoLLiE
CoLLiE provides an efficient solution for training large language models with tools for data preprocessing, model fine-tuning, and training metric monitoring. Utilizing DeepSpeed and PyTorch, it supports models like MOSS, InternLM, and LLaMA, facilitating seamless model transitions. CoLLiE's advanced parallel strategies and optimization techniques enhance speed and quality while minimizing costs. Its detailed documentation and customization options cater to users of all experience levels, ensuring efficient model training processes.
## Finetune_LLMs
The project provides an in-depth guide to fine-tuning Large Language Models (LLMs) using a famous quotes dataset, with support for advanced methods like DeepSpeed, LoRA, and QLoRA. It includes a comprehensive Docker walkthrough to integrate Nvidia-docker for GPU acceleration on Linux systems with modern Nvidia GPUs. The repository offers both updated and legacy code, catering to users with varying familiarity levels, and professional assistance is available if needed.
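For orientation, the sketch below shows what a generic QLoRA-style setup looks like with Hugging Face transformers and peft; the model name and hyperparameters are illustrative placeholders, not this repository's own configuration.

```python
# Generic QLoRA-style sketch (not this repository's code): load a causal LM
# with 4-bit quantized base weights and attach LoRA adapters so that only a
# small set of extra parameters is trained.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

model_name = "meta-llama/Llama-2-7b-hf"  # placeholder; any causal LM works

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                       # QLoRA: quantize frozen base weights to 4-bit
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(model_name, quantization_config=bnb_config)
model = prepare_model_for_kbit_training(model)

lora_config = LoraConfig(
    r=16,                                    # rank of the low-rank update matrices
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],     # attention projections to adapt
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()           # only the LoRA weights require gradients
```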
## accelerate
Discover a seamless PyTorch training experience with an innovative library that simplifies multi-device and distributed environments. Easily integrate minimal changes to enable smooth transitions between CPUs, GPUs, and TPUs with mixed precision. A user-friendly CLI aids in configuring and deploying scripts while maintaining control over training loops. Offering exceptional flexibility in scaling machine learning models, it supports frameworks like DeepSpeed and PyTorch FSDP, making it suitable for developers focusing on simplicity and adaptability.
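In practice the integration is only a few lines; the loop below is a minimal sketch with a toy model rather than a verbatim example from the library's documentation.

```python
# Minimal Accelerate sketch with a toy model: wrap the model, optimizer, and
# dataloader once, and the same loop runs on CPU, a single GPU, or multiple
# devices (typically started with `accelerate launch train.py`).
import torch
from torch.utils.data import DataLoader, TensorDataset
from accelerate import Accelerator

accelerator = Accelerator()        # mixed precision can be enabled here or via `accelerate config`

model = torch.nn.Linear(32, 1)     # toy model for illustration
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
dataloader = DataLoader(TensorDataset(torch.randn(256, 32), torch.randn(256, 1)), batch_size=16)

# prepare() moves everything to the right device(s) and wraps them for distributed use
model, optimizer, dataloader = accelerator.prepare(model, optimizer, dataloader)

for inputs, targets in dataloader:
    optimizer.zero_grad()
    loss = torch.nn.functional.mse_loss(model(inputs), targets)
    accelerator.backward(loss)     # replaces loss.backward()
    optimizer.step()
```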
## x-flux
This repository provides fine-tuning scripts for the Flux model, utilizing LoRA and ControlNet technologies. With support for high-resolution output through DeepSpeed integration, it enables training of models like the IP-Adapter and various ControlNet versions at 1024x1024 resolution. Necessary tools include Python 3.10+, PyTorch 2.1+, and HuggingFace CLI for downloading models. Testing is supported through ComfyUI, Gradio, and CLI, with a low-memory mode available using Flux-dev-F8 on HuggingFace. Models are under the FLUX.1 Non-Commercial License.
## DeepSpeed
DeepSpeed optimizes deep learning training and inference through a sophisticated software suite that boosts speed and scalability. It facilitates the handling of large models and efficient GPU scaling, delivering exceptional system throughput. Utilizing technologies such as ZeRO and parallelism, DeepSpeed significantly reduces latency and increases throughput, streamlining model deployment processes. These capabilities power advanced language models and represent a substantial step forward for large-scale AI.
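A minimal sketch of the typical integration is shown below, using a toy model and an illustrative ZeRO stage 2 configuration; the values are placeholders, not recommended settings.

```python
# Minimal DeepSpeed sketch with a toy model and an illustrative ZeRO stage 2
# config. Scripts like this are normally started with the `deepspeed` launcher,
# which sets up the distributed environment.
import torch
import deepspeed

model = torch.nn.Linear(1024, 1024)           # toy model for illustration

ds_config = {
    "train_micro_batch_size_per_gpu": 8,
    "zero_optimization": {"stage": 2},        # partition optimizer states and gradients
    "fp16": {"enabled": True},                # requires a CUDA GPU
    "optimizer": {"type": "AdamW", "params": {"lr": 1e-4}},
}

# deepspeed.initialize wraps the model in an engine that owns backward() and step()
model_engine, optimizer, _, _ = deepspeed.initialize(
    model=model,
    model_parameters=model.parameters(),
    config=ds_config,
)

inputs = torch.randn(8, 1024, dtype=torch.half, device=model_engine.device)
loss = model_engine(inputs).float().mean()
model_engine.backward(loss)
model_engine.step()
```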
## SimpleTuner
Explore a simplified and versatile codebase that enhances image training across diverse datasets. Features like Multi-GPU support, aspect bucketing, and optional EMA enable optimization of various image sizes and aspect ratios. Technologies such as Flux, PixArt Sigma, and Stable Diffusion ensure compatibility with modern GPUs, and DeepSpeed integration facilitates training on hardware with lower VRAM. Access streamlined installations, comprehensive tutorials, and community support for effective image processing.
## LLamaTuner
LLamaTuner is a sophisticated toolkit providing efficient and flexible solutions for fine-tuning large language models such as Llama3, Phi3, and Mistral on different GPU setups. It supports both single and multi-node configurations by using features like FlashAttention and Triton kernels to enhance training throughput. The toolkit's compatibility with DeepSpeed enables the use of ZeRO optimization techniques for efficient training. LLamaTuner also offers broad support for various models, datasets, and training methods, making it versatile for open-source and customized data formats. It is well-suited for continuous pre-training, instruction fine-tuning, and chat interactions.
## gpt-neox
This repository offers a robust platform for training large-scale autoregressive language models with advanced optimizations and extensive system compatibility. Utilizing NVIDIA's Megatron and DeepSpeed, it supports distributed training through ZeRO and 3D parallelism on various hardware environments like AWS and ORNL Summit. Widely adopted by academia and industry, it provides predefined configurations for popular model architectures and integrates seamlessly with the open-source ecosystem, including Hugging Face libraries and WandB. Recent updates introduce support for AMD GPUs, preference learning models, and improved Flash Attention, promoting continued advancements in large-scale model research.
## tevatron
Designed for scalable neural retrieval, this toolkit facilitates efficient model training and inference. It integrates parameter-efficient methods such as LoRA and advanced technologies like DeepSpeed and flash attention. Users can access and fine-tune top pre-trained models, including BGE-Embedding and Instruct-E5, via HuggingFace. Self-contained datasets support various tasks, ensuring efficient training on billion-scale LLMs with GPUs and TPUs. This makes it an excellent choice for researchers seeking to enhance retrieval systems using sophisticated techniques.
## YAYI
YAYI leverages refined domain-specific training data to enhance capabilities across media, sentiment analysis, security, finance, and governance. Its continuous development incorporates user feedback, improving its proficiency in Chinese language and analytics. By contributing to the open-source community, YAYI supports the evolution of Chinese AI models. The latest release, optimized on LLaMA 2, integrates these features for diverse applications, fostering community-driven innovation and reliable performance.
## open-chatgpt
Explore an open-source framework for building ChatGPT-style AI models through a straightforward workflow. The system makes the most of limited computational resources by combining RLHF with advanced distributed training solutions. The project supports expansive language models, incorporates fine-tuning with LoRA, and ensures compatibility with DeepSpeed for enhanced scalability. Access a complete toolkit to create instruction-following models, featuring diverse datasets for multilingual and task-specific uses.
## vall-e
Discover an unofficial PyTorch implementation of VALL-E, leveraging EnCodec for audio tokenization in text-to-speech synthesis. This project supports experimenting with AR and NAR models, offering customizable configurations and synthesis scripts. While the pretrained model is pending, the framework allows in-depth experimentation with DeepSpeed on GPUs.
## EasyContext
This project demonstrates how established methods can expand language models to manage contexts as long as 1 million tokens using efficient strategies such as sequence parallelism, DeepSpeed ZeRO-3 offload, and FlashAttention. It delivers comprehensive training scripts, supports various parallel approaches, and highlights significant improvements in both perplexity and 'needle-in-a-haystack' evaluations for Llama2 models.
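For reference, the FlashAttention piece of such a stack can be requested when loading a model through Hugging Face transformers; the snippet below is illustrative only and is not EasyContext's own training code (the model id is a placeholder, and a recent transformers release plus the flash-attn package are assumed).

```python
# Illustrative only (not EasyContext's training code): request FlashAttention 2
# when loading a model through Hugging Face transformers. Requires a recent
# transformers release, the flash-attn package, and a supported GPU.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-2-7b-hf"          # placeholder model id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",   # memory-efficient attention for long contexts
)
```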
## xllm
The library supports a range of techniques like QLoRA, DeepSpeed, and Gradient Checkpointing to enhance the efficiency of Large Language Model training. It offers features such as checkpoint integration with the HuggingFace Hub and training progress tracking with W&B, allowing for customizable configurations that meet various training requirements. The library accommodates a wide range of models and integrates seamlessly with existing projects, facilitating both rapid prototyping and production-level deployments.
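As a generic illustration of two of the techniques mentioned, gradient checkpointing and W&B tracking, the sketch below uses plain transformers and wandb calls; it is not the xllm API itself, and the project name is a placeholder.

```python
# Generic illustration (not the xllm API): enable gradient checkpointing on a
# transformers model and send a metric to Weights & Biases.
import wandb
from transformers import AutoModelForCausalLM

wandb.init(project="llm-finetuning")                  # placeholder project name

model = AutoModelForCausalLM.from_pretrained("gpt2")  # small model for illustration
model.gradient_checkpointing_enable()                 # trade extra compute for activation memory
model.config.use_cache = False                        # the KV cache conflicts with checkpointing

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
wandb.log({"trainable_params": trainable})
```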
## MPP-LLaVA
This project enables exploration into advanced multimodal communication and processing, supporting image and video dialogues. It leverages QwenLM for seamless multi-round conversations, offering efficient solutions for complex interactions via pipeline and model parallelism. The framework is optimized for training and inference on multiple GPUs with DeepSpeed implementations and provides open-source pre-trained and SFT weights for diverse AI applications.
## happy-transformer
Happy Transformer facilitates the fine-tuning and inference of NLP Transformer models. Version 3.0.0 brings features like DeepSpeed for training efficiency, Apple's MPS support, and WandB for monitoring. It includes automated data partitioning for training and evaluation, and supports direct model uploads to Hugging Face Model Hub. Available tasks include text generation, classification, word prediction, and more. Simple installation and tutorials are provided.
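A minimal usage sketch for text generation is shown below; it relies on the library's default GPT-2 model, and exact class names and options should be checked against the project's documentation.

```python
# Minimal Happy Transformer sketch: text generation with the library's default
# GPT-2 model. Verify class and method names against the project docs.
from happytransformer import HappyGeneration

happy_gen = HappyGeneration()                  # defaults to a small GPT-2 model
result = happy_gen.generate_text("DeepSpeed makes large-model training ")
print(result.text)
```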
## DeeperSpeed
DeeperSpeed is a tailored version of Microsoft's DeepSpeed library, focusing on optimizing EleutherAI’s GPT-NeoX. It provides distinct versioned releases, keeping older versions for compatibility, while adopting the newest updates of DeepSpeed for improved performance and continued development.