DiT-MoE
DiT-MoE is a PyTorch implementation of sparse Diffusion Transformers that scales Mixture-of-Experts diffusion models to as many as 16 billion parameters. Its expert routing activates only a subset of experts per token, so the compute per forward pass stays well below that of a dense model of the same size, while rectified flow-based training improves convergence and sample quality; distributed training is supported through DeepSpeed. The project also ships pre-trained models and detailed scripts, making it a practical starting point for researchers who need a flexible, high-capacity diffusion framework.
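The two core ideas, token-level expert routing inside the transformer MLP and a rectified-flow training objective, can be summarized with a short sketch. The names and shapes below (`TopKMoE`, `rectified_flow_loss`, the gating layout) are illustrative assumptions for exposition, not the repository's actual API, and rectified-flow sign conventions vary between implementations.

```python
# Minimal sketch of top-k expert routing and a rectified-flow loss (illustrative,
# not the DiT-MoE repository's real modules).
import torch
import torch.nn as nn
import torch.nn.functional as F


class TopKMoE(nn.Module):
    """Token-level mixture-of-experts MLP: each token is sent to its top-k experts."""

    def __init__(self, dim, num_experts=8, top_k=2, hidden_mult=4):
        super().__init__()
        self.top_k = top_k
        self.gate = nn.Linear(dim, num_experts, bias=False)   # routing scores per token
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, hidden_mult * dim), nn.GELU(),
                          nn.Linear(hidden_mult * dim, dim))
            for _ in range(num_experts)
        )

    def forward(self, x):                        # x: (batch, tokens, dim)
        b, t, d = x.shape
        flat = x.reshape(-1, d)                  # route each token independently
        scores = self.gate(flat)                 # (b*t, num_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)     # normalize over the selected experts
        out = torch.zeros_like(flat)
        for e, expert in enumerate(self.experts):
            token_ids, slot = (idx == e).nonzero(as_tuple=True)  # tokens routed to expert e
            if token_ids.numel() == 0:
                continue                         # only selected experts run: sparse compute
            out[token_ids] += weights[token_ids, slot].unsqueeze(-1) * expert(flat[token_ids])
        return out.reshape(b, t, d)


def rectified_flow_loss(model, x0, t, cond=None):
    """Rectified flow: regress the constant velocity of a straight noise-to-data path."""
    noise = torch.randn_like(x0)
    t_ = t.view(-1, *([1] * (x0.dim() - 1)))     # broadcast time over data dims
    xt = t_ * x0 + (1.0 - t_) * noise            # linear interpolation between noise and data
    target = x0 - noise                          # velocity d(xt)/dt along the straight path
    pred = model(xt, t) if cond is None else model(xt, t, cond)
    return F.mse_loss(pred, target)
```

Because only `top_k` of the `num_experts` expert MLPs execute for any given token, the parameter count grows with the number of experts while the per-token FLOPs stay roughly fixed, which is the mechanism that lets MoE diffusion transformers scale to billions of parameters.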