llama-moe
The LLaMA-MoE project provides open-source Mixture-of-Experts models with only 3.0~3.5B activated parameters, making them lightweight to deploy. The models are built by partitioning LLaMA's FFNs into experts (via construction methods such as random splitting and clustering) and then continually pre-training them on selected datasets, using gating strategies such as the noisy top-k gate and switch gating (a minimal gating sketch follows below).
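As a rough illustration of the routing idea, here is a minimal PyTorch sketch of a generic noisy top-k gate. It is not the project's exact implementation; the class and parameter names are placeholders, and the noise term follows the standard noisy top-k formulation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class NoisyTopKGate(nn.Module):
    """Generic noisy top-k gate sketch; illustrative, not LLaMA-MoE's exact code."""
    def __init__(self, hidden_size: int, num_experts: int, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        self.w_gate = nn.Linear(hidden_size, num_experts, bias=False)
        self.w_noise = nn.Linear(hidden_size, num_experts, bias=False)

    def forward(self, x: torch.Tensor):
        # Clean routing logits plus input-dependent Gaussian noise at train time.
        logits = self.w_gate(x)
        if self.training:
            noise_std = F.softplus(self.w_noise(x))
            logits = logits + torch.randn_like(logits) * noise_std
        # Keep only the top-k experts per token and renormalize their weights.
        top_vals, top_idx = logits.topk(self.top_k, dim=-1)
        weights = F.softmax(top_vals, dim=-1)
        return weights, top_idx  # mixing weights and selected expert indices
```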