mixture-of-experts
A PyTorch implementation of the Sparsely-Gated Mixture-of-Experts layer, which increases a language model's capacity (parameter count) without a proportional increase in computation: a learned gate routes each token to only a few expert networks, so only a small fraction of the parameters is active on any forward pass. This version extends the original TensorFlow implementation with support for more complex architectures such as hierarchical (two-level) mixtures, and lets you customize the expert networks, their activation functions, and the gating policies. It is aimed at developers who want to scale model capacity while keeping compute cost in check, and includes setup and usage instructions for easy integration.
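To make the routing idea concrete, here is a minimal sketch of a sparsely gated MoE layer in the spirit of Shazeer et al. (2017). All names below (`SimpleMoE`, `Expert`, `top_k`, and so on) are illustrative assumptions for this sketch, not this repository's actual API; it also omits the noisy gating and load-balancing auxiliary loss a production layer would need.

```python
# Illustrative sketch only -- names and signatures are assumptions,
# not this repository's API.
import torch
import torch.nn as nn


class Expert(nn.Module):
    """A single feed-forward expert; the activation is pluggable."""
    def __init__(self, dim, hidden_dim, activation=nn.GELU):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim, hidden_dim),
            activation(),
            nn.Linear(hidden_dim, dim),
        )

    def forward(self, x):
        return self.net(x)


class SimpleMoE(nn.Module):
    """Routes each token to its top-k experts. Parameters grow with
    num_experts, but each token only pays for top_k expert passes."""
    def __init__(self, dim, num_experts=8, hidden_dim=None,
                 top_k=2, activation=nn.GELU):
        super().__init__()
        hidden_dim = hidden_dim or dim * 4
        self.experts = nn.ModuleList(
            Expert(dim, hidden_dim, activation) for _ in range(num_experts)
        )
        self.gate = nn.Linear(dim, num_experts, bias=False)
        self.top_k = top_k

    def forward(self, x):
        b, n, d = x.shape
        tokens = x.reshape(-1, d)                        # (b*n, d)
        logits = self.gate(tokens)                       # (b*n, num_experts)
        weights, indices = logits.topk(self.top_k, dim=-1)
        weights = weights.softmax(dim=-1)                # renormalize over chosen experts

        out = torch.zeros_like(tokens)
        for e, expert in enumerate(self.experts):
            for slot in range(self.top_k):
                # Tokens that picked expert e in this top-k slot.
                mask = indices[:, slot] == e
                if mask.any():
                    out[mask] += weights[mask, slot, None] * expert(tokens[mask])
        return out.reshape(b, n, d)


# Usage: the layer is shape-preserving, so it drops into a transformer
# block wherever a dense feed-forward layer would go.
moe = SimpleMoE(dim=512, num_experts=16, top_k=2)
x = torch.randn(4, 1024, 512)   # (batch, sequence, dim)
y = moe(x)                      # same shape as x
```

The key design point is that the gate's top-k selection is what makes the layer "sparse": a 16-expert layer has roughly 16x the feed-forward parameters of a dense block, yet with `top_k=2` each token touches only two experts. A hierarchical mixture applies the same trick twice, gating first over groups of experts and then over experts within the chosen group.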