torchscale
TorchScale is a PyTorch library that lets researchers and developers scale Transformers efficiently and effectively. It supports the development of new architectures for foundation models, with a focus on the stability, generality, capability, and efficiency of modeling. Key features include scaling Transformers to 1,000 layers (DeepNet), stable training of sparse Mixture-of-Experts models, and length extrapolation via new position embeddings. Recent innovations such as DeepNet, BitNet, RetNet, and LongNet improve model stability and capacity across language, vision, and speech tasks. TorchScale offers straightforward installation for easy integration.
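For orientation, here is a minimal usage sketch. It assumes the interface shown in the upstream microsoft/torchscale README (EncoderConfig and Encoder under torchscale.architecture, installed via pip install torchscale) and config flags such as deepnorm and use_xmoe; verify these names against the installed version.

```python
# A minimal sketch assuming the torchscale.architecture API from the
# upstream README; flag names (deepnorm, use_xmoe, ...) are assumptions.
from torchscale.architecture.config import EncoderConfig
from torchscale.architecture.encoder import Encoder

# Build a BERT-like encoder; vocab_size is the only setting supplied here.
config = EncoderConfig(vocab_size=64000)
model = Encoder(config)
print(model)

# DeepNet-style normalization for very deep stacks (assumed flag: deepnorm).
deep_config = EncoderConfig(vocab_size=64000, deepnorm=True)
deep_model = Encoder(deep_config)

# Sparse Mixture-of-Experts variant (assumed flags: use_xmoe, moe_freq,
# moe_expert_count).
moe_config = EncoderConfig(
    vocab_size=64000,
    use_xmoe=True,
    moe_freq=2,
    moe_expert_count=64,
)
moe_model = Encoder(moe_config)
```

Under these assumptions, the architecture is driven entirely by the config object, so switching between a vanilla encoder, a 1,000-layer DeepNet stack, or a sparse MoE model is a matter of changing flags rather than code.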