
#scaling law

MatMul-Free LM is a language model architecture that removes matrix multiplication and is compatible with the Transformers library. It uses ternary weights and is reported to be more computationally efficient than conventional models such as Transformer++. Models range from 370M to 2.7B parameters, and the implementation builds on PyTorch, Triton, and einops.
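To make the ternary-weight idea concrete, here is a minimal sketch in plain Python. It assumes an absmean-style quantizer (scale by the mean absolute weight, round, clip to -1, 0, +1), which may differ from what MatMul-Free LM actually does; the function names `ternarize` and `ternary_dot` are hypothetical and not part of the project's API.

```python
def ternarize(weights):
    """Quantize floats to {-1, 0, +1} plus one shared scale.

    Assumption: an absmean-style scheme (divide by the mean absolute
    value, round, clip). This is illustrative, not the project's code.
    """
    scale = sum(abs(w) for w in weights) / len(weights) or 1.0
    ternary = [max(-1, min(1, round(w / scale))) for w in weights]
    return ternary, scale


def ternary_dot(x, ternary, scale):
    """Dot product with ternary weights using only add and subtract."""
    acc = 0.0
    for xi, t in zip(x, ternary):
        if t == 1:
            acc += xi      # weight +1: add the input
        elif t == -1:
            acc -= xi      # weight -1: subtract the input
        # weight 0: skip the input entirely
    return acc * scale     # a single multiply to restore magnitude


weights = [0.9, -0.05, -1.2, 0.4]
ternary, scale = ternarize(weights)
result = ternary_dot([1.0, 2.0, 3.0, 4.0], ternary, scale)
```

Because every weight is -1, 0, or +1, the inner loop needs no per-element multiplication, which is where the efficiency gain comes from.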

This repository explores strategies for scaling language models when data is limited. It includes experiments that vary data repetition and compute budget, covering up to 900 billion training tokens and models with up to 9 billion parameters. It proposes a scaling law for compute efficiency that accounts for the decreasing value of repeated tokens and excess parameters. It also describes methods for mitigating data scarcity, such as augmenting the training data with code and applying filtering techniques including perplexity filtering and deduplication. More than 400 trained models and the associated datasets are available.
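The decreasing value of repeated tokens can be illustrated with a saturating curve. The sketch below assumes an exponential-decay form in which each extra epoch contributes less than the one before; the function name `effective_tokens` and the default `r_star` are illustrative placeholders, not the repository's fitted values.

```python
import math


def effective_tokens(unique_tokens, epochs, r_star=15.0):
    """Estimate how many 'fresh-equivalent' tokens repeated data is worth.

    Assumption: value decays exponentially with the number of repeats.
    r_star (hypothetical default) controls how fast repeats lose value.
    """
    repeats = max(epochs - 1.0, 0.0)  # the first pass is not a repeat
    bonus = r_star * (1.0 - math.exp(-repeats / r_star))
    return unique_tokens * (1.0 + bonus)


one_epoch = effective_tokens(100e9, 1)     # no repetition
four_epochs = effective_tokens(100e9, 4)   # a few repeats, near full value
many_epochs = effective_tokens(100e9, 100) # heavy repetition, saturating
```

Under this form a handful of epochs is worth almost as much as fresh data, while the total can never exceed `unique_tokens * (1 + r_star)`, no matter how many times the data is repeated.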
