cramming
This project investigates how far BERT-style masked-language-model pretraining can be pushed on a single GPU within one day of compute, as a counterpoint to the usual large-compute regime. The research studies modifications across the training pipeline that recover performance close to the original BERT under this budget, and examines how scaling behavior changes in this small-scale setting. The codebase features data preprocessing built on Hugging Face datasets and compatibility with PyTorch 2.0, making it practical for researchers with limited resources.
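As a loose illustration of those two features, the sketch below tokenizes a corpus with Hugging Face `datasets` and compiles a masked-language model with PyTorch 2.0's `torch.compile`. The model name, dataset, and sequence length here are placeholder assumptions for the example, not the project's actual configuration or entry point.

```python
# Minimal sketch (illustrative only, not this repo's training script):
# preprocess text with Hugging Face `datasets`, then compile the model
# with PyTorch 2.0. Model/dataset choices below are placeholders.
import torch
from datasets import load_dataset
from transformers import AutoModelForMaskedLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

# Load a small raw-text corpus and tokenize it in batches.
dataset = load_dataset("wikitext", "wikitext-2-raw-v1", split="train")

def tokenize(batch):
    # Truncation length is an arbitrary choice for this example.
    return tokenizer(batch["text"], truncation=True, max_length=128)

dataset = dataset.map(tokenize, batched=True, remove_columns=["text"])

# Compile the masked-LM with PyTorch 2.0; the wrapped module keeps the
# same interface, so it drops into an existing training loop unchanged.
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")
model = torch.compile(model)
```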