#pretraining

LLMs-from-scratch
This comprehensive guide walks through building a GPT-like large language model from scratch, starting with the basic building blocks in code. It provides step-by-step instructions with clear explanations and examples, making it a valuable resource for understanding model development, pretraining, and finetuning. The techniques parallel those behind systems such as ChatGPT, and the guide also covers loading and refining larger pretrained models. The official code repository provides updates and additional resources.
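At the core of such a build is a decoder-only Transformer trained with next-token prediction. The snippet below is a minimal, hypothetical sketch of that setup in plain PyTorch; module names and hyperparameters are illustrative and not taken from the guide itself:

```python
# Minimal sketch of a decoder-only (GPT-style) language model trained with
# next-token prediction. Names and hyperparameters are illustrative only.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyGPT(nn.Module):
    def __init__(self, vocab_size=50257, d_model=128, n_heads=4, n_layers=2, max_len=256):
        super().__init__()
        self.tok_emb = nn.Embedding(vocab_size, d_model)
        self.pos_emb = nn.Embedding(max_len, d_model)
        layer = nn.TransformerEncoderLayer(
            d_model, n_heads, dim_feedforward=4 * d_model, batch_first=True
        )
        self.blocks = nn.TransformerEncoder(layer, n_layers)
        self.lm_head = nn.Linear(d_model, vocab_size, bias=False)

    def forward(self, idx):
        b, t = idx.shape
        pos = torch.arange(t, device=idx.device)
        x = self.tok_emb(idx) + self.pos_emb(pos)
        # Causal mask: each position may attend only to earlier positions.
        mask = torch.triu(torch.ones(t, t, device=idx.device, dtype=torch.bool), diagonal=1)
        x = self.blocks(x, mask=mask)
        return self.lm_head(x)

model = TinyGPT()
tokens = torch.randint(0, 50257, (2, 64))   # stand-in for a batch of tokenized text
logits = model(tokens[:, :-1])              # predict the next token at each position
loss = F.cross_entropy(logits.reshape(-1, logits.size(-1)), tokens[:, 1:].reshape(-1))
loss.backward()
```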
pixel
Explore an image-based approach to language modeling that processes text as rendered pixels, removing the need for a fixed vocabulary and enabling smoother adaptation across scripts. Pretrained on 3.2 billion words, the model outperforms BERT on non-Latin scripts. It combines a text renderer, an encoder, and a decoder that reconstructs masked image patches at the pixel level, and it performs strongly on syntactic and semantic tasks. Detailed pretraining and finetuning guidelines for multilingual text processing are available via Hugging Face.
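Conceptually, the pipeline renders a string to an image, splits it into patches, hides a subset, and trains an encoder-decoder to reconstruct the hidden pixels. The sketch below illustrates that idea with PIL and plain PyTorch modules; it is a simplified stand-in, not the project's actual renderer or architecture:

```python
# Illustrative sketch of image-based language modeling: render text to pixels,
# mask patches, and reconstruct them. Not the project's actual components.
import torch
import torch.nn as nn
from PIL import Image, ImageDraw

def render_text(text, height=16, width=256):
    """Render a string to a grayscale image tensor (a stand-in text renderer)."""
    img = Image.new("L", (width, height), color=255)
    ImageDraw.Draw(img).text((0, 2), text, fill=0)  # PIL's default bitmap font
    return torch.tensor(list(img.getdata()), dtype=torch.float32).view(1, height, width) / 255.0

patch = 16
pixels = render_text("language modelling without a fixed vocabulary")
patches = pixels.unfold(1, patch, patch).unfold(2, patch, patch).reshape(-1, patch * patch)

# Mask every 4th patch (~25%) and train to reconstruct its pixel values.
mask = torch.arange(patches.size(0)) % 4 == 0
encoder = nn.Sequential(nn.Linear(patch * patch, 128), nn.GELU())
decoder = nn.Linear(128, patch * patch)

visible = patches.clone()
visible[mask] = 0.0                                  # hide masked patches from the encoder
recon = decoder(encoder(visible))
loss = ((recon[mask] - patches[mask]) ** 2).mean()   # pixel-level reconstruction loss
loss.backward()
```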
mint
A minimalistic PyTorch library implementing common Transformer architectures, well suited to building models from scratch. Sequential tutorials cover BERT, GPT, and other models to deepen understanding of Transformers. Fast subword tokenization is provided through HuggingFace tokenizers. The library supports pretraining on datasets of various sizes using in-memory and out-of-memory techniques, includes fine-tuning capabilities, and offers features such as a BERT completer for masked-string completion, making it a practical toolkit for machine learning projects.
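As an example of the fast tokenization layer, the snippet below trains a WordPiece vocabulary with the HuggingFace `tokenizers` package. The corpus path and settings are placeholders, and the code is a generic illustration rather than part of the library's own tutorials:

```python
# Generic sketch: training a fast WordPiece subword tokenizer with the
# HuggingFace `tokenizers` package. "corpus.txt" is a placeholder path.
from tokenizers import Tokenizer, models, pre_tokenizers, trainers

tokenizer = Tokenizer(models.WordPiece(unk_token="[UNK]"))
tokenizer.pre_tokenizer = pre_tokenizers.Whitespace()

trainer = trainers.WordPieceTrainer(
    vocab_size=30_000,
    special_tokens=["[PAD]", "[UNK]", "[CLS]", "[SEP]", "[MASK]"],
)
tokenizer.train(["corpus.txt"], trainer)

# Encode a sentence into subword tokens.
print(tokenizer.encode("Pretraining transformers from scratch").tokens)
```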
SparK
SparK offers an innovative method for applying BERT-style self-supervised pretraining to any convolutional neural network. Compatible with standard CNN architectures such as ResNet, it keeps dependencies minimal and improves image classification performance. Through its masked-modeling objective, SparK-pretrained CNNs can surpass larger models that lack such pretraining and compete with Swin-Transformer-based approaches, with gains that scale across model sizes. For detailed analysis of the advantages of generative self-supervised pretraining, see the ICLR 2023 Spotlight paper. Colab demos illustrate reconstruction by the pretrained model and masking in convolutional layers.
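The masked-modeling objective can be sketched roughly as follows: hide random patches of an image, encode the result with a standard CNN, and regress the original pixels of the hidden regions. The snippet below is a dense, simplified illustration of that objective only; it is not SparK's sparse-convolution implementation, and the decoder and shapes are placeholders:

```python
# Rough sketch of BERT-style masked image modeling with a CNN encoder.
# NOT SparK's sparse-convolution method, only the general objective.
import torch
import torch.nn as nn
from torchvision.models import resnet18

encoder = resnet18(weights=None)
encoder.fc = nn.Identity()                  # keep the 512-d global feature
decoder = nn.Linear(512, 3 * 224 * 224)     # naive pixel decoder (placeholder)

images = torch.rand(2, 3, 224, 224)         # stand-in batch of images

# Zero out random 32x32 patches (dense masking for simplicity; SparK instead
# uses sparse convolutions that skip masked positions entirely).
patch = 32
mask = (torch.rand(2, 1, 224 // patch, 224 // patch) < 0.6).float()
mask = mask.repeat_interleave(patch, -1).repeat_interleave(patch, -2)
masked_images = images * (1.0 - mask)

features = encoder(masked_images)           # (2, 512)
recon = decoder(features).view(2, 3, 224, 224)

# Reconstruction loss computed only on the masked regions
# (normalization by masked-pixel count is approximate, ignoring channels).
loss = (((recon - images) ** 2) * mask).sum() / mask.sum().clamp(min=1)
loss.backward()
```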