Curated Transformers: An Introduction
Curated Transformers is a PyTorch library that provides state-of-the-art transformer models. The library distinguishes itself by building each model from a collection of reusable components, making it both powerful and flexible. Let's explore the features that make Curated Transformers stand out.
Key Features
- Advanced Model Support: Curated Transformers supports cutting-edge models, including Large Language Models (LLMs) such as Falcon, Llama, and Dolly v2.
- Reusable Components: Each model is crafted from a set of reusable building blocks, which offers several advantages:
  - Enhancements and bug fixes apply to all models. For instance, every model can perform 4/8-bit inference via the bitsandbytes library and can leverage PyTorch's meta device to avoid unnecessary memory use.
  - Introducing new models to the library is streamlined and efficient.
  - Experimenting with new transformer architectures, such as a BERT encoder with rotary embeddings, can be done quickly and easily.
- Consistent Type Annotations: The library features consistently applied type annotations across its public APIs. These annotations work well with your IDE and complement existing type-checked code.
- Educational Utility: The modular building blocks are easy to analyze, making the library ideal for educational purposes.
- Minimal Dependencies: The library is lightweight, minimizing additional requirements for operation.
Curated Transformers has been rigorously tested by Explosion and is set to become the default transformer implementation in spaCy version 3.7.
Supported Model Architectures
Encoder-Only Models
- ALBERT
- BERT
- CamemBERT
- RoBERTa
- XLM-RoBERTa
Decoder-Only Models
- Falcon
- GPT-NeoX
- Llama 1/2
- MPT
Generator wrappers are available for Dolly v2, Falcon, Llama 1/2, and MPT. All model types can be conveniently loaded from the Hugging Face Hub, as shown below. Additionally, the spacy-curated-transformers package facilitates integration with spaCy.
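For example, an encoder can be loaded directly from the Hub through one of the Auto classes. The following is a minimal sketch based on the AutoEncoder.from_hf_hub entry point described in the documentation; the model name is only an example, and the exact signature may differ slightly:
import torch
from curated_transformers.models import AutoEncoder

# Download the checkpoint from the Hugging Face Hub and place the weights
# on the first CUDA device ("xlm-roberta-base" is just an example name).
encoder = AutoEncoder.from_hf_hub(
    name="xlm-roberta-base",
    device=torch.device("cuda", index=0),
)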
Installation Process
Installing Curated Transformers is straightforward:
pip install curated-transformers
CUDA Capabilities
For those interested in CUDA support: the default PyTorch build for Linux supports CUDA 11.7. However, on Windows, or when using Ada-generation GPUs on Linux, installing PyTorch with CUDA 11.8 can significantly improve performance:
pip install torch --index-url https://download.pytorch.org/whl/cu118
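After installing, a quick check confirms which CUDA toolkit your PyTorch build was compiled against and whether a GPU is visible:
import torch

# CUDA toolkit version PyTorch was built with (e.g. "11.8"), and
# whether a CUDA-capable GPU can be used from this process.
print(torch.version.cuda)
print(torch.cuda.is_available())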
Usage Example
Curated Transformers simplifies tasks like text generation. Here's a brief example:
>>> import torch
>>> from curated_transformers.generation import AutoGenerator, GreedyGeneratorConfig
>>> generator = AutoGenerator.from_hf_hub(name="tiiuae/falcon-7b-instruct", device=torch.device("cuda"))
>>> generator(["What is Python in one sentence?", "What is Rust in one sentence?"], GreedyGeneratorConfig())
['Python is a high-level programming language that is easy to learn and widely used for web development, data analysis, and automation.',
'Rust is a programming language that is designed to be a safe, concurrent, and efficient replacement for C++.']
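Greedy decoding is only one option; the generator's behaviour is controlled by the config object you pass in. As a hedged sketch, a sampling-based configuration (the SampleGeneratorConfig class from the generation module, with the temperature and top_k parameters assumed here) could be used instead:
>>> from curated_transformers.generation import SampleGeneratorConfig
>>> # Sample from the 10 most likely pieces at each step instead of
>>> # always taking the single most probable one.
>>> generator(["What is Python in one sentence?"], SampleGeneratorConfig(temperature=1.0, top_k=10))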
For more examples, consult the usage section of the documentation or browse the example programs in the examples directory.
Documentation and Additional Resources
The official documentation provides comprehensive insights into how to effectively use Curated Transformers:
- Overview and development insights
- Usage guides
- Detailed API documentation
Model Quantization
With Curated Transformers, dynamic 8-bit and 4-bit model quantization is possible using the bitsandbytes library. The library provides a quantization variant that installs the necessary dependencies:
pip install curated-transformers[quantization]
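As a rough sketch of how quantized loading fits together (assuming the BitsAndBytesConfig helper and the quantization_config argument to from_hf_hub as described in the API documentation; exact names and defaults may differ), a generator could be loaded in 8-bit mode like this:
import torch
from curated_transformers.generation import AutoGenerator, GreedyGeneratorConfig
from curated_transformers.quantization import BitsAndBytesConfig

# Load the model with 8-bit bitsandbytes quantization to reduce memory use.
generator = AutoGenerator.from_hf_hub(
    name="tiiuae/falcon-7b-instruct",
    device=torch.device("cuda", index=0),
    quantization_config=BitsAndBytesConfig.for_8bit(),
)
generator(["What is Python in one sentence?"], GreedyGeneratorConfig())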
Curated Transformers' design philosophy, focusing on modularity and reusability, makes it not only a powerful tool for developers but also a conducive platform for learning and experimentation.