Curated Transformers: An Introduction
Curated Transformers is a PyTorch library that provides state-of-the-art transformer models. The library distinguishes itself by building each model from a collection of reusable components, making it both powerful and flexible. Let's explore the features that make Curated Transformers stand out.
Key Features
- Advanced Model Support: Curated Transformers supports cutting-edge models, including Large Language Models (LLMs) such as Falcon, Llama, and Dolly v2.
- Reusable Components: Each model is crafted from a set of reusable building blocks, which offers several advantages:
  - Enhancements and bug fixes apply to all models. For instance, every model can perform 4/8-bit inference via the bitsandbytes library and can leverage PyTorch's meta device to avoid unnecessary memory use.
  - Introducing new models to the library is streamlined and efficient.
  - Experimenting with new transformer architectures, such as a BERT encoder with rotary embeddings, can be done quickly and easily.
- Consistent Type Annotations: The library features consistently applied type annotations across its public APIs. These annotations work well with your IDE and complement existing type-checked code.
- Educational Utility: The modular building blocks are easy to analyze, making the library ideal for educational purposes.
- Minimal Dependencies: The library is lightweight, minimizing additional requirements for operation.
Curated Transformers has been rigorously tested by Explosion and is set to become the default transformer implementation in spaCy version 3.7.
Supported Model Architectures
Encoder-Only Models
- ALBERT
- BERT
- CamemBERT
- RoBERTa
- XLM-RoBERTa
Decoder-Only Models
- Falcon
- GPT-NeoX
- Llama 1/2
- MPT
Generator wrappers are available for Dolly v2, Falcon, Llama 1/2, and MPT. All model types can be conveniently loaded from the Hugging Face Hub, as shown below. Additionally, the spacy-curated-transformers package facilitates integration with spaCy.
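For example, an encoder can be loaded directly from the Hub through one of the Auto classes. The following is a minimal sketch based on the AutoEncoder.from_hf_hub entry point described in the documentation; the model name is only an example, and the exact signature may differ slightly:
import torch
from curated_transformers.models import AutoEncoder

# Download the checkpoint from the Hugging Face Hub and place the weights
# on the first CUDA device ("xlm-roberta-base" is just an example name).
encoder = AutoEncoder.from_hf_hub(
    name="xlm-roberta-base",
    device=torch.device("cuda", index=0),
)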
Installation Process
Installing Curated Transformers is straightforward:
pip install curated-transformers
CUDA Capabilities
For those interested in CUDA support: the default PyTorch build for Linux supports CUDA 11.7. However, on Windows, or when using Ada-generation GPUs on Linux, installing PyTorch with CUDA 11.8 can significantly improve performance:
pip install torch --index-url https://download.pytorch.org/whl/cu118
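After installing, a quick check confirms which CUDA toolkit your PyTorch build was compiled against and whether a GPU is visible:
import torch

# CUDA toolkit version PyTorch was built with (e.g. "11.8"), and
# whether a CUDA-capable GPU can be used from this process.
print(torch.version.cuda)
print(torch.cuda.is_available())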
Usage Example
Curated Transformers simplifies tasks like text generation. Here's a brief example:
>>> import torch
>>> from curated_transformers.generation import AutoGenerator, GreedyGeneratorConfig
>>> generator = AutoGenerator.from_hf_hub(name="tiiuae/falcon-7b-instruct", device=torch.device("cuda"))
>>> generator(["What is Python in one sentence?", "What is Rust in one sentence?"], GreedyGeneratorConfig())
['Python is a high-level programming language that is easy to learn and widely used for web development, data analysis, and automation.',
'Rust is a programming language that is designed to be a safe, concurrent, and efficient replacement for C++.']
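Greedy decoding is only one option; the generator's behaviour is controlled by the config object you pass in. As a hedged sketch, a sampling-based configuration (the SampleGeneratorConfig class from the generation module, with the temperature and top_k parameters assumed here) could be used instead:
>>> from curated_transformers.generation import SampleGeneratorConfig
>>> # Sample from the 10 most likely pieces at each step instead of
>>> # always taking the single most probable one.
>>> generator(["What is Python in one sentence?"], SampleGeneratorConfig(temperature=1.0, top_k=10))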
For more examples, consult the usage section of the documentation or browse the example programs in the examples directory.
Documentation and Additional Resources
The official documentation provides comprehensive insights into how to effectively use Curated Transformers:
- Overview and development insights
- Usage guides
- Detailed API documentation
Model Quantization
With Curated Transformers, dynamic 8-bit and 4-bit model quantization is possible using the bitsandbytes library. The library provides a quantization variant that installs the necessary dependencies:
pip install curated-transformers[quantization]
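As a rough sketch of how quantized loading fits together (assuming the BitsAndBytesConfig helper and the quantization_config argument to from_hf_hub as described in the API documentation; exact names and defaults may differ), a generator could be loaded in 8-bit mode like this:
import torch
from curated_transformers.generation import AutoGenerator, GreedyGeneratorConfig
from curated_transformers.quantization import BitsAndBytesConfig

# Load the model with 8-bit bitsandbytes quantization to reduce memory use.
generator = AutoGenerator.from_hf_hub(
    name="tiiuae/falcon-7b-instruct",
    device=torch.device("cuda", index=0),
    quantization_config=BitsAndBytesConfig.for_8bit(),
)
generator(["What is Python in one sentence?"], GreedyGeneratorConfig())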
Curated Transformers' design philosophy, focusing on modularity and reusability, makes it not only a powerful tool for developers but also a conducive platform for learning and experimentation.