Introduction to CTransformers
CTransformers is a Python library that provides bindings for transformer models implemented in C/C++ using the GGML library. It is designed to let developers load and run these models with minimal setup and integrate them into a wide range of applications.
Supported Models
CTransformers supports a range of model architectures used in natural language processing. The table below lists the supported models, their model types, and their CUDA and Metal compatibility:
| Models | Model Type | CUDA | Metal |
| --- | --- | :---: | :---: |
| GPT-2 | `gpt2` | | |
| GPT-J, GPT4All-J | `gptj` | | |
| GPT-NeoX, StableLM | `gpt_neox` | | |
| Falcon | `falcon` | ✅ | |
| LLaMA, LLaMA 2 | `llama` | ✅ | ✅ |
| MPT | `mpt` | ✅ | |
| StarCoder, StarChat | `gpt_bigcode` | ✅ | |
| Dolly V2 | `dolly-v2` | | |
| Replit | `replit` | | |
Installation
Installing CTransformers is straightforward. You can get started with the following command:

```sh
pip install ctransformers
```

This command installs the library along with its necessary dependencies.
Usage
CTransformers offers a unified interface for all supported models. Here's a basic example that loads a model and generates text:

```python
from ctransformers import AutoModelForCausalLM

llm = AutoModelForCausalLM.from_pretrained("/path/to/ggml-model.bin", model_type="gpt2")
print(llm("AI is going to"))
```
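Generation behavior can be tuned per call. The sketch below uses a few of the generation parameters documented for the library (`max_new_tokens`, `temperature`, `top_k`, `top_p`); consult the documentation for the full list and defaults, which may vary by version:

```python
# Per-call generation options; names follow the documented parameters.
print(llm(
    "AI is going to",
    max_new_tokens=128,  # cap on the number of generated tokens
    temperature=0.8,     # sampling temperature
    top_k=40,            # sample only from the 40 most likely tokens
    top_p=0.95,          # nucleus sampling threshold
))
```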
Users can also stream output text as shown below:
```python
for text in llm("AI is going to", stream=True):
    print(text, end="", flush=True)
```
To access models from the Hugging Face Hub directly, the following syntax is used:

```python
llm = AutoModelForCausalLM.from_pretrained("marella/gpt-2-ggml")
```
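If a Hub repository contains multiple model files, the documentation describes a `model_file` parameter for selecting one. A minimal sketch, where the repository and file name are illustrative examples:

```python
# Select a specific quantized file from a repo that ships several.
# The repository and file name below are illustrative examples.
llm = AutoModelForCausalLM.from_pretrained(
    "TheBloke/Llama-2-7B-GGML",
    model_file="llama-2-7b.ggmlv3.q4_0.bin",
)
```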
Integration with 🤗 Transformers and LangChain
CTransformers integrates with the 🤗 Transformers library, so its tokenizers and text generation pipelines can be used with CTransformers models. Following the pattern in the project's documentation, loading with `hf=True` returns a Transformers-compatible model:

```python
from ctransformers import AutoModelForCausalLM, AutoTokenizer
from transformers import pipeline

# hf=True makes the loaded model compatible with the 🤗 Transformers API.
model = AutoModelForCausalLM.from_pretrained("marella/gpt-2-ggml", hf=True)
tokenizer = AutoTokenizer.from_pretrained(model)
pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)
print(pipe("AI is going to", max_new_tokens=256))
```
Additionally, the library is integrated with LangChain, extending its capabilities and providing a broader range of applications.
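A minimal sketch of the LangChain side, assuming a LangChain version that ships the CTransformers wrapper (its import path has moved between `langchain.llms` and `langchain_community.llms` across releases):

```python
# LangChain exposes CTransformers models through its LLM interface.
# Newer LangChain releases import this from langchain_community.llms.
from langchain.llms import CTransformers

llm = CTransformers(model="marella/gpt-2-ggml")
print(llm("AI is going to"))  # newer versions prefer llm.invoke(...)
```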
GPU and Enhanced Performance Options
CTransformers supports acceleration using GPUs. To run model layers on GPU, users can specify the number of GPU layers:
```python
llm = AutoModelForCausalLM.from_pretrained("TheBloke/Llama-2-7B-GGML", gpu_layers=50)
```
For users with compatible hardware, support for CUDA, ROCm, and Metal can be enabled through dedicated installation commands.
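At the time of writing, the project README lists commands along these lines; verify against the current documentation, since build flags can change between releases:

```sh
# CUDA (prebuilt wheels with CUDA support)
pip install ctransformers[cuda]

# ROCm (build from source with hipBLAS)
CT_HIPBLAS=1 pip install ctransformers --no-binary ctransformers

# Metal (build from source with Metal support)
CT_METAL=1 pip install ctransformers --no-binary ctransformers
```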
GPTQ and Experimental Features
The library also includes experimental GPTQ support, which currently covers LLaMA models only.
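A minimal sketch, assuming the `gptq` extra is installed; the model repository name is an illustrative example of a GPTQ-quantized model:

```python
# Requires the GPTQ extra: pip install ctransformers[gptq]
# The repository name below is an illustrative example.
from ctransformers import AutoModelForCausalLM

llm = AutoModelForCausalLM.from_pretrained("TheBloke/Llama-2-7B-GPTQ")
print(llm("AI is going to"))
```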
Documentation and Support
The project includes comprehensive documentation which outlines the API details and configurations, ensuring developers can utilize the library to its full potential.
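As an illustration, the documented configuration options can be passed at load time. The parameter names in this sketch (`context_length`, `threads`) follow the documented options; per the documentation, `context_length` is not supported by every model type:

```python
# Load-time settings; names follow the documented config options.
# Note: context_length is only supported by certain model types.
llm = AutoModelForCausalLM.from_pretrained(
    "TheBloke/Llama-2-7B-GGML",
    model_type="llama",
    context_length=2048,  # maximum context window
    threads=8,            # CPU threads used for inference
)
```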
License
CTransformers is released under the MIT License, allowing for flexibility and encouraging wide usage and contribution to its development.
In summary, CTransformers is a powerful tool that simplifies access to and use of transformer models, making it a valuable asset for developers working in the field of natural language processing and machine learning.