Introduction to CTransformers
CTransformers is a Python library that provides bindings for transformer models implemented in C/C++ using the GGML library. It is designed to let developers load and run these models with minimal setup and integrate them into a wide range of applications.
Supported Models
CTransformers supports a range of model architectures used in natural language processing. The table below lists the supported models, their model types, and their CUDA and Metal compatibility:
| Models | Model Type | CUDA | Metal |
| --- | --- | :---: | :---: |
| GPT-2 | `gpt2` | | |
| GPT-J, GPT4All-J | `gptj` | | |
| GPT-NeoX, StableLM | `gpt_neox` | | |
| Falcon | `falcon` | ✅ | |
| LLaMA, LLaMA 2 | `llama` | ✅ | ✅ |
| MPT | `mpt` | ✅ | |
| StarCoder, StarChat | `gpt_bigcode` | ✅ | |
| Dolly V2 | `dolly-v2` | | |
| Replit | `replit` | | |
Installation
Installing CTransformers is straightforward. You can get started with the following command:

```sh
pip install ctransformers
```

This command installs the library along with its necessary dependencies.
Usage
CTransformers offers a unified interface for all supported models. Here's a basic example that loads a model and generates text:

```python
from ctransformers import AutoModelForCausalLM

llm = AutoModelForCausalLM.from_pretrained("/path/to/ggml-model.bin", model_type="gpt2")
print(llm("AI is going to"))
```
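Generation behavior can be tuned per call. The sketch below uses a few of the generation parameters documented for the library (`max_new_tokens`, `temperature`, `top_k`, `top_p`); consult the documentation for the full list and defaults, which may vary by version:

```python
# Per-call generation options; names follow the documented parameters.
print(llm(
    "AI is going to",
    max_new_tokens=128,  # cap on the number of generated tokens
    temperature=0.8,     # sampling temperature
    top_k=40,            # sample only from the 40 most likely tokens
    top_p=0.95,          # nucleus sampling threshold
))
```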
Users can also stream output text as shown below:
```python
for text in llm("AI is going to", stream=True):
    print(text, end="", flush=True)
```
To access models from the Hugging Face Hub directly, the following syntax is used:

```python
llm = AutoModelForCausalLM.from_pretrained("marella/gpt-2-ggml")
```
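If a Hub repository contains multiple model files, the documentation describes a `model_file` parameter for selecting one. A minimal sketch, where the repository and file name are illustrative examples:

```python
# Select a specific quantized file from a repo that ships several.
# The repository and file name below are illustrative examples.
llm = AutoModelForCausalLM.from_pretrained(
    "TheBloke/Llama-2-7B-GGML",
    model_file="llama-2-7b.ggmlv3.q4_0.bin",
)
```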
Integration with 🤗 Transformers and LangChain
CTransformers integrates with the 🤗 Transformers library, so its tokenizers and text generation pipelines can be used with CTransformers models. Following the pattern in the project's documentation, loading with `hf=True` returns a Transformers-compatible model:

```python
from ctransformers import AutoModelForCausalLM, AutoTokenizer
from transformers import pipeline

# hf=True makes the loaded model compatible with the 🤗 Transformers API.
model = AutoModelForCausalLM.from_pretrained("marella/gpt-2-ggml", hf=True)
tokenizer = AutoTokenizer.from_pretrained(model)
pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)
print(pipe("AI is going to", max_new_tokens=256))
```
Additionally, the library is integrated with LangChain, extending its capabilities and providing a broader range of applications.
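A minimal sketch of the LangChain side, assuming a LangChain version that ships the CTransformers wrapper (its import path has moved between `langchain.llms` and `langchain_community.llms` across releases):

```python
# LangChain exposes CTransformers models through its LLM interface.
# Newer LangChain releases import this from langchain_community.llms.
from langchain.llms import CTransformers

llm = CTransformers(model="marella/gpt-2-ggml")
print(llm("AI is going to"))  # newer versions prefer llm.invoke(...)
```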
GPU and Enhanced Performance Options
CTransformers supports acceleration using GPUs. To run model layers on GPU, users can specify the number of GPU layers:
```python
llm = AutoModelForCausalLM.from_pretrained("TheBloke/Llama-2-7B-GGML", gpu_layers=50)
```
For users with compatible hardware, support for CUDA, ROCm, and Metal can be enabled through dedicated installation commands.
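At the time of writing, the project README lists commands along these lines; verify against the current documentation, since build flags can change between releases:

```sh
# CUDA (prebuilt wheels with CUDA support)
pip install ctransformers[cuda]

# ROCm (build from source with hipBLAS)
CT_HIPBLAS=1 pip install ctransformers --no-binary ctransformers

# Metal (build from source with Metal support)
CT_METAL=1 pip install ctransformers --no-binary ctransformers
```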
GPTQ and Experimental Features
The library also includes experimental GPTQ support, which currently covers LLaMA models only.
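A minimal sketch, assuming the `gptq` extra is installed; the model repository name is an illustrative example of a GPTQ-quantized model:

```python
# Requires the GPTQ extra: pip install ctransformers[gptq]
# The repository name below is an illustrative example.
from ctransformers import AutoModelForCausalLM

llm = AutoModelForCausalLM.from_pretrained("TheBloke/Llama-2-7B-GPTQ")
print(llm("AI is going to"))
```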
Documentation and Support
The project includes comprehensive documentation which outlines the API details and configurations, ensuring developers can utilize the library to its full potential.
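As an illustration, the documented configuration options can be passed at load time. The parameter names in this sketch (`context_length`, `threads`) follow the documented options; per the documentation, `context_length` is not supported by every model type:

```python
# Load-time settings; names follow the documented config options.
# Note: context_length is only supported by certain model types.
llm = AutoModelForCausalLM.from_pretrained(
    "TheBloke/Llama-2-7B-GGML",
    model_type="llama",
    context_length=2048,  # maximum context window
    threads=8,            # CPU threads used for inference
)
```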
License
CTransformers is released under the MIT License, allowing for flexibility and encouraging wide usage and contribution to its development.
In summary, CTransformers is a powerful tool that simplifies access to and use of transformer models, making it a valuable asset for developers working in the field of natural language processing and machine learning.