⚡️ What is FastEmbed?
FastEmbed is a lightweight, fast Python library built for generating embeddings. Embeddings are numerical representations of data, widely used in machine learning and natural language processing. FastEmbed supports a range of popular text models, documented in detail on their GitHub page. If you need additional models, you are encouraged to request them by opening an issue on GitHub.
The default text embedding model in FastEmbed is Flag Embedding (BAAI/bge-small-en-v1.5), which is listed on the MTEB leaderboard. The model supports "query" and "passage" prefixes on the input text, which is particularly useful for retrieval. Detailed usage examples show how to integrate FastEmbed with Qdrant.
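For retrieval, queries and passages can be embedded with the appropriate prefixes applied for you. A minimal sketch, assuming the query_embed and passage_embed helpers on TextEmbedding (check the current API reference before relying on them):
from fastembed import TextEmbedding
model = TextEmbedding(model_name="BAAI/bge-small-en-v1.5")
# query_embed / passage_embed are assumed helpers that apply the "query" / "passage" prefixes
query_vectors = list(model.query_embed(["What is vector search?"]))
passage_vectors = list(model.passage_embed(["FastEmbed generates embeddings with the ONNX Runtime."]))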
📈 Why FastEmbed?
- Light: FastEmbed is a lightweight library with minimal external dependencies. It does not require a GPU and does not pull in large PyTorch packages; it uses the ONNX Runtime instead, which makes it a good fit for serverless environments such as AWS Lambda.
- Fast: FastEmbed is designed for speed. The ONNX Runtime typically outperforms PyTorch for inference, and FastEmbed supports data-parallel encoding of large datasets (see the sketch after this list).
- Accurate: FastEmbed has been shown to outperform OpenAI's Ada-002, and its set of supported models, including multilingual ones, keeps growing.
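As a rough illustration of that data parallelism, embed() accepts batch_size and parallel arguments; the values below are illustrative rather than tuned defaults:
from fastembed import TextEmbedding
model = TextEmbedding()
corpus = [f"document number {i}" for i in range(10_000)]  # placeholder dataset
# batch_size controls how many texts are encoded per ONNX Runtime call;
# parallel=0 is assumed to mean "use all available CPU cores" for data-parallel encoding
embeddings = list(model.embed(corpus, batch_size=256, parallel=0))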
🚀 Installation
Installing FastEmbed is straightforward using pip. Here’s how you can install it:
For normal installation:
pip install fastembed
For installation with GPU support:
pip install fastembed-gpu
📖 Quickstart
Here's a quick guide to getting started with FastEmbed for embedding operations:
from fastembed import TextEmbedding
from typing import List
# Sample list of documents to embed
documents: List[str] = [
"This is built to be faster and lighter than other embedding libraries e.g. Transformers, Sentence-Transformers, etc.",
"fastembed is supported by and maintained by Qdrant.",
]
# Initializing the model, which will download and set up the necessary resources
embedding_model = TextEmbedding()
print("The model BAAI/bge-small-en-v1.5 is ready to use.")
# embed() returns a generator; embeddings are computed lazily
embeddings_generator = embedding_model.embed(documents)
# Materialize the generator into a list of numpy arrays
embeddings_list = list(embeddings_generator)
# Each embedding is a 384-dimensional vector
print(len(embeddings_list[0]))  # 384
FastEmbed offers a variety of models tailored for specific tasks and data types. A comprehensive list of supported models is available in their documentation.
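The supported models can also be listed from code. A small sketch, assuming the list_supported_models() helper on TextEmbedding:
from fastembed import TextEmbedding
# Each entry is a dict describing one model; the exact field names may vary by version
for model_info in TextEmbedding.list_supported_models():
    print(model_info["model"], model_info["dim"])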
🎒 Dense text embeddings
Dense text embeddings are FastEmbed's default. The example below generates them with an explicitly chosen model:
from fastembed import TextEmbedding
model = TextEmbedding(model_name="BAAI/bge-small-en-v1.5")
embeddings = list(model.embed(documents))
# Example output:
# [
# array([-0.1115, 0.0097, 0.0052, 0.0195, ...], dtype=float32),
# array([-0.1019, 0.0635, -0.0332, 0.0522, ...], dtype=float32)
# ]
🔱 Sparse text embeddings
FastEmbed also supports sparse text embedding models such as SPLADE++:
from fastembed import SparseTextEmbedding
model = SparseTextEmbedding(model_name="prithivida/Splade_PP_en_v1")
embeddings = list(model.embed(documents))
# Example output:
# [
# SparseEmbedding(indices=[ 17, 123, 919, ... ], values=[0.71, 0.22, 0.39, ...]),
# SparseEmbedding(indices=[ 38, 12, 91, ... ], values=[0.11, 0.22, 0.39, ...])
# ]
🦥 Late interaction models (aka ColBERT)
FastEmbed includes support for late interaction models such as ColBERT:
from fastembed import LateInteractionTextEmbedding
model = LateInteractionTextEmbedding(model_name="colbert-ir/colbertv2.0")
embeddings = list(model.embed(documents))
# Example output:
# [
# array([
# [-0.1115, 0.0097, 0.0052, 0.0195, ...],
# [-0.1019, 0.0635, -0.0332, 0.0522, ...],
# ]),
# array([
# [-0.9019, 0.0335, -0.0032, 0.0991, ...],
# [-0.2115, 0.8097, 0.1052, 0.0195, ...],
# ]),
# ]
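For retrieval with late interaction models, queries are usually embedded separately from documents. A minimal sketch, assuming LateInteractionTextEmbedding exposes a query_embed helper (verify against the current API):
# One vector per query token, to be scored against the document token vectors (MaxSim)
query_embeddings = list(model.query_embed(["What is FastEmbed?"]))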
🖼️ Image embeddings
FastEmbed isn't limited to text embeddings—it also supports image embeddings:
from fastembed import ImageEmbedding
images = [
"./path/to/image1.jpg",
"./path/to/image2.jpg",
]
model = ImageEmbedding(model_name="Qdrant/clip-ViT-B-32-vision")
embeddings = list(model.embed(images))
# Example output:
# [
# array([-0.1115, 0.0097, 0.0052, 0.0195, ...], dtype=float32),
# array([-0.1019, 0.0635, -0.0332, 0.0522, ...], dtype=float32)
# ]
⚡️ FastEmbed on a GPU
FastEmbed can be configured to run on GPU devices, enhancing its performance:
To install the GPU-supporting version:
pip install fastembed-gpu
from fastembed import TextEmbedding
embedding_model = TextEmbedding(
model_name="BAAI/bge-small-en-v1.5",
providers=["CUDAExecutionProvider"]
)
print("The model BAAI/bge-small-en-v1.5 is ready to use on a GPU.")
See their detailed example for more information.
Usage with Qdrant
FastEmbed is designed to work seamlessly with Qdrant for managing and querying large collections of vector data.
To get started with Qdrant and FastEmbed, install the necessary package:
pip install "qdrant-client[fastembed]"
Or, with GPU support:
pip install "qdrant-client[fastembed-gpu]"
Below is an example of how you might integrate FastEmbed with Qdrant:
from qdrant_client import QdrantClient
# Initialize the client
client = QdrantClient("localhost", port=6333) # For production
# client = QdrantClient(":memory:") # For small experiments
# Prepare your documents, metadata, and IDs
docs = ["Qdrant has Langchain integrations", "Qdrant also has Llama Index integrations"]
metadata = [
{"source": "Langchain-docs"},
{"source": "Llama-index-docs"},
]
ids = [42, 2]
# Add documents to a collection; FastEmbed embeds them automatically
client.add(
collection_name="demo_collection",
documents=docs,
metadata=metadata,
ids=ids
)
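# FastEmbed embeds the query text on the client side before searching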
search_result = client.query(
collection_name="demo_collection",
query_text="This is a query document"
)
print(search_result)
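Each hit can then be inspected. A small sketch, assuming the result objects returned by qdrant-client expose id, score, document and metadata fields:
# Print matched documents together with their similarity scores
for hit in search_result:
    print(hit.id, hit.score, hit.document, hit.metadata)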
In conclusion, FastEmbed is a versatile and efficient tool for embedding generation, suitable for a wide range of applications and models, from text to images, with full integration support for platforms like Qdrant.