vectordb - Focus on Local Text Retrieval Using Embeddings for AI

Introduction to VectorDB

VectorDB is a simple, lightweight tool for embedding-based text retrieval that runs fully locally. Its low memory footprint and low latency make it well suited to fast text search, and it powers AI features in the Kagi Search engine.

Installation Process

Install VectorDB with pip, the Python package manager. The package is published as vectordb2, while the module you import is vectordb:

```
pip install vectordb2
```

How It Works

VectorDB performs all data operations locally, from computing embeddings to running the vector search, which keeps retrieval fast and the whole pipeline transparent. A quick illustration of typical usage follows:

Load Your Data: Use the Memory object to save the text and any associated metadata you intend to work with.

```
from vectordb import Memory
memory = Memory()
memory.save(
    ["apples are green", "oranges are orange"],
    [{"url": "https://apples.com"}, {"url": "https://oranges.com"}]
)
```

Retrieve Information: Conduct a search for specific queries to get the most relevant results.
```
query = "green"
results = memory.search(query, top_n=1)
print(results)
```
The search returns the matching text segments together with their metadata and a computed vector distance that indicates how relevant each segment is to the query.
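
To make the idea of ranking by vector distance concrete, here is a minimal standard-library sketch. It is not VectorDB's implementation: the bag-of-words `embed` function stands in for a real embedding model, and the `search` helper and the result keys (`chunk`, `metadata`, `distance`) are assumptions chosen for illustration.

```python
import math
from collections import Counter


def embed(text):
    """Toy embedding: a bag-of-words count vector (stand-in for a real model)."""
    return Counter(text.lower().split())


def cosine_distance(a, b):
    """Return 1 - cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0  # treat empty text as maximally distant
    return 1.0 - dot / (norm_a * norm_b)


def search(store, query, top_n=1):
    """Rank stored chunks by distance to the query, closest first."""
    query_vec = embed(query)
    scored = [
        {
            "chunk": text,
            "metadata": meta,
            "distance": cosine_distance(query_vec, embed(text)),
        }
        for text, meta in store
    ]
    return sorted(scored, key=lambda item: item["distance"])[:top_n]


# Same toy data as the example above, held in a plain list.
store = [
    ("apples are green", {"url": "https://apples.com"}),
    ("oranges are orange", {"url": "https://oranges.com"}),
]
results = search(store, "green", top_n=1)
print(results[0]["chunk"], results[0]["metadata"]["url"])
```

A real embedding model would also place semantically related words close together, which word counts cannot do; the ranking step, however, has the same shape.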

Key Options

VectorDB offers various configuration options to fit diverse needs:

Memory Management: Choose whether to store data on disk or in memory.
Chunking Strategy: Control how texts are divided into manageable pieces for processing.
Embeddings Choice: Opt for different embeddings based on speed or accuracy, including using pre-trained models from resources like HuggingFace.
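
As an illustration of what a chunking strategy controls, the sketch below splits a text into overlapping word windows. This is one common approach, not necessarily how VectorDB chunks text; the function name `sliding_window_chunks` and its `window_size` and `overlap` parameters are hypothetical names used only for this example.

```python
def sliding_window_chunks(text, window_size=8, overlap=2):
    """Split text into word windows; consecutive windows share `overlap` words."""
    if window_size <= 0:
        raise ValueError("window_size must be positive")
    if not 0 <= overlap < window_size:
        raise ValueError("overlap must satisfy 0 <= overlap < window_size")
    words = text.split()
    step = window_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + window_size]))
        if start + window_size >= len(words):
            break  # this window already reached the end of the text
    return chunks


text = "one two three four five six seven eight nine ten"
chunks = sliding_window_chunks(text, window_size=4, overlap=1)
for chunk in chunks:
    print(chunk)
```

Larger windows give each chunk more context but make matches less precise, while the overlap reduces the chance that a relevant sentence is cut in half at a chunk boundary.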

Practical Example

Beyond simple text retrieval, VectorDB can handle more complex scenarios such as comparing AI concepts:

```
# Reuses the memory object created in the earlier example
texts = [
    "...",  # Extensive description of machine learning
    "..."   # Extensive description of artificial intelligence
]
metadata_list = [
    {"title": "Introduction to Machine Learning", "url": "..."},
    {"title": "Introduction to Artificial Intelligence", "url": "..."},
]
memory.save(texts, metadata_list)

query = "What is the relationship between AI and machine learning?"
results = memory.search(query, top_n=3, unique=True)
print(results)
```

This returns up to three of the document sections most relevant to the query. Setting unique=True asks for results without duplicates.
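
The text above does not spell out how unique results are selected. One plausible reading, sketched below, is that at most one chunk per original document is kept. The helper `unique_by_source`, the use of the `title` metadata field as the document key, and the chunk texts and distance values are all hypothetical, chosen only to show the filtering step.

```python
def unique_by_source(ranked_hits, top_n):
    """Keep the best-ranked hit per source document, up to top_n hits."""
    if top_n <= 0:
        return []
    seen = set()
    kept = []
    for hit in ranked_hits:  # assumed to be sorted by ascending distance
        source = hit["metadata"]["title"]
        if source in seen:
            continue  # a closer chunk from this document was already kept
        seen.add(source)
        kept.append(hit)
        if len(kept) == top_n:
            break
    return kept


# Illustrative hits only; the chunk texts and distances are made up.
ranked_hits = [
    {"chunk": "chunk A", "metadata": {"title": "Introduction to Machine Learning"}, "distance": 0.21},
    {"chunk": "chunk B", "metadata": {"title": "Introduction to Machine Learning"}, "distance": 0.25},
    {"chunk": "chunk C", "metadata": {"title": "Introduction to Artificial Intelligence"}, "distance": 0.30},
]
unique_hits = unique_by_source(ranked_hits, top_n=3)
for hit in unique_hits:
    print(hit["metadata"]["title"], hit["distance"])
```

Under this reading, fewer than top_n results can come back when several of the closest chunks belong to the same document.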

Performance Insights

The VectorDB project evaluates different embedding models to keep retrieval performance high. For vector search, it adapts the underlying search engine to the data volume so that lookups stay fast without sacrificing accuracy.

Conclusion

VectorDB is an efficient, easy-to-use solution for embedding-based text retrieval. It integrates with little setup and offers options for storage, chunking, and embeddings, which makes it a useful tool for developers building AI applications. It is released under the MIT license, making it a practical and open option for the AI community.