Introduction to sqlite-vss
sqlite-vss
is a fascinating SQLite extension designed to incorporate vector search capabilities directly into SQLite databases. This tool, built on the Faiss library, can be particularly useful for creating semantic search engines, recommendation systems, or question-answering tools. However, it's important to note that sqlite-vss
is no longer under active development. Instead, focus has shifted to a similar project, sqlite-vec
, which promises to be easier to use and install.
Understanding the Usage
With sqlite-vss
, users can leverage a familiar API akin to the Full-Text Search Extension (fts5
) in SQLite. Principally, users can create virtual tables that store and query vector data efficiently utilizing the vss0
module.
For instance:
-- Assume a vector dimension of 384 for this example
create virtual table vss_articles using vss0(
headline_embedding(384),
description_embedding(384),
);
The flexibility of sqlite-vss
extends to its compatibility with any embedding or vector format. Whether you obtain vectors from OpenAI's API, HuggingFace's services, or even open-source models, sqlite-vss
integrates these seamlessly.
Data Insertion and Querying
Inserting vector data into the database can be achieved via JSON or raw byte insertion methods. Once the data is within the system, querying for similar vectors (a process called "k nearest neighbors") is done through the vss_search
function. An example query might look like:
select rowid, distance
from vss_articles
where vss_search(
headline_embedding,
(select headline_embedding from articles where rowid = 123)
)
limit 100;
Updates and Index Management
Though users can INSERT
and DELETE
data, note that UPDATE
commands aren't supported. Moreover, index management might involve specifying Faiss factory strings for particular columns, tailoring how indices are queried and stored. For example, adding an inverted file index (IVF) can significantly speed up big database queries:
create virtual table vss_ivf_articles using vss0(
headline_embedding(384) factory="IVF4096,Flat,IDMap2",
description_embedding(384) factory="IVF4096,Flat,IDMap2"
);
This setup, especially for large datasets, requires specific training of indices, which can take time, but pays off in improved query speed.
Installation Guidelines
sqlite-vss
provides pre-built binaries for Linux and macOS platforms, with broader support expected in the future. Additionally, it is accessible through common package managers like pip
and npm
.
Setting Up with Different Languages
-
Python: Install via pip:
pip install sqlite-vss
Initialize in Python:
import sqlite3 import sqlite_vss db = sqlite3.connect(':memory:') db.enable_load_extension(True) sqlite_vss.load(db)
-
Node.js: Install via npm:
npm install sqlite-vss
Usage in JavaScript:
import Database from "better-sqlite3"; import * as sqlite_vss from "sqlite-vss"; const db = new Database(":memory:"); sqlite_vss.load(db);
-
Deno: Use with the deno.land module:
import { Database } from "https://deno.land/x/[email protected]/mod.ts"; import * as sqlite_vss from "https://deno.land/x/sqlite_vss/mod.ts"; const db = new Database(":memory:"); db.enableLoadExtension = true; sqlite_vss.load(db);
Current Limitations
While powerful, sqlite-vss
comes with some constraints. Notably, the underlying Faiss indices have a maximum cap, additional filtering isn't supported on top of KNN searches, and it doesn't yet support GPU indices or mmap'ed indices. Currently, CPU-based Faiss indices are the only option.
Supporting the Developer
If you or your organization find sqlite-vss
valuable, consider supporting the developer's ongoing work on similar open source projects. Sponsorships or sharing the project within your network would go a long way in contributing to future developments.
Related Projects
sqlite-http
: A related extension for making HTTP requests from SQLite.sqlite-xsv
: A fast CSV querying extension for SQLite.sqlite-loadable-rs
: A framework for building SQLite extensions using Rust.
In conclusion, sqlite-vss
delivers robust, vector-based search capabilities to SQLite, offering a seamless integration for managing and querying complex datasets through standard SQL interfaces.