Vector Storage: A Comprehensive Introduction
Vector Storage is a lightweight vector database that stores document vectors in the browser's IndexedDB. Its primary function is to enable semantic similarity search over text documents using vector embeddings. Because documents and queries are compared by meaning and context rather than by exact keywords, search results tend to be more accurate and relevant. Vector Storage uses OpenAI embeddings to convert text documents into vectors and offers a simple interface for finding similar documents based on cosine similarity.
Key Features
- Document Storage: Vector Storage stores document vectors in IndexedDB, keeping them organized and easily accessible.
- Semantic Search: Users can perform similarity searches on text documents, improving the relevance of search results.
- Filtering Capabilities: Search results can be filtered by metadata or text content to narrow down the outcome.
- Efficient Storage Management: Vector Storage automatically tracks storage size and removes the least recently used documents to stay within the configured space limit.
Understanding Cosine Similarity
Cosine similarity is the metric Vector Storage uses to measure the likeness between two non-zero vectors in an inner product space. It is defined as the cosine of the angle between the vectors, with values ranging from -1 to 1. A value of 1 means the vectors point in the same direction (maximum similarity), 0 means they are orthogonal (no similarity), and -1 means they point in opposite directions (maximum dissimilarity).
In Vector Storage, cosine similarity helps measure the resemblance between document vectors and a query vector. This similarity score is calculated through the dot product of the vectors, divided by the product of their magnitudes.
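The calculation described above can be sketched in a few lines of TypeScript. This is a minimal illustration of the formula, not the library's internal code; the function names dot, magnitude, and cosineSimilarity are chosen here for the example.

```typescript
// Dot product of two equal-length vectors.
function dot(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("Vectors must have the same length");
  }
  return a.reduce((sum, value, i) => sum + value * b[i], 0);
}

// Euclidean magnitude (length) of a vector.
function magnitude(v: number[]): number {
  return Math.sqrt(dot(v, v));
}

// Cosine similarity: dot(a, b) / (|a| * |b|). Undefined for zero vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  const denominator = magnitude(a) * magnitude(b);
  if (denominator === 0) {
    throw new Error("Cosine similarity is undefined for zero vectors");
  }
  return dot(a, b) / denominator;
}

const query = [1, 0];
console.log(cosineSimilarity(query, [2, 0]));  // 1  (same direction)
console.log(cosineSimilarity(query, [0, 3]));  // 0  (orthogonal)
console.log(cosineSimilarity(query, [-1, 0])); // -1 (opposite direction)
```

Note that the first result is 1 even though the two vectors differ in length: cosine similarity depends only on direction, not on magnitude.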
LRU Mechanism
The Least Recently Used (LRU) mechanism keeps the storage size under control. When the storage exceeds a specified limit, documents are sorted by their hit counter in ascending order and then by their timestamp. Documents with the lowest hit count and the oldest timestamps are removed first, until the storage size is back within the defined limit.
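The eviction order described above could be sketched as follows. This is an illustration under stated assumptions rather than the library's implementation: the EvictableDoc shape, the evictUntilWithinLimit function, and the size estimate (text length as a stand-in for stored size) are all hypothetical.

```typescript
// Hypothetical minimal shape: only the fields the eviction rule needs.
interface EvictableDoc {
  text: string;
  hits: number;      // access counter
  timestamp: number; // time the document was added
}

// Removes documents with the fewest hits first, breaking ties by the oldest
// timestamp, until the estimated total size fits within maxSize.
// Assumption: sizeOf approximates a document's stored size; the default
// simply uses the text length.
function evictUntilWithinLimit(
  docs: EvictableDoc[],
  maxSize: number,
  sizeOf: (doc: EvictableDoc) => number = (doc) => doc.text.length
): EvictableDoc[] {
  // Lowest hit count first; among equal hit counts, oldest first.
  const ordered = [...docs].sort(
    (a, b) => a.hits - b.hits || a.timestamp - b.timestamp
  );

  let total = ordered.reduce((sum, doc) => sum + sizeOf(doc), 0);
  let firstKept = 0;
  while (firstKept < ordered.length && total > maxSize) {
    total -= sizeOf(ordered[firstKept]);
    firstKept++;
  }
  return ordered.slice(firstKept);
}
```

The function returns the surviving documents in eviction order and leaves the input array untouched, which keeps the sketch easy to test in isolation.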
Installation Process
To install Vector Storage, use the following npm command:
npm i vector-storage
How to Use Vector Storage
Here's a basic example illustrating the use of the VectorStorage class:
import { VectorStorage } from "vector-storage";
// Create an instance of VectorStorage
const vectorStore = new VectorStorage({ openAIApiKey: "your-openai-api-key" });
// Add a text document to the store
await vectorStore.addText("The quick brown fox jumps over the lazy dog.", {
  category: "example",
});
// Perform a similarity search
const results = await vectorStore.similaritySearch({
  query: "A fast fox leaps over a sleepy hound.",
});
// Display the search results
console.log(results);
API Overview
VectorStorage Class
The VectorStorage class is the primary tool for managing document vectors within IndexedDB.
Constructor: constructor(options: IVSOptions)
Creating an instance of VectorStorage requires an options object with properties such as:
- openAIApiKey: Necessary for generating embeddings with OpenAI.
- maxSizeInMB: Optional maximum storage size in megabytes (defaults to 2 GB).
- debounceTime: Optional debounce time for saving to IndexedDB (defaults to 0).
- openaiModel: Optional model for generating embeddings (defaults to 'text-embedding-ada-002').
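To make the defaults concrete, here is a small sketch of how such options might be resolved. The SketchOptions interface and the resolveOptions function are hypothetical and only mirror the properties listed above; the library's own IVSOptions type and internals may differ. Expressing the 2 GB default as 2048 MB is an assumption about units.

```typescript
// Hypothetical mirror of the documented options; not the library's IVSOptions.
interface SketchOptions {
  openAIApiKey: string;
  maxSizeInMB?: number;
  debounceTime?: number;
  openaiModel?: string;
}

// Fills in the documented defaults for any option the caller omits.
function resolveOptions(options: SketchOptions): Required<SketchOptions> {
  if (!options.openAIApiKey) {
    throw new Error("openAIApiKey is required");
  }
  return {
    openAIApiKey: options.openAIApiKey,
    maxSizeInMB: options.maxSizeInMB ?? 2048, // assumption: 2 GB expressed in MB
    debounceTime: options.debounceTime ?? 0,
    openaiModel: options.openaiModel ?? "text-embedding-ada-002",
  };
}
```

Using ?? rather than || means an explicit debounceTime of 0 is respected instead of being treated as missing.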
Core Methods
- addText(text: string, metadata: object): Promise<IVSDocument>
  Adds a single text document, returning the created document.
- addTexts(texts: string[], metadatas: object[]): Promise<IVSDocument[]>
  Adds multiple documents, returning an array of created documents.
- similaritySearch(params: ISimilaritySearchParams): Promise<IVSDocument[]>
  Conducts a similarity search, returning an array of matching documents.
Document Structure: IVSDocument Interface
The IVSDocument interface encapsulates a document's properties, which include:
- hits: The number of accesses for the document.
- metadata: Relevant metadata for filtering.
- text: The text content.
- timestamp: The addition timestamp.
- vectorMag: Vector magnitude.
- vector: Vector representation.
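The sketch below shows one way these fields could fit together, and why storing vectorMag alongside vector is useful: presumably it lets a search reuse each document's magnitude instead of recomputing it for every query. The SketchDocument interface, its field types, and the makeDocument and rankBySimilarity helpers are assumptions made for illustration, not the library's definitions.

```typescript
// Hypothetical shape based on the fields listed above; field types are assumptions.
interface SketchDocument {
  hits: number;
  metadata: Record<string, unknown>;
  text: string;
  timestamp: number;
  vectorMag: number;
  vector: number[];
}

// Builds a document from text and an already-computed embedding vector,
// caching the vector's magnitude at creation time.
function makeDocument(
  text: string,
  vector: number[],
  metadata: Record<string, unknown> = {}
): SketchDocument {
  const vectorMag = Math.sqrt(vector.reduce((sum, x) => sum + x * x, 0));
  return { hits: 0, metadata, text, timestamp: Date.now(), vectorMag, vector };
}

// Ranks documents by cosine similarity to the query vector, reusing each
// stored vectorMag. Assumes non-zero vectors of equal length.
function rankBySimilarity(
  docs: SketchDocument[],
  queryVector: number[],
  k: number = 4
): { doc: SketchDocument; score: number }[] {
  const queryMag = Math.sqrt(queryVector.reduce((sum, x) => sum + x * x, 0));
  return docs
    .map((doc) => {
      const dotProduct = doc.vector.reduce(
        (sum, x, i) => sum + x * queryVector[i],
        0
      );
      return { doc, score: dotProduct / (doc.vectorMag * queryMag) };
    })
    .sort((a, b) => b.score - a.score) // highest similarity first
    .slice(0, k);
}
```

In real use the vectors would come from an embedding model; the helper takes them as plain arrays so the ranking logic can be exercised without network access.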
Contribution and Community
Contributions to Vector Storage are encouraged and welcome. Interested individuals can contribute by:
- Forking the GitHub repository.
- Cloning the fork to their local machine.
- Creating a branch for changes.
- Committing changes to the branch.
- Pushing changes to the GitHub fork.
- Opening a pull request to the main repository.
All contributors should ensure their code aligns with the project's style standards and passes all tests before submitting a pull request. Feedback, bug reports, and improvement suggestions can be submitted via issues on GitHub.
Licensing
Vector Storage is distributed under the MIT License, ensuring an open and collaborative environment for its development and use. For full license details, refer to the LICENSE file in the project's repository.