Client-Vector-Search Project Introduction
The client-vector-search project is a lightweight, efficient vector search library that runs both in client-side browsers and in server-side environments. It covers embedding, searching, and caching, and positions itself as a faster, simpler alternative to hosted options such as OpenAI's text-embedding-ada-002 for embeddings and Pinecone for vector search.
Key Features
- Transformer Embeddings: By default, the library uses the gte-small transformer model to embed text documents, converting them into numeric vectors directly in the browser or on the server.
- Cosine Similarity Calculation: Users can compute the cosine similarity between embeddings to measure how alike two text documents are (see the sketch after this list).
- Client-Side Indexing and Searching: Users can create an index and perform search operations directly on the client side, facilitating rapid access and retrieval of information.
- Caching Support: The library offers vector caching, including browser-based caching, to improve performance and reduce response times.
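To make the similarity measure concrete, here is a minimal sketch of cosine similarity computed by hand in plain JavaScript; it illustrates the formula dot(a, b) / (|a| * |b|) rather than any specific helper exported by the library.

// Cosine similarity between two equal-length vectors, in the range [-1, 1]
function cosineSimilarity(a, b) {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Example: compare two embeddings produced by getEmbedding
// const score = cosineSimilarity(await getEmbedding('Apple'), await getEmbedding('Fruit'));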
Roadmap and Future Enhancements
The long-term vision for client-vector-search is to offer a fast, simple solution that scales comfortably with user needs, typically handling thousands of vectors efficiently. Here are some anticipated features and improvements:
- Introduction of an HNSW index suitable for both browser and Node.js environments, independent of third-party libraries.
- Implementation of a comprehensive testing framework, including health checks and performance benchmarks.
Installation and Quickstart
Installing the client-vector-search library is straightforward using npm:
npm i client-vector-search
Here's a simple guide to get you started:
- Import the Helpers: The getEmbedding and EmbeddingIndex exports come from the package installed above.
  import { getEmbedding, EmbeddingIndex } from 'client-vector-search';
- Embed Text: Use the getEmbedding function to convert text into an embedding asynchronously.
  const embedding = await getEmbedding("Apple");
- Initialize the Index: Build an index from objects that each include an 'embedding' attribute.
  const initialObjects = [...]; const index = new EmbeddingIndex(initialObjects);
- Search for Similar Items: Query the index with a query embedding and retrieve the most similar entries.
  const queryEmbedding = await getEmbedding('Fruit'); const results = await index.search(queryEmbedding, { topK: 5 });
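Putting the steps above together, a minimal end-to-end sketch looks like the following; the id and name fields are illustrative additions, and only the getEmbedding, EmbeddingIndex, and search calls shown in the quickstart are used.

import { getEmbedding, EmbeddingIndex } from 'client-vector-search';

// Embed a few documents; each object carries its own embedding
const initialObjects = [
  { id: 1, name: 'Apple', embedding: await getEmbedding('Apple') },
  { id: 2, name: 'Banana', embedding: await getEmbedding('Banana') },
  { id: 3, name: 'Car', embedding: await getEmbedding('Car') },
];

// Build the index on the client
const index = new EmbeddingIndex(initialObjects);

// Search with a query embedding; topK caps the number of results returned
const queryEmbedding = await getEmbedding('Fruit');
const results = await index.search(queryEmbedding, { topK: 2 });
console.log(results);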
Troubleshooting and Usage in Next.js
For developers using Next.js, modifications to next.config.js are necessary to prevent conflicts between the browser bundle and certain Node-only modules.
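The exact configuration depends on your Next.js and package versions. As an illustrative sketch, and assuming the pattern commonly recommended for running transformers.js-based embeddings in the browser, the webpack aliases below keep the Node-only onnxruntime-node and sharp modules out of the client bundle:

// next.config.js (illustrative; adjust to your project and verify against the library's docs)
module.exports = {
  webpack: (config) => {
    // Exclude Node-only packages from the browser bundle
    config.resolve.alias = {
      ...config.resolve.alias,
      'sharp$': false,
      'onnxruntime-node$': false,
    };
    return config;
  },
};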
Step-by-Step Usage Guide
Outlined below is a comprehensive step-by-step guide to utilizing all aspects of the library:
- Generate Embeddings: Convert strings into embeddings for further processing.
- Calculate Similarity: Measure the similarity between different text representations.
- Manage Index: Add, update, and remove items from the index as needed (see the first sketch after this list).
- Persistent Storage: Save your index to IndexedDB for future retrieval and searches (see the second sketch after this list).
- Database Management: Instructions for deleting and managing databases and object stores within IndexedDB.
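For index management, the sketch below mirrors the add, update, and remove operations listed above; the method names and filter shapes are assumptions based on the library's documented usage, so verify the exact signatures against the current README.

// Method names below are assumptions; check the library's README for exact signatures.
// Add a new object (it must carry an embedding, like the initial objects)
index.add({ id: 4, name: 'Cherry', embedding: await getEmbedding('Cherry') });

// Update an existing object, matched by a filter on its fields
index.update({ id: 1 }, { name: 'Apricot', embedding: await getEmbedding('Apricot') });

// Remove an object from the index
index.remove({ id: 2 });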
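For persistent storage, the sketch below saves the index into IndexedDB and then searches against the stored copy; saveIndex('indexedDB') and the useStorage option are assumptions about the library's API, and the deletion helpers for databases and object stores are left to the README since their exact names vary.

// Names below are assumptions; consult the library's README before relying on them.
// Save the in-memory index into IndexedDB (browser environments only)
await index.saveIndex('indexedDB');

// Later, search directly against the stored index
const storedResults = await index.search(await getEmbedding('Fruit'), {
  topK: 5,
  useStorage: 'indexedDB',
});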
This guide ensures a thorough understanding of the capabilities of the library, encouraging hands-on experimentation to fully grasp its functionality.
In summary, the client-vector-search library is an essential tool for developers looking to integrate efficient and scalable vector searching capabilities into their applications. With a focus on speed, simplicity, and ease of use, this library promises to meet a wide range of application needs, backed by continuous improvements and support from its founding team.