VecTextSearch: A Powerful Text Search Tool
Introduction to the Project
VecTextSearch uses OpenAI's language models to convert text into vectors, which are then searched efficiently with the Weaviate vector database. It lets users store text data in Weaviate and quickly retrieve related texts based on similarity. VecTextSearch is written in Go and exposes a simple REST API that clients can call to use its functionality.
Project Background
Many real-world applications need fast search based on text similarity. For instance, when analyzing a large body of text, it is often essential to find articles with similar content. Traditional keyword-based search may miss texts that are similar in meaning but phrased differently. VecTextSearch addresses this limitation by using OpenAI's language models to convert text into vector representations, which the Weaviate database can then search efficiently for similarity.
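To make the idea of similarity search concrete, here is a minimal Go sketch that ranks stored vectors by cosine similarity to a query vector. The vectors are made-up toy values standing in for OpenAI embeddings, and the names `Doc`, `Match`, `cosineSimilarity`, and `topK` are hypothetical, not part of VecTextSearch. The sketch scans every vector exhaustively, whereas Weaviate uses an approximate nearest-neighbor index (HNSW by default), so treat it as an illustration of the ranking principle rather than of Weaviate's internals.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// cosineSimilarity returns the cosine of the angle between a and b.
// It assumes both vectors have the same length and are non-zero.
func cosineSimilarity(a, b []float64) float64 {
	var dot, normA, normB float64
	for i := range a {
		dot += a[i] * b[i]
		normA += a[i] * a[i]
		normB += b[i] * b[i]
	}
	return dot / (math.Sqrt(normA) * math.Sqrt(normB))
}

// Doc pairs a text name with its (hypothetical) embedding vector.
type Doc struct {
	Name   string
	Vector []float64
}

// Match is a ranked result with its cosine similarity to the query.
type Match struct {
	Name       string
	Similarity float64
}

// topK scores every doc against the query and returns the k most similar.
// This is an exhaustive scan, not an approximate index.
func topK(query []float64, docs []Doc, k int) []Match {
	matches := make([]Match, 0, len(docs))
	for _, d := range docs {
		matches = append(matches, Match{d.Name, cosineSimilarity(query, d.Vector)})
	}
	// Highest similarity first.
	sort.Slice(matches, func(i, j int) bool {
		return matches[i].Similarity > matches[j].Similarity
	})
	if k < len(matches) {
		matches = matches[:k]
	}
	return matches
}

func main() {
	// Toy 3-dimensional vectors; real embeddings have many more dimensions.
	docs := []Doc{
		{"cats", []float64{0.9, 0.1, 0.0}},
		{"dogs", []float64{0.8, 0.3, 0.1}},
		{"finance", []float64{0.0, 0.2, 0.9}},
	}
	query := []float64{1.0, 0.2, 0.0}
	for _, m := range topK(query, docs, 2) {
		fmt.Printf("%s %.3f\n", m.Name, m.Similarity)
	}
}
```

With this toy data, "cats" ranks first and "dogs" second, while the unrelated "finance" vector falls outside the top two.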
Applications and Use Cases
VecTextSearch can be particularly useful in several scenarios:
- Content Discovery: It can help users discover related articles, blogs, or research papers.
- Intelligent Q&A Systems: The tool can be used to match user queries with relevant questions and answers rapidly.
- Recommendation Systems: Based on a user’s reading history, VecTextSearch can recommend articles with similar themes.
- Duplicate Detection: It can identify redundant or plagiarized content, helping to ensure originality and compliance.
Features and APIs
VecTextSearch provides two main REST API endpoints:
Adding Text
- URL: /add-text
- Method: POST
- Content-Type: application/json
- Request Payload:
  { "name": "Article Name", "content": "Article Content" }
- Response: Returns a JSON object containing the text ID upon successful addition.
  { "id": "Unique Text Identifier" }
Searching Similar Texts
- URL: /search-similar-texts
- Method: POST
- Content-Type: application/json
- Request Payload:
  { "content": "Query Content" }
- Response: Returns a JSON array with information about similar texts.
  [ { "name": "Article Name", "content": "Article Content", "distance": "Distance from Query", "certainty": "Similarity Score" }, ... ]
Objectives and Future Enhancements
The project roadmap includes several planned enhancements:
- Development of demo applications to showcase its capabilities.
- Creation of data management interfaces for better handling of text data stored in Weaviate.
- A user-friendly front-end interface to simplify usage.
- Comprehensive documentation covering API references, usage examples, and tutorials.
- Additional configuration options for optimizing performance and functionality.
- Incorporation of unit and integration tests to ensure code quality.
- Keeping up with improvements to OpenAI's language models.
Getting Started with VecTextSearch
For those interested in contributing to VecTextSearch or using it for development, the following steps are recommended:
- Clone the Repository: Retrieve the project from GitHub.
  git clone https://github.com/szpnygo/VecTextSearch.git
- Install Dependencies: Navigate into the project directory and install the necessary dependencies.
  cd VecTextSearch
  go get -u
- Configure API Key: Enter a valid OpenAI API key in the config.yml file.
- Run the Project: Start the application with Go.
  go run main.go
Contributors can submit issues or pull requests on GitHub to propose new features or report problems.
Licensing and Contact Information
VecTextSearch is open-source software licensed under the MIT License. Users are encouraged to read the LICENSE file for more details.
For any issues or inquiries, contact the team by:
- Raising an Issue on GitHub.
- Sending an email to [email protected].
VecTextSearch combines OpenAI's language models with Weaviate's vector database to change how users search for and interact with text data.