⍼ Resin Search Engine
Overview
Resin is a versatile and efficient search engine that can be used both as a remote HTTP service and as an embedded library. It employs a vector space index approach, making it effective for searching through document collections. Resin is designed to handle various data input methods and perform complex queries, making it a robust solution for developers and data scientists alike.
Features and Usage
Document Management
Remote Document Writing
To add a document remotely, one can use an HTTP POST request directed to the appropriate API endpoint. The format requires specifying the collection in which the document will be stored and a JSON array containing the document data:
POST [host]/write?collection=[collection]
Content-Type: application/json
[
  {
    "field1": "value1",
    "field2": "value2"
  }
]
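As a minimal client-side sketch of the request above, the following Python builds (but does not send) the POST using only the standard library. The helper name `build_write_request` and the host `http://localhost:5000` are hypothetical placeholders, not part of Resin.

```python
import json
import urllib.parse
import urllib.request


def build_write_request(host: str, collection: str, documents: list[dict]) -> urllib.request.Request:
    """Build (but do not send) a POST request for the write endpoint."""
    # The collection name goes in the query string, as in the example above.
    url = f"{host}/write?{urllib.parse.urlencode({'collection': collection})}"
    # The body is a JSON array of documents.
    body = json.dumps(documents).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical host; replace with the address of a running Resin service.
request = build_write_request(
    "http://localhost:5000",
    "mycollection",
    [{"field1": "value1", "field2": "value2"}],
)
```

Sending is left to the caller, for example with `urllib.request.urlopen(request)`.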
Local Document Writing
For those who prefer working locally, Resin provides a method to write documents, which involves creating a DocumentDatabase instance:
using (var database = new DocumentDatabase<string>(_directory, collectionId, model, strategy))
{
    // Write each document, then commit once at the end.
    foreach (var document in documents)
    {
        database.Write(document);
    }
    database.Commit();
}
Querying Data
Resin supports both GET and POST requests for querying its document collections.
GET Queries
These can be used for simple searches by pointing to the query endpoint with the relevant parameters:
GET [host]/query/?collection=mycollection&q=[my_query]&field=field1&field=field2&select=field1&skip=0&take=10
Accept: application/json
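The query string above repeats the `field` parameter once per searched field. A small sketch of how a client might assemble such a URL in Python follows; the helper name `build_query_url` and the host are hypothetical, and the parameter names are taken from the example request.

```python
from urllib.parse import urlencode


def build_query_url(host, collection, q, fields, select, skip=0, take=10):
    """Assemble a GET query URL, repeating `field` once per searched field."""
    # A list of pairs (rather than a dict) lets the same key appear repeatedly.
    params = [("collection", collection), ("q", q)]
    params += [("field", name) for name in fields]
    params += [("select", select), ("skip", skip), ("take", take)]
    # urlencode percent-encodes values and turns spaces into '+'.
    return f"{host}/query/?{urlencode(params)}"


# Hypothetical host; replace with the address of a running Resin service.
url = build_query_url(
    "http://localhost:5000",
    "mycollection",
    "eye of the tiger",
    fields=["field1", "field2"],
    select="field1",
)
```

The request would then be issued with an `Accept: application/json` header, as in the example.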
POST Queries
For more complex queries, a detailed JSON structure can be POSTed to the API:
POST [host]/query/?select=field1&skip=0&take=10
Content-Type: application/json
Accept: application/json
{
  "and": {
    "collection": "film,music",
    "title": "rocky eye of the tiger",
    "or": {
      "title": "rambo",
      "or": {
        "title": "cobra",
        "or": {
          "cast": "antonio banderas"
        }
      }
    },
    "and": {
      "year": 1980,
      "operator": "gt"
    },
    "not": {
      "title": "first blood"
    }
  }
}
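In the body above, each alternative is nested inside the previous one under an "or" key. Building that shape by hand is error-prone, so here is a sketch of a helper that nests a flat list of alternatives. The name `chain_or` is hypothetical and not part of Resin; the sketch only reproduces the JSON shape shown above.

```python
import json


def chain_or(terms):
    """Nest (field, value) pairs into the 'or' chain shape used above."""
    node = None
    # Work from the last alternative outward so each clause wraps the rest.
    for field, value in reversed(terms):
        clause = {field: value}
        if node is not None:
            clause["or"] = node
        node = clause
    return node


query = {
    "and": {
        "collection": "film,music",
        "title": "rocky eye of the tiger",
        "or": chain_or([
            ("title", "rambo"),
            ("title", "cobra"),
            ("cast", "antonio banderas"),
        ]),
        "and": {"year": 1980, "operator": "gt"},
        "not": {"title": "first blood"},
    }
}

# Serialized request body, ready to POST with Content-Type: application/json.
body = json.dumps(query)
```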
Local Querying
Queries can also be constructed and executed locally using Resin's query parser:
using (var database = new DocumentDatabase<string>(_directory, collectionId, model, strategy))
{
    var queryParser = database.CreateQueryParser();
    var query = queryParser.Parse(collectionId, word, "title", "title", and:true, or:false, label:true);
    // Skip no results and take only the first match.
    var result = database.Read(query, skip: 0, take: 1);
}
Technical Insights
Document Database
Resin uses a document-based storage system where data is organized into collections. Each document is indexed based on various fields, allowing efficient querying and retrieval.
Vector-based Indices
The core of Resin's search capability lies in its use of vector-based indices. These indices are constructed as binary search trees in which each node represents a cluster of similar data vectors. The angle between two vectors, measured by their cosine similarity, determines how they are grouped, enabling rapid and relevant search results.
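To make the idea concrete, here is a toy sketch of a binary tree whose placement decisions are driven by cosine similarity. It is an illustration under stated assumptions, not Resin's actual implementation: the thresholds `MERGE` and `FOLD`, the `Node` class, and the `insert` function are all hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Optional


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


@dataclass
class Node:
    vector: list[float]
    members: int = 1  # how many vectors were merged into this cluster
    left: Optional["Node"] = None
    right: Optional["Node"] = None


MERGE = 0.95  # hypothetical: at or above this, treat vectors as the same cluster
FOLD = 0.5    # hypothetical: above this go left (similar), otherwise go right


def insert(root, vector):
    """Walk down from root, merging into a similar node or adding a new leaf."""
    node = root
    while True:
        similarity = cosine_similarity(node.vector, vector)
        if similarity >= MERGE:
            node.members += 1
            return node
        side = "left" if similarity > FOLD else "right"
        child = getattr(node, side)
        if child is None:
            leaf = Node(vector)
            setattr(node, side, leaf)
            return leaf
        node = child
```

A lookup would follow the same path as `insert`, which is why vectors pointing in similar directions tend to be found after only a few comparisons.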
Performance and Capabilities
Resin is capable of handling large datasets, such as those the size of Wikipedia, and can deliver search results in sub-second time. It offers flexibility in constructing and optimizing the indices through tools like Sir.Cmd, and supports advanced query capabilities like field-level and cross-collection joins.
Developers can customize Resin to handle various data formats and utilize different indexing schemes, catering to specific use cases and performance needs. With Resin's architecture, users can efficiently manage and search through their data for insights and information.