resin - Versatile HTTP and Embedded Search Engine

Resin Search Engine

Overview

Resin is a search engine that can run either as a remote HTTP service or as a library embedded in your own application. It is built on a vector space model: documents and queries are represented as vectors, and the indices group vectors by similarity. Documents can be written over HTTP or through the local API, and queries can combine terms across fields and collections with and, or and not clauses.

Features and Usage

Document Management

Remote Document Writing

To write documents remotely, send an HTTP POST request to the write endpoint. The target collection goes in the query string, and the request body is a JSON array with one object per document:

POST [host]/write?collection=[collection]
Content-Type: application/json

[
    {
        "field1": "value1",
        "field2": "value2"
    }
]
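As a sketch of the client side, the C# helper below builds this request with the standard HttpClient message types and System.Text.Json. ResinWriteRequest and Build are hypothetical names, not part of Resin, and the host URI is assumed to end with a slash.

```csharp
using System;
using System.Collections.Generic;
using System.Net.Http;
using System.Text;
using System.Text.Json;

// Hypothetical helper (not part of Resin): builds, but does not send, the
// POST [host]/write?collection=[collection] request described above.
public static class ResinWriteRequest
{
    public static HttpRequestMessage Build(
        Uri host, string collection, IEnumerable<IDictionary<string, object>> documents)
    {
        // host is assumed to end with "/" so that "write" is appended, not substituted.
        var uri = new Uri(host, "write?collection=" + Uri.EscapeDataString(collection));

        // The body is a JSON array with one object per document.
        string json = JsonSerializer.Serialize(documents);

        return new HttpRequestMessage(HttpMethod.Post, uri)
        {
            // Sets Content-Type: application/json (with charset=utf-8).
            Content = new StringContent(json, Encoding.UTF8, "application/json")
        };
    }
}
```

To send the request, pass it to HttpClient.SendAsync; the sketch stops short of that so the request can be inspected first.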

Local Document Writing

When Resin is embedded as a library, documents are written locally through a DocumentDatabase instance, with a single Commit after all documents have been written:

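// Assumed to be initialized elsewhere: _directory (data directory), collectionId (target
// collection), model and strategy (presumably the vector model and index strategy).
using (var database = new DocumentDatabase<string>(_directory, collectionId, model, strategy))
{
    // Write each document to the collection.
    foreach (var document in documents)
    {
        database.Write(document);
    }

    // Commit once, after all documents have been written.
    database.Commit();
}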

Querying Data

Resin supports both GET and POST requests for querying its document collections.

GET Queries

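GET requests suit simple searches. All parameters go in the query string: q holds the query text, field can be repeated to search several fields, select names the fields to return, and skip and take page through the results: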

GET [host]/query/?collection=mycollection&q=[my_query]&field=field1&field=field2&select=field1&skip=0&take=10
Accept: application/json
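The same GET request can be assembled in C# as in the following sketch. ResinQueryUrl is a hypothetical helper, not part of Resin; it repeats field once per searched field, as the example does, and assumes select repeats the same way when more than one field is returned.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (not part of Resin): assembles the GET query URL shown above.
public static class ResinQueryUrl
{
    public static Uri Build(Uri host, string collection, string query,
        IEnumerable<string> fields, IEnumerable<string> select, int skip, int take)
    {
        var parts = new List<string>
        {
            "collection=" + Uri.EscapeDataString(collection),
            "q=" + Uri.EscapeDataString(query)
        };

        // One field=... pair per searched field, as in the example request.
        foreach (var field in fields)
            parts.Add("field=" + Uri.EscapeDataString(field));

        // Assumption: select repeats the same way when several fields are returned.
        foreach (var name in select)
            parts.Add("select=" + Uri.EscapeDataString(name));

        parts.Add("skip=" + skip);
        parts.Add("take=" + take);

        // host is assumed to end with "/".
        return new Uri(host, "query/?" + string.Join("&", parts));
    }
}
```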

POST Queries

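For more complex queries, POST a JSON query document to the query endpoint; select, skip and take stay in the query string. The body nests and, or and not clauses, and a clause can compare a value using an operator, as in the year clause below, where "gt" means greater than: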

POST [host]/query/?select=field1&skip=0&take=10
Content-Type: application/json
Accept: application/json

{
    "and": {
        "collection": "film,music",
        "title": "rocky eye of the tiger",
        "or": {
            "title": "rambo",
            "or": {
                "title": "cobra",
                "or": {
                    "cast": "antonio banderas"
                }
            }
        },
        "and": {
            "year": 1980,
            "operator": "gt"
        },
        "not": {
            "title": "first blood"
        }
    }
}
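One way to read the and, or and not clauses is as set operations over the document IDs each clause matches. The sketch below assumes that reading: clauses are folded left to right into a running result, with and as intersection, or as union and not as difference. The Occur, Clause and BooleanCombiner types are hypothetical, Resin's actual evaluation order may differ, and comparison operators such as "gt" are not modeled.

```csharp
using System.Collections.Generic;

// Hypothetical types (not Resin's API) for a flat list of boolean clauses.
public enum Occur { And, Or, Not }

public sealed record Clause(Occur Occur, HashSet<long> Hits);

public static class BooleanCombiner
{
    // Folds clauses into a running result. Assumed semantics:
    // And = intersection, Or = union, Not = difference.
    public static HashSet<long> Combine(IEnumerable<Clause> clauses)
    {
        HashSet<long>? result = null;
        foreach (var clause in clauses)
        {
            if (result == null)
            {
                // The first clause seeds the result; a leading Not has nothing to subtract from.
                result = clause.Occur == Occur.Not ? new HashSet<long>() : new HashSet<long>(clause.Hits);
                continue;
            }

            switch (clause.Occur)
            {
                case Occur.And: result.IntersectWith(clause.Hits); break;
                case Occur.Or: result.UnionWith(clause.Hits); break;
                case Occur.Not: result.ExceptWith(clause.Hits); break;
            }
        }
        return result ?? new HashSet<long>();
    }
}
```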

Local Querying

Queries can also be constructed and executed locally using Resin's query parser:

using (var database = new DocumentDatabase<string>(_directory, collectionId, model, strategy))
{
    // Parse the query text in word into a query over the "title" field of the collection.
    var queryParser = database.CreateQueryParser();
    var query = queryParser.Parse(collectionId, word, "title", "title", and: true, or: false, label: true);

    // Execute the query, returning at most one document.
    var result = database.Read(query, skip: 0, take: 1);
}

Technical Insights

Document Database

Resin stores data as documents organized into collections. Documents are indexed field by field, which is what allows a query to search specific fields and return specific fields, as with the field and select parameters above.
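To make the storage model concrete, here is a toy in-memory sketch of collections with per-field indexing. It is not how Resin stores data: it swaps in a plain exact-match inverted index (term to document ids) for Resin's vector indices, and ToyDocumentStore is a hypothetical name.

```csharp
using System;
using System.Collections.Generic;

// Toy illustration only: documents grouped by collection, each field indexed
// separately with an exact-match inverted index, not Resin's vector trees.
public sealed class ToyDocumentStore
{
    private readonly Dictionary<string, List<Dictionary<string, string>>> _collections = new();
    private readonly Dictionary<(string Collection, string Field, string Term), List<int>> _index = new();

    public int Write(string collection, Dictionary<string, string> document)
    {
        if (!_collections.TryGetValue(collection, out var documents))
        {
            documents = new List<Dictionary<string, string>>();
            _collections[collection] = documents;
        }

        int docId = documents.Count; // ids are positions within the collection
        documents.Add(document);

        foreach (var (field, value) in document)
        {
            // Naive tokenizer: lowercase and split on spaces.
            foreach (var term in value.ToLowerInvariant().Split(' ', StringSplitOptions.RemoveEmptyEntries))
            {
                var key = (collection, field, term);
                if (!_index.TryGetValue(key, out var postings))
                {
                    postings = new List<int>();
                    _index[key] = postings;
                }
                // Skip duplicates when a term repeats within the same value.
                if (postings.Count == 0 || postings[postings.Count - 1] != docId)
                    postings.Add(docId);
            }
        }
        return docId;
    }

    // Returns the documents in a collection whose given field contains the term.
    public List<Dictionary<string, string>> Query(string collection, string field, string term)
    {
        var results = new List<Dictionary<string, string>>();
        if (_index.TryGetValue((collection, field, term.ToLowerInvariant()), out var postings))
        {
            foreach (int id in postings)
                results.Add(_collections[collection][id]);
        }
        return results;
    }
}
```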

Vector-based Indices

The core of Resin's search capability is its vector-based indices. Each index is a binary search tree whose nodes represent clusters of similar vectors. Vectors are grouped by the angle between them, measured as cosine similarity, so a lookup can follow the tree toward the most similar cluster instead of comparing the query against every stored vector.
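The following is a simplified sketch of such a tree, assuming a greedy insertion rule: a new vector is merged into a node when their cosine similarity is at or above one threshold, sent to the left child when it is above a second threshold, and sent to the right child otherwise. VectorNode and both threshold values are hypothetical and chosen for illustration; Resin's own node layout and thresholds may differ.

```csharp
using System;
using System.Collections.Generic;

// Simplified sketch of a binary tree of vectors grouped by cosine similarity.
// NOT Resin's implementation: both thresholds below are illustrative assumptions.
public sealed class VectorNode
{
    public const double IdenticalSimilarity = 0.999; // at or above: same entry, merge doc ids
    public const double FoldSimilarity = 0.5;        // above: descend left; otherwise right

    public double[] Vector { get; }
    public List<long> DocIds { get; } = new List<long>();
    public VectorNode? Left { get; private set; }
    public VectorNode? Right { get; private set; }

    public VectorNode(double[] vector, long docId)
    {
        Vector = vector;
        DocIds.Add(docId);
    }

    // cos(theta) = (a . b) / (|a| |b|); assumes equal-length, non-zero vectors.
    public static double CosineSimilarity(double[] a, double[] b)
    {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.Length; i++)
        {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.Sqrt(normA) * Math.Sqrt(normB));
    }

    public void Add(double[] vector, long docId)
    {
        VectorNode node = this;
        while (true)
        {
            double similarity = CosineSimilarity(vector, node.Vector);
            if (similarity >= IdenticalSimilarity)
            {
                node.DocIds.Add(docId); // same cluster: record the extra document
                return;
            }
            if (similarity > FoldSimilarity)
            {
                if (node.Left == null) { node.Left = new VectorNode(vector, docId); return; }
                node = node.Left;
            }
            else
            {
                if (node.Right == null) { node.Right = new VectorNode(vector, docId); return; }
                node = node.Right;
            }
        }
    }

    // Greedy lookup: follows the same left/right rule as Add and returns the most
    // similar node on that path. Approximate, since it never backtracks.
    public (VectorNode Node, double Similarity) FindClosest(double[] query)
    {
        VectorNode best = this;
        double bestSimilarity = double.NegativeInfinity;
        VectorNode? node = this;
        while (node != null)
        {
            double similarity = CosineSimilarity(query, node.Vector);
            if (similarity > bestSimilarity) { best = node; bestSimilarity = similarity; }
            if (similarity >= IdenticalSimilarity) break;
            node = similarity > FoldSimilarity ? node.Left : node.Right;
        }
        return (best, bestSimilarity);
    }
}
```

Because a lookup follows a single path, it touches only a handful of nodes, but the result is approximate: a close vector stored in a branch the query did not take is never visited.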

Performance and Capabilities

Resin can handle datasets the size of Wikipedia and return search results in under a second. Indices can be built and optimized with tools such as Sir.Cmd, and queries support field-level and cross-collection joins.

Developers can adapt Resin to other data formats and swap in different indexing schemes to suit a particular use case or performance target.