Cherche: An Introduction to Neural Search
Cherche is a library for building neural search pipelines. These pipelines combine retrievers with pre-trained language models, which can serve both as retrievers and as rankers, to search large collections of documents effectively. A standout feature of Cherche is its ability to build end-to-end pipelines, and its support for batch computation makes it particularly well suited to offline semantic search.
Features of Cherche
Cherche provides users with a wide range of features that simplify the implementation of neural search solutions. Some of the key offerings include:
- Neural Search Pipeline Development: Allows the creation of pipelines that use both retrievers and rankers to enhance search accuracy and relevance.
- Compatibility with Pre-Trained Models: Enables the use of pre-trained language models for more intelligent search outcomes.
- End-to-End Pipeline Capability: Supports comprehensive solutions from data retrieval to ranking, ensuring efficiency.
- Batch Computation for Offline Searches: Suitable for scenarios where offline processing is advantageous.
To see these features in action, you can explore the live demo of an NLP search engine powered by Cherche.
Installation
Installing Cherche is straightforward, with different configurations available:
- For a simple retriever on a CPU, such as TfIdf, Flash, Lunr, or Fuzz, use:
pip install cherche
- For semantic retriever or ranker usage on a CPU:
pip install "cherche[cpu]"
- For semantic retriever or ranker usage on a GPU:
pip install "cherche[gpu]"
QuickStart with Cherche
Document Retrieval
Cherche lets users find the relevant documents in a dataset with a few lines of code. Documents are plain Python dictionaries, and in the sample dataset below each one carries a title, a URL, and the article text, so documents can be matched by title, URL, or content.
from cherche import data

# Load the sample corpus of town descriptions; each document is a dictionary.
documents = data.load_towns()
# Inspect the first three documents.
documents[:3]
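Once the documents are loaded, a lexical retriever can be fitted on them and queried directly. The following is a minimal sketch assuming the TfIdf retriever's usual parameters (key identifies a document, on lists the fields to index, and k limits the number of results); the query string is only an example.

from cherche import data, retrieve

documents = data.load_towns()

# Index the title and article fields with a TF-IDF retriever.
retriever = retrieve.TfIdf(key="id", on=["title", "article"], documents=documents, k=5)

# Returns the identifiers of the documents that best match the query.
retriever("Bordeaux")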
Retriever and Ranker
A typical Cherche pipeline combines a TF-IDF retriever with a ranker based on semantic similarity: the retriever quickly narrows the corpus down to a set of candidates, and the ranker re-orders those candidates against the query so that users receive the most relevant documents first.
from cherche import data, retrieve, rank
from sentence_transformers import SentenceTransformer
# Load the documents, then set up a TF-IDF retriever and a SentenceTransformer ranker.
documents = data.load_towns()
retriever = retrieve.TfIdf(key="id", on=["title", "article"], documents=documents, k=30)
ranker = rank.Encoder(key="id", on=["title", "article"], encoder=SentenceTransformer("sentence-transformers/all-mpnet-base-v2").encode, k=3)
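The two components can then be chained into a single search pipeline. The sketch below assumes Cherche's + composition operator and an add method that pre-computes the ranker's document embeddings; the query is only an example.

# Chain the retriever and the ranker into one pipeline.
search = retriever + ranker

# Pre-compute the ranker's embeddings for the corpus (done once, offline).
search.add(documents=documents)

# Query the pipeline; it returns document identifiers with similarity scores.
search("Bordeaux")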
Search and Retrieval
Cherche's retrievers filter the corpus down to the documents that match a user query. Several retrieval strategies are available, including TF-IDF, BM25, Lunr, and Flash, and they can be swapped without changing the rest of the pipeline, as sketched below.
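For instance, assuming the Lunr retriever accepts the same key, on, documents, and k parameters as TfIdf, switching strategies only changes the retriever's constructor:

from cherche import data, retrieve

documents = data.load_towns()

# A Lunr-based retriever; the downstream ranker and pipeline stay the same.
retriever = retrieve.Lunr(key="id", on=["title", "article"], documents=documents, k=10)

retriever("Garonne river")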
Ranking Capabilities
Once candidate documents have been retrieved, a ranker re-orders them by semantic similarity to the query, typically with a SentenceTransformers model, so that the most relevant results come first.
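Because the retriever and ranker work on document identifiers, the pipeline can also be composed with the document list itself to return full documents rather than bare identifiers; this sketch assumes the += composition shown in Cherche's examples.

# Map identifiers back to the full documents at the end of the pipeline.
search = retriever + ranker
search.add(documents=documents)
search += documents

# The query now returns complete documents ordered by the ranker's scores.
search("Bordeaux")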
Question Answering
Cherche also provides a question-answering module that wraps Hugging Face pre-trained models, so an extractive QA stage can be appended at the end of a search pipeline to answer questions from the retrieved documents.
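As a sketch, assuming qa.QA wraps a transformers question-answering pipeline, that its on parameter selects the fields answers are extracted from, and that the QA stage can be appended after the documents mapping (the roberta-base-squad2 checkpoint is just one possible choice):

from cherche import data, qa, rank, retrieve
from sentence_transformers import SentenceTransformer
from transformers import pipeline

documents = data.load_towns()

retriever = retrieve.TfIdf(key="id", on=["title", "article"], documents=documents, k=30)
ranker = rank.Encoder(key="id", on=["title", "article"], encoder=SentenceTransformer("sentence-transformers/all-mpnet-base-v2").encode, k=3)

# Wrap a Hugging Face extractive question-answering pipeline.
question_answering = qa.QA(
    model=pipeline(
        "question-answering",
        model="deepset/roberta-base-squad2",
        tokenizer="deepset/roberta-base-squad2",
    ),
    on=["title", "article"],
)

# Retrieve, rank, map back to full documents, then extract an answer span.
search = retriever + ranker + documents + question_answering
search.add(documents=documents)
search("What is the capital of France?")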
Contributors and Acknowledgements
Cherche was built with contributions from several contributors and remains open to new ones. The project builds on and acknowledges technologies from the community, including Lunr.py and FlashText.
For academic references, those using Cherche in scientific publications are encouraged to cite the relevant SIGIR paper.
Development Team
Cherche is developed by Raphaël Sourty, François-Paul Servant, Nicolas Bizzozzero, and Jose G Moreno.