Project Overview: RAG-Retrieval
RAG-Retrieval streamlines the training and inference of retrieval models for retrieval-augmented generation (RAG). It covers fine-tuning, inference, and distillation for the model types used to improve retrieval and reranking across a range of applications.
Key Features
- Fine-Tuning Support: Supports fine-tuning of any open-source RAG retrieval model, including vector (embedding) models, late-interaction models such as ColBERT, and interaction-based models such as cross encoders and LLM-based rerankers.
- Inference Flexibility: Offers a lightweight Python library, rag-retrieval, that provides unified access to different RAG reranker models, simplifying inference and removing the need to learn each model's interface separately.
- Model Distillation: Supports distilling LLM-based reranker models into BERT-based models, optimizing efficiency and performance.
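The distillation feature above can be understood as score-level knowledge distillation: a smaller BERT-style student is trained to reproduce the relevance scores of an LLM reranker teacher. A minimal, framework-free sketch of the loss computation, with toy scores standing in for real model outputs (nothing here is the project's actual training code):

```python
# Score-level knowledge distillation sketch (illustrative only).
# In practice, an LLM reranker scores (query, doc) pairs and a smaller
# BERT-based model is trained to match those scores; here we show only
# the loss computation, with placeholder numbers instead of real models.

def mse_distill_loss(teacher_scores, student_scores):
    """Mean-squared error between teacher and student relevance scores."""
    assert len(teacher_scores) == len(student_scores)
    n = len(teacher_scores)
    return sum((t - s) ** 2 for t, s in zip(teacher_scores, student_scores)) / n

# Toy relevance scores for three (query, doc) pairs.
teacher = [0.92, 0.15, 0.60]   # produced by the LLM teacher (placeholder)
student = [0.80, 0.25, 0.55]   # produced by the BERT student (placeholder)

loss = mse_distill_loss(teacher, student)
```

Minimizing this loss pushes the student's scores toward the teacher's; real setups often use an MSE or KL-style objective over batched scores.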
Community Engagement
For community interaction and support, RAG-Retrieval maintains a dedicated WeChat group.
Recent Updates
- LLM Reranker Methods (Oct 2024): New techniques for using LLMs in reranking tasks, plus distillation methods for transferring their ranking capability to BERT-based models.
- Embedding Model Improvements (June 2024): Introduced MRL (Matryoshka Representation Learning) loss as a standard option for training embedding models, enhancing their performance.
- Preference-Based RAG Fine-Tuning (June 2024): Implemented LLM preference-based fine-tuning for RAG.
- Lightweight Python Library (May 2024): Released a lightweight library for efficient RAG ranking.
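The MRL loss mentioned above supervises one full-size embedding at several nested prefix sizes, so truncated embeddings remain usable. A minimal sketch of the idea, using a simple (1 − cosine) term per nested dimension; the dimensions, inner loss, and vectors are placeholders, not RAG-Retrieval's actual implementation:

```python
# Matryoshka Representation Learning (MRL) loss sketch (illustrative).
# One embedding is supervised at several nested prefix sizes, so a
# truncated embedding still works for similarity search.
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def mrl_loss(query_vec, doc_vec, nested_dims=(2, 4, 8)):
    """Sum a simple (1 - cosine) loss over nested prefix dimensions."""
    total = 0.0
    for d in nested_dims:
        total += 1.0 - cosine(query_vec[:d], doc_vec[:d])
    return total

# Toy 8-dimensional embeddings (placeholders for real model outputs).
q = [0.1, 0.3, -0.2, 0.7, 0.5, -0.1, 0.2, 0.4]
p = [0.2, 0.1, -0.1, 0.6, 0.4, -0.3, 0.1, 0.5]
loss = mrl_loss(q, p)
```

Real MRL training replaces the (1 − cosine) term with the task's contrastive loss, evaluated at each nested dimension.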
Understanding Reranker Model Inference
Reranker models are integral to retrieval architectures and RAG systems due to:
- Varied Performance: Different models excel in different scenarios, so a single tool that integrates multiple models, without per-model adaptation, is valuable.
- Emerging Techniques: New reranker models, such as recently released LLM rerankers, apply novel approaches to reordering retrieved documents.
Unique Features of rag-retrieval
This library supports multiple ranking models (e.g., cross-encoder rerankers and decoder-only LLM rerankers) and is particularly suited to lengthy documents. It is designed for easy extension: a new model type can be added by extending a base reranker class.
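The extension mechanism described above can be illustrated with a small sketch: a base class defines the scoring interface, shared logic (like ordering documents) lives in the base, and each new model type subclasses it. The class and method names below are hypothetical, not the library's actual API:

```python
# Sketch of extension via a base reranker class (names are hypothetical).
from abc import ABC, abstractmethod

class BaseReranker(ABC):
    @abstractmethod
    def compute_score(self, query: str, doc: str) -> float:
        """Return a relevance score for a (query, doc) pair."""

    def rerank(self, query: str, docs: list) -> list:
        """Order docs by descending relevance; shared by all subclasses."""
        return sorted(docs, key=lambda d: self.compute_score(query, d), reverse=True)

class KeywordOverlapReranker(BaseReranker):
    """Toy 'model': scores by word overlap, standing in for a real cross encoder."""
    def compute_score(self, query: str, doc: str) -> float:
        q, d = set(query.lower().split()), set(doc.lower().split())
        return len(q & d) / max(len(q), 1)

ranker = KeywordOverlapReranker()
ranked = ranker.rerank("rag retrieval", ["a cat", "rag retrieval models", "retrieval"])
# ranked[0] == "rag retrieval models"
```

Adding, say, an LLM-based reranker would mean one more subclass implementing `compute_score`, with the ordering logic inherited unchanged.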
Installation
Before installing, make sure the package's dependencies are compatible with your local Torch and CUDA versions:

```shell
pip install rag-retrieval
```
Supported Reranker Models
The library supports a wide range of reranker models, particularly those built on Transformers' `AutoModelForSequenceClassification`. Examples include:
- Cross Encoder Models: such as BAAI's bge-reranker series.
- LLM Reranker Models: various LLM-based rerankers, including zero-shot ranking with LLM chat models.
Fine-Tuning RAG Models
The project supports fine-tuning various models and embedding techniques, providing detailed scripts and tutorials for each type:
- Embedding Models: fine-tuning of popular embedding models, covering both basic and more complex scenarios.
- ColBERT Models: fine-tuning of ColBERT-style late-interaction models.
- Reranker Models: fine-tuning of any open-source reranker model for improved query-document relevance scoring.
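Embedding-model fine-tuning of the kind listed above typically optimizes a contrastive objective such as InfoNCE, pulling a query toward its positive document and away from in-batch negatives. A minimal sketch of the standard formula, with toy similarity values (the numbers and temperature are placeholders, not the project's configuration):

```python
# InfoNCE-style contrastive loss sketch (illustrative, not the project's code).
# Given the similarity of a query to one positive doc and several negatives,
# the loss is the negative log-softmax probability of the positive.
import math

def info_nce(pos_sim, neg_sims, temperature=0.05):
    logits = [pos_sim / temperature] + [s / temperature for s in neg_sims]
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    return -math.log(exps[0] / sum(exps))

# Toy cosine similarities: one positive doc, three in-batch negatives.
loss = info_nce(0.9, [0.2, 0.1, -0.3])
```

The loss shrinks as the positive pair's similarity grows relative to the negatives, which is exactly the behavior an embedding fine-tune rewards.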
Experimental Results
The team evaluated various reranker and ColBERT models on MTEB Reranking tasks; models trained with RAG-Retrieval achieved competitive results:
- Models fine-tuned on domain-specific data showed significant performance improvements.
Licensing
RAG-Retrieval is open-source and licensed under the MIT License, ensuring broad usability and collaboration within the community.