Project Overview: RAG-Retrieval
RAG-Retrieval streamlines the training and inference of retrieval models for retrieval-augmented generation (RAG). It covers fine-tuning, inference, and distillation for the model types used to improve retrieval and reranking across a range of applications.
Key Features
- Fine-Tuning Support: Supports fine-tuning of any open-source RAG retrieval model, including vector (embedding) models, late-interaction models such as ColBERT, and interaction-based models such as cross encoders and LLM-based rerankers.
- Inference Flexibility: Offers a lightweight Python library, rag-retrieval, that provides unified access to different RAG reranker models, simplifying inference and removing the need to learn each model's interface separately.
- Model Distillation: Supports distilling LLM-based reranker models into BERT-based models, optimizing efficiency and performance.
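The distillation feature above can be understood as score-level knowledge distillation: a smaller BERT-style student is trained to reproduce the relevance scores of an LLM reranker teacher. A minimal, framework-free sketch of the loss computation, with toy scores standing in for real model outputs (nothing here is the project's actual training code):

```python
# Score-level knowledge distillation sketch (illustrative only).
# In practice, an LLM reranker scores (query, doc) pairs and a smaller
# BERT-based model is trained to match those scores; here we show only
# the loss computation, with placeholder numbers instead of real models.

def mse_distill_loss(teacher_scores, student_scores):
    """Mean-squared error between teacher and student relevance scores."""
    assert len(teacher_scores) == len(student_scores)
    n = len(teacher_scores)
    return sum((t - s) ** 2 for t, s in zip(teacher_scores, student_scores)) / n

# Toy relevance scores for three (query, doc) pairs.
teacher = [0.92, 0.15, 0.60]   # produced by the LLM teacher (placeholder)
student = [0.80, 0.25, 0.55]   # produced by the BERT student (placeholder)

loss = mse_distill_loss(teacher, student)
```

Minimizing this loss pushes the student's scores toward the teacher's; real setups often use an MSE or KL-style objective over batched scores.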
Community Engagement
For community interaction and support, RAG-Retrieval maintains a dedicated WeChat group.
Recent Updates
- LLM Reranker Methods (Oct 2024): New techniques for using LLMs in reranking tasks, plus distillation methods for transferring their ranking capability to BERT-based models.
- Embedding Model Improvements (June 2024): Introduced MRL (Matryoshka Representation Learning) loss as a standard option for training embedding models, enhancing their performance.
- Preference-Based RAG Fine-Tuning (June 2024): Implemented LLM preference-based fine-tuning for RAG.
- Lightweight Python Library (May 2024): Released a lightweight library for efficient RAG ranking.
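The MRL loss mentioned above supervises one full-size embedding at several nested prefix sizes, so truncated embeddings remain usable. A minimal sketch of the idea, using a simple (1 − cosine) term per nested dimension; the dimensions, inner loss, and vectors are placeholders, not RAG-Retrieval's actual implementation:

```python
# Matryoshka Representation Learning (MRL) loss sketch (illustrative).
# One embedding is supervised at several nested prefix sizes, so a
# truncated embedding still works for similarity search.
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def mrl_loss(query_vec, doc_vec, nested_dims=(2, 4, 8)):
    """Sum a simple (1 - cosine) loss over nested prefix dimensions."""
    total = 0.0
    for d in nested_dims:
        total += 1.0 - cosine(query_vec[:d], doc_vec[:d])
    return total

# Toy 8-dimensional embeddings (placeholders for real model outputs).
q = [0.1, 0.3, -0.2, 0.7, 0.5, -0.1, 0.2, 0.4]
p = [0.2, 0.1, -0.1, 0.6, 0.4, -0.3, 0.1, 0.5]
loss = mrl_loss(q, p)
```

Real MRL training replaces the (1 − cosine) term with the task's contrastive loss, evaluated at each nested dimension.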
Understanding Reranker Model Inference
Reranker models are integral to retrieval architectures and RAG systems due to:
- Varied Performance: Different models excel in different scenarios, so a single tool that integrates multiple models, without per-model adaptation, is valuable.
- Emerging Techniques: New reranker models, such as recently released LLM rerankers, apply novel approaches to reordering retrieved documents.
Unique Features of rag-retrieval
This library supports multiple ranking models (e.g., cross-encoder rerankers and decoder-only LLM rerankers) and is particularly suited to lengthy documents. It is designed for easy extension: a new model type can be added by extending a base reranker class.
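The extension mechanism described above can be illustrated with a small sketch: a base class defines the scoring interface, shared logic (like ordering documents) lives in the base, and each new model type subclasses it. The class and method names below are hypothetical, not the library's actual API:

```python
# Sketch of extension via a base reranker class (names are hypothetical).
from abc import ABC, abstractmethod

class BaseReranker(ABC):
    @abstractmethod
    def compute_score(self, query: str, doc: str) -> float:
        """Return a relevance score for a (query, doc) pair."""

    def rerank(self, query: str, docs: list) -> list:
        """Order docs by descending relevance; shared by all subclasses."""
        return sorted(docs, key=lambda d: self.compute_score(query, d), reverse=True)

class KeywordOverlapReranker(BaseReranker):
    """Toy 'model': scores by word overlap, standing in for a real cross encoder."""
    def compute_score(self, query: str, doc: str) -> float:
        q, d = set(query.lower().split()), set(doc.lower().split())
        return len(q & d) / max(len(q), 1)

ranker = KeywordOverlapReranker()
ranked = ranker.rerank("rag retrieval", ["a cat", "rag retrieval models", "retrieval"])
# ranked[0] == "rag retrieval models"
```

Adding, say, an LLM-based reranker would mean one more subclass implementing `compute_score`, with the ordering logic inherited unchanged.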
Installation
Before installing, make sure the package's dependencies are compatible with your local Torch and CUDA versions:

```shell
pip install rag-retrieval
```
Supported Reranker Models
The library supports a wide range of reranker models, particularly those built on Transformers' `AutoModelForSequenceClassification`. Examples include:
- Cross Encoder Models: such as BAAI's bge-reranker series.
- LLM Reranker Models: various LLM-based rerankers, including zero-shot ranking with LLM chat models.
Fine-Tuning RAG Models
The project supports fine-tuning various models and embedding techniques, providing detailed scripts and tutorials for each type:
- Embedding Models: fine-tuning of popular embedding models, covering both basic and more complex scenarios.
- ColBERT Models: fine-tuning of ColBERT-style late-interaction models.
- Reranker Models: fine-tuning of any open-source reranker model for improved query-document relevance scoring.
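Embedding-model fine-tuning of the kind listed above typically optimizes a contrastive objective such as InfoNCE, pulling a query toward its positive document and away from in-batch negatives. A minimal sketch of the standard formula, with toy similarity values (the numbers and temperature are placeholders, not the project's configuration):

```python
# InfoNCE-style contrastive loss sketch (illustrative, not the project's code).
# Given the similarity of a query to one positive doc and several negatives,
# the loss is the negative log-softmax probability of the positive.
import math

def info_nce(pos_sim, neg_sims, temperature=0.05):
    logits = [pos_sim / temperature] + [s / temperature for s in neg_sims]
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    return -math.log(exps[0] / sum(exps))

# Toy cosine similarities: one positive doc, three in-batch negatives.
loss = info_nce(0.9, [0.2, 0.1, -0.3])
```

The loss shrinks as the positive pair's similarity grows relative to the negatives, which is exactly the behavior an embedding fine-tune rewards.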
Experimental Results
The team evaluated various reranker and ColBERT models on MTEB Reranking tasks; models trained with RAG-Retrieval achieved competitive results:
- Models fine-tuned on domain-specific data showed significant performance improvements.
Licensing
RAG-Retrieval is open-source and licensed under the MIT License, ensuring broad usability and collaboration within the community.