Introduction to fastRAG
fastRAG is a state-of-the-art research framework tailored for building efficient, optimized retrieval-augmented generation (RAG) pipelines. The project combines cutting-edge large language models (LLMs) with information retrieval techniques, enabling researchers and developers to explore and enhance the capabilities of retrieval-augmented generation.
Key Updates
fastRAG is continuously evolving with significant updates to ensure compatibility with the latest technology and improvements in its framework:
- May 2024: Release of fastRAG V3, now compatible with Haystack 2.0, enhancing its integration capabilities.
- December 2023: Expanded support for Gaudi2 and ONNX runtimes, optimized embedding models, and added multi-modality and chat demonstrations, along with the integration of REPLUG text generation.
- June 2023: Enhanced ColBERT index allowing easier document addition/removal.
- May 2023: Introduction of RAG with dynamic prompt synthesis using LLMs.
- April 2023: Support for the Qdrant DocumentStore.
Key Features
fastRAG offers a suite of rich features designed to support efficient and customized RAG pipeline development:
- Optimized RAG: Highly optimized components for building efficient RAG pipelines that make the best use of available computational resources.
- Intel Hardware Optimization: Specifically optimized for running on Intel hardware such as Intel® Xeon® Processors and Intel® Gaudi® AI accelerators, utilizing extensions such as Intel Extension for PyTorch (IPEX), Optimum Intel, and Optimum-Habana for performance boosts.
- Customizable Framework: Built on top of the Haystack framework and HuggingFace, providing full compatibility with all Haystack components, allowing easy customization for specific needs.
Components Overview
fastRAG is packed with various unique components, each aiding in the creation of efficient RAG pipelines:
LLM Backends
- Intel Gaudi Accelerators: Run LLMs efficiently on Gaudi 2 platforms.
- ONNX Runtime: Use optimized runtime for LLMs.
- OpenVINO: Deploy quantized LLMs using OpenVINO for maximized efficiency.
- Llama-CPP: Utilize the Llama-CPP backend for running RAG pipelines.
Optimized Components
- Embedders: Includes optimized int8 bi-encoders.
- Rankers: Offers optimized and sparse cross-encoders for ranking tasks.
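A cross-encoder ranker scores each (query, document) pair jointly and reorders the retriever's candidates before generation. The following is a minimal pure-Python sketch of that reranking step, with simple token overlap standing in for the cross-encoder score; it is illustrative only, and not how fastRAG's optimized transformer-based rankers are implemented:

```python
def cross_score(query: str, doc: str) -> float:
    # Stand-in for a cross-encoder: fraction of query tokens found in the doc.
    q_tokens = set(query.lower().split())
    d_tokens = set(doc.lower().split())
    return len(q_tokens & d_tokens) / len(q_tokens)

def rerank(query: str, candidates: list[str], top_k: int = 2) -> list[str]:
    # Score every candidate against the query, keep the top_k best.
    return sorted(candidates, key=lambda d: cross_score(query, d), reverse=True)[:top_k]

docs = [
    "fastRAG builds efficient RAG pipelines",
    "unrelated text about cooking",
    "optimized rankers improve RAG quality",
]
print(rerank("efficient RAG pipelines", docs, top_k=2))
# → ['fastRAG builds efficient RAG pipelines', 'optimized rankers improve RAG quality']
```

In a real pipeline the scoring function is a neural cross-encoder that reads query and document together, which is more accurate but more expensive than a bi-encoder; that is why it is applied only to a small candidate set after retrieval.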
RAG-efficient Components
- ColBERT: Implements token-based late interaction.
- Fusion-in-Decoder (FiD): Incorporates a generative multi-document encoder-decoder.
- REPLUG: Features an enhanced multi-document decoder.
- PLAID: An exceptionally efficient indexing engine.
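ColBERT's token-based late interaction scores a query against a document by taking, for each query token embedding, the maximum similarity over all document token embeddings, then summing those maxima (MaxSim). Below is a minimal pure-Python sketch of MaxSim scoring over toy 2-d embeddings; it only illustrates the scoring rule, not fastRAG's actual ColBERT or PLAID implementation:

```python
from math import sqrt

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def normalize(v):
    norm = sqrt(dot(v, v))
    return [x / norm for x in v]

def maxsim_score(query_embs, doc_embs):
    """ColBERT-style late interaction: for each query token embedding,
    take the max cosine similarity over document token embeddings, then sum."""
    q = [normalize(e) for e in query_embs]
    d = [normalize(e) for e in doc_embs]
    return sum(max(dot(qe, de) for de in d) for qe in q)

# Toy 2-d token embeddings: a 2-token query against two documents.
query = [[1.0, 0.0], [0.0, 1.0]]
doc_a = [[1.0, 0.0], [0.0, 1.0]]   # matches both query tokens
doc_b = [[1.0, 0.0], [1.0, 0.0]]   # matches only the first token

print(maxsim_score(query, doc_a))  # → 2.0
print(maxsim_score(query, doc_b))  # → 1.0
```

Because document token embeddings can be computed and indexed offline, only the cheap MaxSim interaction happens at query time; PLAID exploits this structure to make the index lookup itself highly efficient.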
Installation
To begin using fastRAG, ensure the following preliminary requirements are met:
- Python version 3.8 or higher.
- PyTorch version 2.0 or higher.
Install fastRAG via pip for the latest stable version, or clone the repository for the newest updates. Creating a virtual environment to manage dependencies is recommended:
pip install fastrag
Additional Packages
Depending on the specific needs and components required, additional dependencies can be installed:
pip install fastrag[intel] # Intel optimized backend
pip install fastrag[openvino] # OpenVINO optimization
pip install fastrag[elastic] # ElasticSearch support
pip install fastrag[qdrant] # Qdrant support
For development purposes (run from a cloned repository):
pip install .[dev]
The project is licensed under the Apache 2.0 License, providing open access for modifications and sharing under specified conditions.
Conclusion
fastRAG stands as a powerful and adaptable framework for anyone looking to advance the state of retrieval-augmented generation. By leveraging modern optimizations and support from industry-leading tools, fastRAG lets developers efficiently build and customize sophisticated RAG pipelines for advanced applications.