Introduction to fastRAG
fastRAG is a state-of-the-art research framework tailored for building efficient, optimized retrieval-augmented generation (RAG) pipelines. The project combines cutting-edge large language models (LLMs) with information retrieval techniques, enabling researchers and developers to explore and enhance the capabilities of retrieval-augmented generation.
Key Updates
fastRAG is continuously evolving with significant updates to ensure compatibility with the latest technology and improvements in its framework:
- May 2024: Release of fastRAG V3, now compatible with Haystack 2.0, enhancing its integration capabilities.
- December 2023: Expanded support for Gaudi2 and ONNX runtimes, optimized embedding models, and added multi-modality and chat demonstrations, along with the integration of REPLUG text generation.
- June 2023: Enhanced ColBERT index allowing easier document addition/removal.
- May 2023: Introduction of RAG with dynamic prompt synthesis using LLMs.
- April 2023: Support for the Qdrant DocumentStore.
Key Features
fastRAG offers a suite of rich features designed to support efficient and customized RAG pipeline development:
- Optimized RAG: Highly optimized components for building efficient RAG pipelines that make the best use of available computational resources.
- Intel Hardware Optimization: Specifically optimized for running on Intel hardware such as Intel® Xeon® Processors and Intel® Gaudi® AI accelerators, utilizing extensions such as Intel Extension for PyTorch (IPEX), Optimum Intel, and Optimum-Habana for performance boosts.
- Customizable Framework: Built on top of the Haystack framework and HuggingFace, providing full compatibility with all Haystack components, allowing easy customization for specific needs.
Components Overview
fastRAG is packed with various unique components, each aiding in the creation of efficient RAG pipelines:
LLM Backends
- Intel Gaudi Accelerators: Run LLMs efficiently on Gaudi 2 platforms.
- ONNX Runtime: Use optimized runtime for LLMs.
- OpenVINO: Deploy quantized LLMs using OpenVINO for maximized efficiency.
- Llama-CPP: Utilize the Llama-CPP backend for running RAG pipelines.
Optimized Components
- Embedders: Includes optimized int8 bi-encoders.
- Rankers: Offers optimized and sparse cross-encoders for ranking tasks.
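A cross-encoder ranker scores each (query, document) pair jointly and reorders the retriever's candidates before generation. The following is a minimal pure-Python sketch of that reranking step, with simple token overlap standing in for the cross-encoder score; it is illustrative only, and not how fastRAG's optimized transformer-based rankers are implemented:

```python
def cross_score(query: str, doc: str) -> float:
    # Stand-in for a cross-encoder: fraction of query tokens found in the doc.
    q_tokens = set(query.lower().split())
    d_tokens = set(doc.lower().split())
    return len(q_tokens & d_tokens) / len(q_tokens)

def rerank(query: str, candidates: list[str], top_k: int = 2) -> list[str]:
    # Score every candidate against the query, keep the top_k best.
    return sorted(candidates, key=lambda d: cross_score(query, d), reverse=True)[:top_k]

docs = [
    "fastRAG builds efficient RAG pipelines",
    "unrelated text about cooking",
    "optimized rankers improve RAG quality",
]
print(rerank("efficient RAG pipelines", docs, top_k=2))
# → ['fastRAG builds efficient RAG pipelines', 'optimized rankers improve RAG quality']
```

In a real pipeline the scoring function is a neural cross-encoder that reads query and document together, which is more accurate but more expensive than a bi-encoder; that is why it is applied only to a small candidate set after retrieval.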
RAG-efficient Components
- ColBERT: Implements token-based late interaction.
- Fusion-in-Decoder (FiD): Incorporates a generative multi-document encoder-decoder.
- REPLUG: Features an enhanced multi-document decoder.
- PLAID: An exceptionally efficient indexing engine.
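ColBERT's token-based late interaction scores a query against a document by taking, for each query token embedding, the maximum similarity over all document token embeddings, then summing those maxima (MaxSim). Below is a minimal pure-Python sketch of MaxSim scoring over toy 2-d embeddings; it only illustrates the scoring rule, not fastRAG's actual ColBERT or PLAID implementation:

```python
from math import sqrt

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def normalize(v):
    norm = sqrt(dot(v, v))
    return [x / norm for x in v]

def maxsim_score(query_embs, doc_embs):
    """ColBERT-style late interaction: for each query token embedding,
    take the max cosine similarity over document token embeddings, then sum."""
    q = [normalize(e) for e in query_embs]
    d = [normalize(e) for e in doc_embs]
    return sum(max(dot(qe, de) for de in d) for qe in q)

# Toy 2-d token embeddings: a 2-token query against two documents.
query = [[1.0, 0.0], [0.0, 1.0]]
doc_a = [[1.0, 0.0], [0.0, 1.0]]   # matches both query tokens
doc_b = [[1.0, 0.0], [1.0, 0.0]]   # matches only the first token

print(maxsim_score(query, doc_a))  # → 2.0
print(maxsim_score(query, doc_b))  # → 1.0
```

Because document token embeddings can be computed and indexed offline, only the cheap MaxSim interaction happens at query time; PLAID exploits this structure to make the index lookup itself highly efficient.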
Installation
To begin using fastRAG, ensure the following preliminary requirements are met:
- Python version 3.8 or higher.
- PyTorch version 2.0 or higher.
Install fastRAG via pip for the latest stable version, or clone the repository for the newest updates. Creating a virtual environment to manage dependencies is recommended:
pip install fastrag
Additional Packages
Depending on the specific needs and components required, additional dependencies can be installed:
pip install fastrag[intel] # Intel optimized backend
pip install fastrag[openvino] # OpenVINO optimization
pip install fastrag[elastic] # ElasticSearch support
pip install fastrag[qdrant] # Qdrant support
For development purposes (run from a cloned repository):
pip install .[dev]
The project is licensed under the Apache 2.0 License, providing open access for modifications and sharing under specified conditions.
Conclusion
fastRAG stands as a powerful and adaptable framework for anyone looking to advance the state of retrieval-augmented generation. By leveraging modern optimizations and support from industry-leading tools, fastRAG lets developers efficiently build and customize sophisticated RAG pipelines for advanced applications.