RAPTOR: Recursive Abstractive Processing for Tree-Organized Retrieval
Overview
RAPTOR improves how language models retrieve information by organizing text into a tree of recursively generated summaries. It is designed to handle large volumes of text while keeping retrieval efficient and context-aware, which addresses limitations that traditional language models face on long documents.
How It Works
At the core of RAPTOR is a recursive tree structure. RAPTOR breaks large texts into manageable segments and organizes them into a tree in which higher levels abstract over the segments beneath them. This recursion lets the system contextualize information across long documents, which is crucial for accurate question answering and summarization.
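The recursive idea described above can be illustrated with a small, self-contained sketch. It is not RAPTOR's implementation: the `Node` class, the `truncate_summary` stand-in summarizer, and the choice to group adjacent nodes in fixed-size batches are all assumptions made so the example runs without any model or API key. A real system would call a language model to write the summaries and may group nodes differently.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Node:
    """A tree node: a raw text chunk (leaf) or a summary of its children."""
    text: str
    children: List["Node"] = field(default_factory=list)


def chunk_text(text: str, max_words: int = 100) -> List[str]:
    """Split text into consecutive chunks of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def truncate_summary(texts: List[str], max_words: int = 30) -> str:
    """Stand-in summarizer (assumption): keep the first max_words words."""
    return " ".join(" ".join(texts).split()[:max_words])


def build_tree(
    text: str,
    summarize: Callable[[List[str]], str] = truncate_summary,
    chunk_words: int = 100,
    group_size: int = 3,
) -> Node:
    """Build a summary tree bottom-up until a single root remains."""
    if group_size < 2:
        raise ValueError("group_size must be at least 2 so each layer shrinks")
    layer = [Node(chunk) for chunk in chunk_text(text, chunk_words)]
    if not layer:
        raise ValueError("cannot build a tree from empty text")
    while len(layer) > 1:
        parents = []
        # Hypothetical grouping rule: batch adjacent nodes, then summarize each batch.
        for i in range(0, len(layer), group_size):
            group = layer[i:i + group_size]
            parents.append(Node(summarize([n.text for n in group]), children=group))
        layer = parents
    return layer[0]


def leaves(node: Node) -> List[Node]:
    """Return the leaf nodes (original chunks) under node, left to right."""
    if not node.children:
        return [node]
    return [leaf for child in node.children for leaf in leaves(child)]
```

Passing a different `summarize` callable is the only change needed to swap the truncation stand-in for a model-backed summarizer.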
Installation and Setup
RAPTOR requires Python 3.8 or higher. Clone the RAPTOR repository and install its dependencies:
git clone https://github.com/parthsarthi03/raptor.git
cd raptor
pip install -r requirements.txt
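If it is unclear which interpreter `pip` will install into, the version requirement can be checked from Python before installing. The helper below is a minimal sketch; the function name `python_is_supported` is hypothetical and not part of RAPTOR.

```python
import sys

# Minimum version stated in the installation instructions above.
MIN_VERSION = (3, 8)


def python_is_supported(version_info=None, minimum=MIN_VERSION) -> bool:
    """Return True if the given (or running) interpreter meets the minimum version."""
    if version_info is None:
        version_info = sys.version_info
    # Compare only (major, minor); patch level does not matter here.
    return tuple(version_info[:2]) >= tuple(minimum)


if not python_is_supported():
    print("Python 3.8 or higher is required; found", sys.version.split()[0])
```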
Basic Usage
- Setting Up RAPTOR: Begin by setting your OpenAI API key and creating a RetrievalAugmentation instance with the default configuration. This prepares RAPTOR to work with your documents and queries.
import os
# Replace the placeholder with your own key
os.environ["OPENAI_API_KEY"] = "your-openai-api-key"

from raptor import RetrievalAugmentation
# Initialize with the default configuration
RA = RetrievalAugmentation()
- Adding Documents: Add the documents you want RAPTOR to index and retrieve information from.
# Read the document and hand its text to RAPTOR for indexing
with open('sample.txt', 'r') as file:
    text = file.read()
RA.add_documents(text)
- Answering Questions: Once documents are added, you can ask questions and RAPTOR will use its index to find answers within your documents.
question = "How did Cinderella reach her happy ending?"
answer = RA.answer_question(question=question)
print("Answer: ", answer)
- Saving and Loading: The tree structure RAPTOR creates can be saved and reloaded, ensuring easy access and reuse of processed documents.
# Save the tree to disk
SAVE_PATH = "demo/cinderella"
RA.save(SAVE_PATH)
# Later, load the saved tree instead of re-indexing
RA = RetrievalAugmentation(tree=SAVE_PATH)
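The steps above can be combined into a "load the saved tree if it exists, otherwise build and save it" helper. The sketch below is hypothetical: `load_or_build` is not part of RAPTOR, and it assumes only what the snippets above show, namely a class that can be called with no arguments or with `tree=<path>`, and that offers `add_documents` and `save`. The class is passed in as `ra_factory` so the helper itself does not need RAPTOR installed. It also assumes the save path is a file whose parent directory must already exist, which is worth verifying against the repository.

```python
import os
from typing import Any, Callable


def load_or_build(ra_factory: Callable[..., Any], tree_path: str, doc_path: str) -> Any:
    """Reuse a saved tree when present; otherwise index doc_path and save the result."""
    if os.path.exists(tree_path):
        # A saved tree exists: load it rather than re-processing the document.
        return ra_factory(tree=tree_path)

    with open(doc_path, "r", encoding="utf-8") as f:
        text = f.read()

    ra = ra_factory()
    ra.add_documents(text)

    # Assumption: the parent directory is not created automatically on save.
    os.makedirs(os.path.dirname(tree_path) or ".", exist_ok=True)
    ra.save(tree_path)
    return ra
```

With RAPTOR installed, the call would presumably look like `RA = load_or_build(RetrievalAugmentation, "demo/cinderella", "sample.txt")`, after which `RA.answer_question(question=...)` works as shown earlier.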
Extending RAPTOR
RAPTOR is not just limited to its built-in models. Users can extend its functionality by integrating custom models for summarization, question answering, and embedding generation. This is particularly useful for those who need specialized language processing capabilities beyond what RAPTOR provides by default.
Usage With Custom Models
Developers can configure RAPTOR to use their own summarization, question-answering, and embedding models in place of the defaults, adapting it to specific requirements.
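To make the three extension points concrete, here is a self-contained sketch of what custom models might look like. The base classes below are local stand-ins defined only so the example runs on its own. Their names and method signatures (`summarize`, `answer_question`, `create_embedding`) are assumptions about the kind of interface RAPTOR expects and should be checked against the repository before use. The three toy implementations are deliberately trivial and use no external model.

```python
import hashlib
import math
from abc import ABC, abstractmethod
from typing import List


# Stand-in interfaces (assumptions, not imported from RAPTOR).
class BaseSummarizationModel(ABC):
    @abstractmethod
    def summarize(self, context: str, max_tokens: int = 150) -> str: ...


class BaseQAModel(ABC):
    @abstractmethod
    def answer_question(self, context: str, question: str) -> str: ...


class BaseEmbeddingModel(ABC):
    @abstractmethod
    def create_embedding(self, text: str) -> List[float]: ...


class LeadSummarizer(BaseSummarizationModel):
    """Toy extractive summarizer: keep the first max_tokens whitespace tokens."""

    def summarize(self, context: str, max_tokens: int = 150) -> str:
        return " ".join(context.split()[:max_tokens])


class HashingEmbedder(BaseEmbeddingModel):
    """Toy embedding: hashed bag of words, L2-normalized."""

    def __init__(self, dim: int = 64):
        self.dim = dim

    def create_embedding(self, text: str) -> List[float]:
        vec = [0.0] * self.dim
        for word in text.lower().split():
            digest = hashlib.sha256(word.encode("utf-8")).digest()
            # Map each word to a stable bucket and count occurrences.
            vec[int.from_bytes(digest[:4], "big") % self.dim] += 1.0
        norm = math.sqrt(sum(v * v for v in vec))
        return [v / norm for v in vec] if norm else vec


class BestSentenceQA(BaseQAModel):
    """Toy QA: return the context sentence sharing the most words with the question."""

    def answer_question(self, context: str, question: str) -> str:
        q_words = set(question.lower().split())
        sentences = [s.strip() for s in context.split(".") if s.strip()]
        if not sentences:
            return ""
        return max(sentences, key=lambda s: len(q_words & set(s.lower().split())))
```

In a real integration, the classes would subclass the base classes that RAPTOR provides and be handed to the library through its configuration. That wiring is not shown in this document, so consult the repository for the exact names.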
Contributing and License
RAPTOR is an open-source project under the MIT License. Community contributions are encouraged, whether it's fixing issues, enhancing features, or improving documentation.
Citation
Researchers using RAPTOR in their work are encouraged to cite the project to acknowledge its role in their research.
RAPTOR is continually evolving, with ongoing updates and new features being developed. Keep an eye out for upcoming enhancements and additional guides to help maximize its functionality.