Introduction to the Ragas Project
The Ragas project is an innovative toolkit designed to enhance the evaluation and optimization of Large Language Model (LLM) applications. It aims to simplify the assessment process, which can often be lengthy and subjective, by providing data-driven, efficient evaluation workflows. This project is perfect for developers and researchers looking to fine-tune their LLM applications with reliability and precision.
Key Features
- Objective Metrics: Ragas provides both traditional (non-LLM) metrics and LLM-based metrics, giving objective, reproducible scores for application outputs.
- Test Data Generation: The project offers automatic generation of comprehensive test datasets that include a wide range of scenarios to ensure thorough testing.
- Seamless Integrations: It integrates smoothly with popular LLM frameworks like LangChain and with various observability tools.
- Build Feedback Loops: By leveraging production data, users can continually improve their LLM applications.
Installation
Installing Ragas is straightforward. For most users, the easiest way is through the Python Package Index (PyPI):
pip install ragas
Alternatively, users can install the latest development version directly from source:
pip install git+https://github.com/explodinggradients/ragas
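After installing, you can quickly confirm the package is importable by printing its version (a minimal check; it assumes ragas exposes a __version__ attribute, as most Python packages do):

import ragas

# Print the installed version to confirm the installation succeeded
print(ragas.__version__)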
Quick Start
Evaluate LLM Applications
Evaluating your application with Ragas is simple and involves only a few lines of code. For example, you can use the following Python snippet:
from ragas import evaluate
from ragas.metrics import LLMContextRecall, Faithfulness, FactualCorrectness
from langchain_openai.chat_models import ChatOpenAI
from ragas.llms import LangchainLLMWrapper

# Wrap a LangChain chat model so Ragas can use it as the evaluator LLM
evaluator_llm = LangchainLLMWrapper(ChatOpenAI(model="gpt-4o"))
metrics = [LLMContextRecall(), FactualCorrectness(), Faithfulness()]
# eval_dataset is an EvaluationDataset prepared beforehand (see the sketch below)
results = evaluate(dataset=eval_dataset, metrics=metrics, llm=evaluator_llm)
This code evaluates the context recall, factual correctness, and faithfulness of the responses in your dataset. For a complete guide, users can follow the RAG Evaluation Quickstart available in the Ragas documentation.
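The snippet above assumes eval_dataset already exists. As a minimal sketch of how one might be built, assuming the EvaluationDataset API from recent Ragas releases (the exact field names are an assumption and can vary between versions):

from ragas import EvaluationDataset

# One hypothetical row; a real evaluation would use many such rows
data = [
    {
        "user_input": "What does Ragas do?",                   # question posed to your app
        "retrieved_contexts": ["Ragas evaluates LLM apps."],   # contexts your retriever returned
        "response": "Ragas helps evaluate LLM applications.",  # your application's answer
        "reference": "Ragas is a toolkit for evaluating LLM applications.",  # ground-truth answer
    }
]
eval_dataset = EvaluationDataset.from_list(data)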
Generate Test Datasets
If you don't have readily available data for testing, Ragas supports synthetic test set generation. This feature lets you create test datasets that mimic real user interactions, controlling elements like difficulty, variety, and complexity. Detailed instructions are available in the Ragas documentation.
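As a rough sketch of how this can look in code, assuming the TestsetGenerator API from recent Ragas releases and a docs variable holding LangChain Document objects loaded from your own corpus (both the variable and the testset_size value are illustrative assumptions):

from ragas.testset import TestsetGenerator
from ragas.llms import LangchainLLMWrapper
from ragas.embeddings import LangchainEmbeddingsWrapper
from langchain_openai.chat_models import ChatOpenAI
from langchain_openai import OpenAIEmbeddings

# Models used to synthesize questions and embed document chunks
generator_llm = LangchainLLMWrapper(ChatOpenAI(model="gpt-4o"))
generator_embeddings = LangchainEmbeddingsWrapper(OpenAIEmbeddings())

generator = TestsetGenerator(llm=generator_llm, embedding_model=generator_embeddings)
# docs: a list of LangChain Document objects from your own corpus (assumed to exist)
testset = generator.generate_with_langchain_docs(docs, testset_size=10)

The generated test set can then be turned into an evaluation dataset and scored with the same evaluate call shown earlier.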
Community and Contributions
The Ragas community is vibrant and welcoming. You can join their Discord server to discuss LLMs, retrieval methods, production issues, and more with like-minded individuals. Moreover, Ragas is open to contributions from developers worldwide, making it a continually evolving project. Contributors can get involved by fixing bugs, adding features, or improving documentation.
Open Analytics
At Ragas, transparency is a key value. The project collects minimal, anonymized usage data to improve the toolkit and guide future development; this data does not include personal or company-specific information. The data-collection code can be inspected in the open-source repository, and aggregated usage data is published publicly. Users can opt out of tracking by setting the RAGAS_DO_NOT_TRACK environment variable to true.
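For example, in a POSIX shell, before running your application:

export RAGAS_DO_NOT_TRACK=true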
Overall, Ragas is an exceptional tool for anyone working with LLM applications, offering robust features that streamline and enhance the evaluation process.