evidently - Assess Machine Learning and LLM Systems with Extensive Metrics

Evidently Project Overview

Evidently is an open-source Python library for evaluating, testing, and monitoring machine learning (ML) and large language model (LLM) systems. It supports AI-powered systems throughout their lifecycle, from experiments to production.

Key Features of Evidently

Data Compatibility: Evidently works with tabular data, text, and embeddings.
System Versatility: It supports both predictive and generative systems, from simple classification to complex retrieval-augmented generation (RAG).
Comprehensive Metrics: More than 100 built-in metrics cover everything from data drift detection to the evaluation of LLM outputs.
Customizability: Users can create custom metrics and tests through a Python interface.
Evaluation and Monitoring: Evidently supports both offline evaluations and live monitoring.
Open Architecture: Users can export data and integrate Evidently with other tools and systems.

Components of Evidently

Evidently is modular: each component can be used on its own or together with the others, depending on the user's needs.

1. Reports

Reports calculate data, ML, and LLM quality metrics. Users can start with predefined presets or customize their reports. Reports include interactive visuals suited to exploratory analysis and debugging, and they can be exported as JSON, a Python dictionary, HTML, or a DataFrame.

2. Test Suites

Test Suites check predefined conditions on metric values and return a pass or fail result for each test. They suit regression testing, CI/CD checks, and data validation pipelines. Test conditions can be generated automatically or set explicitly by the user with parameters such as greater than (gt) and less than (lt).
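
The pass or fail idea behind such conditions can be sketched in plain Python. The snippet below does not use Evidently's API: the Condition class, the run_checks helper, and the metric names and values are hypothetical and exist only to show how gt and lt conditions turn metric values into pass or fail results.

```python
import operator
from dataclasses import dataclass

# Map condition names to comparison functions (gt = greater than, lt = less than).
OPERATORS = {
    "gt": operator.gt,
    "lt": operator.lt,
    "gte": operator.ge,
    "lte": operator.le,
    "eq": operator.eq,
}


@dataclass
class Condition:
    """Hypothetical condition on a single metric value (not an Evidently class)."""
    metric: str
    op: str
    threshold: float

    def check(self, value: float) -> bool:
        return OPERATORS[self.op](value, self.threshold)


def run_checks(metrics: dict, conditions: list) -> dict:
    """Return a pass/fail flag per condition, keyed by a readable label."""
    results = {}
    for cond in conditions:
        label = f"{cond.metric} {cond.op} {cond.threshold}"
        results[label] = cond.check(metrics[cond.metric])
    return results


# Illustrative metric values, not real measurements.
metrics = {"accuracy": 0.91, "share_missing": 0.02}
conditions = [
    Condition("accuracy", "gt", 0.85),
    Condition("share_missing", "lt", 0.01),
]
results = run_checks(metrics, conditions)
all_passed = all(results.values())
```

In a CI/CD job, a flag like all_passed would typically decide whether the pipeline step succeeds.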

3. Monitoring Dashboard

The Monitoring UI visualizes metrics and test results over time. Users can self-host the open-source version or use Evidently Cloud, which adds features such as user management and alerting.

Getting Started with Evidently

Evidently is available on PyPI and can be installed with pip (pip install evidently) or with conda (conda install -c conda-forge evidently).

Using Test Suites

Import the Test Suite class and an evaluation preset, and load a toy dataset. Split the data into a reference dataset and a current dataset, then run the Data Stability Test Suite, which checks column value ranges, missing values, and more.
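
To make the reference/current split concrete, here is a minimal plain-Python sketch of two stability checks of this kind: value ranges and missing values. It is an illustration under stated assumptions, not Evidently's implementation. The helpers column_profile and stability_checks, the column name, and the toy values are all made up for the example.

```python
def column_profile(rows, column):
    """Summarize one numeric column: value range and share of missing values."""
    values = [row.get(column) for row in rows]
    present = [v for v in values if v is not None]
    return {
        "min": min(present),
        "max": max(present),
        "share_missing": 1 - len(present) / len(values),
    }


def stability_checks(reference, current, column):
    """Compare the current batch to expectations derived from the reference batch."""
    ref = column_profile(reference, column)
    cur = column_profile(current, column)
    return {
        # Current values should stay inside the range seen in the reference data.
        "values_in_reference_range": ref["min"] <= cur["min"] and cur["max"] <= ref["max"],
        # The current batch should not have more missing values than the reference.
        "missing_share_not_higher": cur["share_missing"] <= ref["share_missing"],
    }


# Toy data (made up): the first half is the reference, the second half is current.
rows = [
    {"feature": 5.1}, {"feature": 4.9}, {"feature": 6.3},
    {"feature": 5.8}, {"feature": 6.0}, {"feature": None},
]
reference, current = rows[:3], rows[3:]
checks = stability_checks(reference, current, "feature")
```

Here the range check passes while the missing-value check fails, because the current batch contains a missing value and the reference batch does not.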

Using Reports

Import the Report class and an evaluation preset, and load a toy dataset. Run the Report to compare column distributions between the current and reference datasets, then export the results in a format such as HTML or JSON.
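
One common way to compare a column's distribution across two datasets is the two-sample Kolmogorov-Smirnov statistic. The sketch below computes it with the standard library and serializes the result to JSON. It illustrates the general idea only: it is not Evidently's API, the choice of statistic is an assumption made for this example, and the function name, column name, and data are hypothetical.

```python
import json
from bisect import bisect_right


def ks_statistic(reference, current):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between the empirical CDFs."""
    ref_sorted, cur_sorted = sorted(reference), sorted(current)
    max_gap = 0.0
    # Both empirical CDFs only change at observed values, so checking those is enough.
    for x in ref_sorted + cur_sorted:
        ref_cdf = bisect_right(ref_sorted, x) / len(ref_sorted)
        cur_cdf = bisect_right(cur_sorted, x) / len(cur_sorted)
        max_gap = max(max_gap, abs(ref_cdf - cur_cdf))
    return max_gap


# Toy data (made up): the current values are shifted upward relative to the reference.
reference = [1.0, 2.0, 3.0, 4.0]
current = [3.0, 4.0, 5.0, 6.0]

report = {"column": "feature", "ks_statistic": ks_statistic(reference, current)}
report_json = json.dumps(report)  # export the result as a JSON string
```

A statistic of 0 means the two samples have identical empirical distributions, and values closer to 1 indicate a larger shift.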

Using the Monitoring Dashboard

After installation, start the Evidently UI with the demo projects and open the service in a web browser. The dashboard displays metrics and test results over time.

Evaluations and Customizations

Evidently includes an extensive set of built-in evaluations, and each metric has an optional visualization. The evaluations cover text descriptors, LLM outputs, data quality, data drift, classification, regression, ranking, and recommendation systems. Users can also add custom evaluations.
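
As a rough illustration of what a custom text evaluation might compute, the sketch below derives a few row-level properties from text responses and aggregates one of them. It is plain Python, not Evidently's descriptor API. The function name, the denied-word list, and the sample responses are assumptions made for the example.

```python
def text_descriptors(text, denied_words=("sorry", "cannot")):
    """Compute a few row-level properties of a text response."""
    words = text.lower().split()
    return {
        "char_length": len(text),
        "word_count": len(words),
        # Strip trailing punctuation so "Sorry," still matches "sorry".
        "contains_denied_word": any(w.strip(".,!?") in denied_words for w in words),
    }


# Made-up responses standing in for LLM outputs.
responses = [
    "Sorry, I cannot help with that.",
    "The capital of France is Paris.",
]
scored = [text_descriptors(r) for r in responses]

# Aggregate a row-level descriptor into a dataset-level metric.
share_denials = sum(d["contains_denied_word"] for d in scored) / len(scored)
```

The same pattern applies to other checks: compute a value per row, then summarize it across the dataset so it can be tracked or tested against a condition.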

Contributions and Community

Evidently is a community-driven project and welcomes contributions. Documentation and guides are available to help users work with the platform, and users can join the Discord community to connect and share insights.

For more detailed information and tutorials, users can refer to Evidently's complete documentation and explore the available examples and how-to guides.