TrustLLM: Understanding Trustworthiness in Large Language Models
About TrustLLM
TrustLLM is a comprehensive study of trustworthiness in large language models (LLMs). The project establishes principles, a survey, and a benchmark that together provide a foundation for evaluating how trustworthy these models are. TrustLLM proposes guiding principles that span eight dimensions of trustworthiness and builds a benchmark covering six of them: truthfulness, safety, fairness, robustness, privacy, and machine ethics.
The TrustLLM framework has been used to assess a range of mainstream LLMs on over 30 datasets. The project also discusses open challenges and future directions for improving model reliability. Whether you are a researcher or a developer, TrustLLM provides practical tools for evaluating the trustworthiness of your models.
Getting Started
Installation
You can easily start using TrustLLM by installing it via GitHub, pip, or conda.
- GitHub installation (recommended):
git clone git@github.com:HowieHwong/TrustLLM.git
- Pip installation:
pip install trustllm
- Conda installation:
conda install -c conda-forge trustllm
If you installed from GitHub, create a dedicated environment:
conda create --name trustllm python=3.9
Then install the required packages from the cloned repository:
cd trustllm_pkg
pip install .
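To confirm the installation, you can try importing the modules used in the examples below. This is just a quick sanity check and assumes the package was installed into your active environment:
# Sanity check: these imports should succeed if TrustLLM is installed.
from trustllm.dataset_download import download_dataset
from trustllm.generation.generation import LLMGeneration
from trustllm.task.pipeline import run_truthfulness

print("TrustLLM imports OK")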
Dataset Download
To download the TrustLLM dataset, you can use the following Python command:
from trustllm.dataset_download import download_dataset
download_dataset(save_path='save_path')
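As a quick sketch (assuming download_dataset simply writes the dataset files into the directory given by save_path), you can download everything into a local folder and list what was fetched:
import os
from trustllm.dataset_download import download_dataset

# Download the TrustLLM datasets into a local directory
# (assumes save_path accepts a directory path).
save_dir = "dataset"
download_dataset(save_path=save_dir)

# List the downloaded files to confirm the download succeeded.
for name in sorted(os.listdir(save_dir)):
    print(name)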
Generation
To generate model responses with TrustLLM, follow the detailed steps in the generation guide. Here is a simple example:
from trustllm.generation.generation import LLMGeneration
llm_gen = LLMGeneration(
    model_path="your model name",        # name or local path of the model to evaluate
    test_type="test section",            # benchmark section to run
    data_path="your dataset file path",  # path to the downloaded dataset file
    model_name="",
    online_model=False,                  # set True to query a hosted API model
    use_deepinfra=False,                 # use the DeepInfra API for online models
    use_replicate=False,                 # use the Replicate API for online models
    repetition_penalty=1.0,
    num_gpus=1,
    max_new_tokens=512,
    debug=False,
    device='cuda:0'
)
llm_gen.generation_results()
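If you want to cover several benchmark sections, one option is to loop over them and run a generation pass for each. This is only a sketch: the section names and the dataset directory below are placeholders, not values confirmed by the generation guide.
from trustllm.generation.generation import LLMGeneration

# Hypothetical loop over several benchmark sections; the section names
# and dataset layout are assumptions, not official values.
sections = ["truthfulness", "safety", "fairness"]

for section in sections:
    llm_gen = LLMGeneration(
        model_path="your model name",
        test_type=section,
        data_path="dataset",   # directory created by download_dataset above
        model_name="",
        online_model=False,
        use_deepinfra=False,
        use_replicate=False,
        repetition_penalty=1.0,
        num_gpus=1,
        max_new_tokens=512,
        debug=False,
        device="cuda:0",
    )
    llm_gen.generation_results()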
Evaluation
TrustLLM provides a toolkit to easily assess the trustworthiness of large language models. For detailed instructions, refer to the documentation. An example of evaluating truthfulness is:
from trustllm.task.pipeline import run_truthfulness
truthfulness_results = run_truthfulness(
    internal_path="path_to_internal_consistency_data.json",
    external_path="path_to_external_consistency_data.json",
    hallucination_path="path_to_hallucination_data.json",
    sycophancy_path="path_to_sycophancy_data.json",
    advfact_path="path_to_advfact_data.json"
)
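The pipeline returns the computed metrics, which you may want to keep for later comparison. A minimal sketch, assuming the returned object is a JSON-serializable dictionary of scores:
import json

# Save the truthfulness metrics for later comparison
# (assumes truthfulness_results is a JSON-serializable dict of scores).
with open("truthfulness_results.json", "w") as f:
    json.dump(truthfulness_results, f, indent=2)

print(truthfulness_results)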
Dataset & Task Overview
TrustLLM uses datasets to evaluate various aspects of LLM trustworthiness. Some datasets are derived from previous projects, while others are newly developed. The tasks cover areas such as misinformation, hallucination, sycophancy, stereotype awareness, privacy, and ethical judgment.
Leaderboard
To see how different models perform or to add your model's performance data, visit the TrustLLM Leaderboard.
Contribution
Contributions to TrustLLM are welcome, especially new datasets, research on trustworthiness issues, and toolkit improvements. To contribute, fork the repository, make your changes, and open a pull request.
Future Plans
TrustLLM plans to add more datasets, support for evaluating Chinese-language outputs, and better assessment of downstream applications.
License
The project is open source and licensed under the MIT license.
TrustLLM is a significant step toward making language models more reliable and accountable. With its toolkit and guidance, users can better understand and improve the trustworthiness of their AI systems.