TrustLLM: Understanding Trustworthiness in Large Language Models
About TrustLLM
TrustLLM is a comprehensive study of trustworthiness in large language models (LLMs). The project establishes principles, a survey, and a benchmark that together provide a foundation for evaluating how trustworthy these models are. TrustLLM proposes guiding principles that span eight dimensions of trustworthiness and builds a benchmark covering six of them: truthfulness, safety, fairness, robustness, privacy, and machine ethics.
The TrustLLM framework has been used to assess a range of mainstream LLMs on over 30 datasets. The project also discusses open challenges and future directions for improving model reliability. Whether you are a researcher or a developer, TrustLLM provides practical tools for evaluating the trustworthiness of your models.
Getting Started
Installation
You can easily start using TrustLLM by installing it via GitHub, pip, or conda.
- GitHub installation (recommended):
git clone git@github.com:HowieHwong/TrustLLM.git
- Pip installation:
pip install trustllm
- Conda installation:
conda install -c conda-forge trustllm
If you installed from GitHub, create a dedicated environment:
conda create --name trustllm python=3.9
Then install the required packages from the cloned repository:
cd trustllm_pkg
pip install .
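To confirm the installation, you can try importing the modules used in the examples below. This is just a quick sanity check and assumes the package was installed into your active environment:
# Sanity check: these imports should succeed if TrustLLM is installed.
from trustllm.dataset_download import download_dataset
from trustllm.generation.generation import LLMGeneration
from trustllm.task.pipeline import run_truthfulness

print("TrustLLM imports OK")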
Dataset Download
To download the TrustLLM dataset, you can use the following Python command:
from trustllm.dataset_download import download_dataset
download_dataset(save_path='save_path')
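As a quick sketch (assuming download_dataset simply writes the dataset files into the directory given by save_path), you can download everything into a local folder and list what was fetched:
import os
from trustllm.dataset_download import download_dataset

# Download the TrustLLM datasets into a local directory
# (assumes save_path accepts a directory path).
save_dir = "dataset"
download_dataset(save_path=save_dir)

# List the downloaded files to confirm the download succeeded.
for name in sorted(os.listdir(save_dir)):
    print(name)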
Generation
To generate model responses with TrustLLM, follow the detailed steps in the generation guide. Here is a simple example:
from trustllm.generation.generation import LLMGeneration
llm_gen = LLMGeneration(
    model_path="your model name",        # name or local path of the model to evaluate
    test_type="test section",            # benchmark section to run
    data_path="your dataset file path",  # path to the downloaded dataset file
    model_name="",
    online_model=False,                  # set True to query a hosted API model
    use_deepinfra=False,                 # use the DeepInfra API for online models
    use_replicate=False,                 # use the Replicate API for online models
    repetition_penalty=1.0,
    num_gpus=1,
    max_new_tokens=512,
    debug=False,
    device='cuda:0'
)
llm_gen.generation_results()
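If you want to cover several benchmark sections, one option is to loop over them and run a generation pass for each. This is only a sketch: the section names and the dataset directory below are placeholders, not values confirmed by the generation guide.
from trustllm.generation.generation import LLMGeneration

# Hypothetical loop over several benchmark sections; the section names
# and dataset layout are assumptions, not official values.
sections = ["truthfulness", "safety", "fairness"]

for section in sections:
    llm_gen = LLMGeneration(
        model_path="your model name",
        test_type=section,
        data_path="dataset",   # directory created by download_dataset above
        model_name="",
        online_model=False,
        use_deepinfra=False,
        use_replicate=False,
        repetition_penalty=1.0,
        num_gpus=1,
        max_new_tokens=512,
        debug=False,
        device="cuda:0",
    )
    llm_gen.generation_results()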
Evaluation
TrustLLM provides a toolkit to easily assess the trustworthiness of large language models. For detailed instructions, refer to the documentation. An example of evaluating truthfulness is:
from trustllm.task.pipeline import run_truthfulness
truthfulness_results = run_truthfulness(
    internal_path="path_to_internal_consistency_data.json",
    external_path="path_to_external_consistency_data.json",
    hallucination_path="path_to_hallucination_data.json",
    sycophancy_path="path_to_sycophancy_data.json",
    advfact_path="path_to_advfact_data.json"
)
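The pipeline returns the computed metrics, which you may want to keep for later comparison. A minimal sketch, assuming the returned object is a JSON-serializable dictionary of scores:
import json

# Save the truthfulness metrics for later comparison
# (assumes truthfulness_results is a JSON-serializable dict of scores).
with open("truthfulness_results.json", "w") as f:
    json.dump(truthfulness_results, f, indent=2)

print(truthfulness_results)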
Dataset & Task Overview
TrustLLM uses datasets to evaluate various aspects of LLM trustworthiness. Some datasets are derived from previous projects, while others are newly developed. The tasks cover areas such as misinformation, hallucination, sycophancy, stereotype awareness, privacy, and ethical judgment.
Leaderboard
To see how different models perform or to add your model's performance data, visit the TrustLLM Leaderboard.
Contribution
Contributions to TrustLLM are welcome, especially new datasets, research on trustworthiness issues, and toolkit improvements. To contribute, fork the repository, make your changes, and open a pull request.
Future Plans
TrustLLM plans to add more datasets, support for evaluating Chinese-language outputs, and better assessment of downstream applications.
License
The project is open source and licensed under the MIT license.
TrustLLM is a significant step toward making language models more reliable and accountable. With its toolkit and guidance, users can better understand and improve the trustworthiness of their AI systems.