garak

Introduction to `garak`: LLM Vulnerability Scanner

garak is an innovative tool designed to scrutinize large language models (LLMs) for vulnerabilities. Named similar to the well-known network mapping tool nmap, garak serves as a comprehensive scanning tool, but specifically for language models. It is a part of the "Generative AI Red-teaming & Assessment Kit" and focuses on various aspects where LLMs may fail or produce undesirable outputs.

What Does `garak` Do?

garak explores how well an LLM performs under different challenges by testing for numerous failure modes such as:

Hallucination: Where a model generates false or misleading information not supported by its training data.
Data Leakage: Detects whether a model is unintentionally revealing sensitive or private data.
Prompt Injection: Safeguards against unwanted prompt manipulation.
Misinformation: Evaluates the model's tendency to produce or support misleading facts.
Toxicity Generation: Checks if the model outputs harmful language.
Jailbreaks and Other Vulnerabilities: Identifies attempts to bypass intended model limitations for unauthorized purposes.

`garak`'s Approach

The tool employs static, dynamic, and adaptive probing techniques to examine the weaknesses of an LLM. It provides out-of-the-box tests and supports custom probe configurations to adapt to specific model assessments.

Features and Support

Free to Use: garak is freely available and open to contributions for new features.
Broad Model Support: Compatible with various LLMs through platforms like Hugging Face Hub, OpenAI API, and many more.

Installation and Setup

garak is primarily designed as a command-line tool and can be effortlessly installed via pip:

python -m pip install -U garak

For more recent updates, it can be cloned directly from its GitHub repository for the development version:

python -m pip install -U git+https://github.com/leondz/garak.git@main

Using `garak`

Once installed, garak utilizes a simple command syntax structure, specifying the target model and selecting relevant probes:

garak <options>

garak allows users to tailor the testing process by specifying the model type, model name, and which specific probes to run. Additionally, users can choose to apply all known probes to a model by default.

Example Use Cases

Check if a chat model is susceptible to encoding-based prompt injection:

export OPENAI_API_KEY="your_api_key"
python3 -m garak --model_type openai --model_name gpt-3.5-turbo --probes encoding

Test a Hugging Face model for vulnerabilities against a known attack pattern:

python3 -m garak --model_type huggingface --model_name gpt2 --probes dan.Dan_11_0

Reading Results

garak provides detailed output on each probe's performance against the tested model. It marks responses with a "FAIL" if the model demonstrates any problematic behavior, highlighting areas requiring attention.

Probes Overview

The tool comprises a wide array of probes, each tailored to detect specific weaknesses or exploit attempts on a model. These probes range from the straightforward (e.g., checking for empty responses) to complex adversarial attacks that manipulate the model into undesirable outcomes.

Building Your Own Plugins

Developers can extend garak by writing custom plugins. With a straightforward architecture based on base classes, new probes, generators, or detectors can be added to suit specific testing needs.

Community and Resources

For further guidance and support:

User Guide: Complete documentation for users.
Discord Community: Join to engage with other users and contributors.
Twitter Updates: Follow for the latest updates.

Licensing and Contributions

garak is released under the Apache 2.0 License, and contributions to its development are welcome through pull requests and issue reporting.

In summary, garak is an essential tool for anyone working with large language models who needs a thorough and adaptable testing process for potential weaknesses. From detecting hallucinations to preventing data leaks, it equips developers and researchers to ensure their models behave as intended in diverse scenarios.

Introduction to garak: LLM Vulnerability Scanner

What Does garak Do?

garak's Approach