crab - Comprehensive Benchmarking for Cross-Platform Multimodal Language Models

What is CRAB?

CRAB (Cross-environment Agent Benchmark) is a framework for evaluating and benchmarking multimodal embodied language model agents. It provides tools to create environments for these agents and to test them across different settings.

Key Features of CRAB

1. Cross-platform and Multi-environment Support

CRAB lets users build environments that can be deployed in memory, in Docker containers, on virtual machines, or across distributed physical machines. A unified interface lets language model agents access and interact with multiple environments simultaneously.

2. Easy Configuration

New actions for an agent are added by writing an ordinary Python function and marking it with the @action decorator. An environment is then defined by combining several such actions, so setup requires little beyond familiarity with Python.
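
The decorator-based pattern can be illustrated with a small, self-contained sketch. It does not import CRAB: the action decorator, the Action wrapper, and the ToyEnvironment class below are hypothetical stand-ins written for this example, and CRAB's real decorator and environment classes may differ in names, signatures, and behavior.

```python
import inspect


class Action:
    """Wraps a plain function and records its name, docstring, and parameters."""

    def __init__(self, func):
        self.func = func
        self.name = func.__name__
        self.description = inspect.getdoc(func) or ""
        self.parameters = list(inspect.signature(func).parameters)

    def __call__(self, *args, **kwargs):
        return self.func(*args, **kwargs)


def action(func):
    """Hypothetical stand-in for an @action decorator (not CRAB's own)."""
    return Action(func)


@action
def click(x: int, y: int) -> str:
    """Click at screen coordinates (x, y)."""
    return f"clicked ({x}, {y})"


@action
def type_text(text: str) -> str:
    """Type the given text."""
    return f"typed {text!r}"


class ToyEnvironment:
    """An environment defined as a named collection of actions."""

    def __init__(self, name, actions):
        self.name = name
        self.actions = {a.name: a for a in actions}

    def step(self, action_name, **kwargs):
        # Look up the action by name and call it with the given arguments.
        return self.actions[action_name](**kwargs)


desktop = ToyEnvironment("desktop", [click, type_text])
result = desktop.step("click", x=10, y=20)
```

Because each wrapped function keeps its name, docstring, and parameter list, a wrapper like this can describe the available actions to an agent without any configuration beyond the function definition itself.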

3. Innovative Benchmarking Suite

CRAB defines benchmark tasks and evaluators directly in Python. One particularly novel aspect is its graph evaluator method, which provides detailed metrics for understanding how language model agents perform.
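
One way to picture a graph-based evaluator is as a dependency graph of sub-goal checks, scored by how many sub-goals an agent reached. The sketch below is an assumption about the general idea, not CRAB's implementation: the function evaluate_graph, the sub-goal names, and the completion-ratio metric are all invented for illustration. It uses only the standard library.

```python
from graphlib import TopologicalSorter


def evaluate_graph(checks, depends_on, state):
    """Return (completed sub-goals, completion ratio).

    checks:     maps each sub-goal name to a function state -> bool
    depends_on: maps each sub-goal name to the sub-goals it requires
    A sub-goal counts as completed only if all of its prerequisites
    are completed and its own check passes.
    """
    completed = set()
    # static_order() yields prerequisites before the nodes that need them.
    for node in TopologicalSorter(depends_on).static_order():
        prereqs = depends_on.get(node, ())
        if all(p in completed for p in prereqs) and checks[node](state):
            completed.add(node)
    return completed, len(completed) / len(checks)


# Hypothetical task: open a browser, run a search, then save a file.
checks = {
    "open_browser": lambda s: s.get("browser_open", False),
    "search_done": lambda s: "query" in s,
    "file_saved": lambda s: s.get("saved", False),
}
depends_on = {
    "open_browser": [],
    "search_done": ["open_browser"],
    "file_saved": ["search_done"],
}

state = {"browser_open": True, "query": "crab"}
completed, ratio = evaluate_graph(checks, depends_on, state)
```

Compared with a single pass/fail check at the end of a task, a per-node result of this kind shows how far an agent progressed before it stalled.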

How to Get Started

Prerequisites

CRAB requires Python 3.10 or newer. With Python in place, install CRAB by running the following command:

pip install "crab-framework[client]"

Experimenting with CRAB

CRAB comes with datasets and experiment code in the crab-benchmark-v0 directory. The benchmark tutorial in that directory explains how to use the benchmark effectively.

Running CRAB Examples

To see CRAB in action, you can run template environments using an OpenAI agent. Simply set the OPENAI_API_KEY environment variable with your API key, then run:

python examples/single_env.py
python examples/multi_env.py

These examples demonstrate how CRAB manages an agent operating in a single environment and in multiple environments.
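
Before running the examples, it can help to confirm that the two prerequisites mentioned in this document are met. The helper below is a hypothetical convenience script, not part of CRAB; the function name preflight and its messages are assumptions made for illustration.

```python
import os
import sys


def preflight(env, version):
    """Return a list of problems that would prevent the examples from running."""
    problems = []
    # The document states that Python 3.10 or newer is required.
    if tuple(version[:2]) < (3, 10):
        problems.append("Python 3.10 or newer is required")
    # The example scripts use an OpenAI agent, which needs an API key.
    if not env.get("OPENAI_API_KEY"):
        problems.append("OPENAI_API_KEY is not set")
    return problems


if __name__ == "__main__":
    for problem in preflight(os.environ, sys.version_info):
        print("warning:", problem)
```

Passing the environment and version in as arguments, rather than reading them inside the function, keeps the check easy to test without changing real settings.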

Learn More

For those looking to dive deeper, CRAB offers a variety of resources including documentation, a blog, and demos available through its website. Moreover, those interested can view a demonstration video available on YouTube to see CRAB in action.

For the research behind CRAB, see the paper "CRAB: Cross-environment Agent Benchmark for Multimodal Language Model Agents", available on arXiv. Please cite this paper if you use CRAB in your academic or professional work.

Overall, CRAB gives developers of language model agents the tools to create environments and to evaluate and benchmark agents across a variety of deployment setups.