AgentBench

Evaluate Visual and Language Agents Across Diverse Contexts

Product Description

AgentBench is a framework for evaluating large language models (LLMs) as agents across different environments. Version v0.2 brings architecture updates, new tasks, and tests on a broader set of models. VisualAgentBench is a companion benchmark for training visual agents built on large multimodal models, covering five environments. Together, the two tools support the development and evaluation of visual and language agents in diverse scenarios.
Project Details