AgentBench
AgentBench is a framework for evaluating large language models (LLMs) as agents across a range of environments. AgentBench v0.2 updates the architecture, adds new tasks, and extends testing to more models. VisualAgentBench is introduced for training visual agents built on large multimodal models across five environments. Together, these tools support the development and evaluation of both language and visual agents in diverse scenarios, with the goal of improving their autonomous capabilities.