LongBench
LongBench is an open bilingual benchmark that evaluates large language models' ability to comprehend long contexts in English and Chinese. It comprises 21 tasks spanning six categories: single-document QA, multi-document QA, summarization, few-shot learning, synthetic tasks, and code completion. Evaluation is fully automated and cost-effective. The 21 tasks break down into 14 English tasks, 5 Chinese tasks, and 2 code tasks, for a total of 4,750 test instances whose contexts mostly range from 5k to 15k words in length. A companion test set, LongBench-E, samples instances evenly across context-length buckets, making it easier to analyze how performance varies with input length.
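The dataset is distributed on the Hugging Face Hub, so individual tasks can be pulled directly with the `datasets` library. Below is a minimal loading sketch: the `THUDM/LongBench` repo name and per-task config names (e.g. `hotpotqa`, with an `_e` suffix for the LongBench-E variant) follow the official release, while the exact record fields shown are assumptions based on the published dataset card.

```python
# Minimal sketch of loading LongBench tasks from the Hugging Face Hub.
# Assumes the `datasets` library; config names like "hotpotqa" and the
# "_e" suffix for LongBench-E follow the official dataset card.
from datasets import load_dataset

# Load one English multi-document QA task from the standard test split.
# trust_remote_code may be required on recent `datasets` versions,
# since this dataset uses a loading script.
data = load_dataset("THUDM/LongBench", "hotpotqa", split="test",
                    trust_remote_code=True)

# LongBench-E variants append "_e" and distribute instances evenly
# across context-length buckets (0-4k, 4-8k, 8k+).
data_e = load_dataset("THUDM/LongBench", "hotpotqa_e", split="test",
                      trust_remote_code=True)

# Field names below are assumptions from the dataset card.
sample = data[0]
print(sample["input"])          # the task query/instruction
print(len(sample["context"]))   # size of the long context to condition on
print(sample["answers"])        # reference answer(s) for automated scoring
```

Because scoring is automated, a typical harness only needs to format `context` and `input` into the model prompt, generate a response, and compare it against `answers` with the task's metric.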