AGIEval

Comprehensive Evaluation of Foundation Models via Human-Centric Benchmarks

Product Description

AGIEval is a human-centric benchmark designed to evaluate the problem-solving and cognitive capabilities of foundation models using tasks drawn from standardized, human-oriented exams such as the Chinese Gaokao and the American SAT. The latest update, version 1.1, includes multiple-choice (MCQ) and cloze tasks and reports performance for models such as GPT-3.5-Turbo and GPT-4o. The benchmark enables objective assessment, helping researchers identify model strengths and weaknesses.