AGIEval
AGIEval is a benchmark crafted to evaluate the problem-solving and cognitive capabilities of foundation models using tasks from exams like the Chinese Gaokao and American SAT. With the latest update to version 1.1, AGIEval offers MCQ and cloze tasks and provides performance evaluations across models such as GPT-3.5-Turbo and GPT-4o. This benchmark enables objective assessments and ensures researchers can identify model strengths and weaknesses.