CharacterEval
CharacterEval provides a Chinese role-playing benchmark of 1,785 multi-turn dialogues comprising 23,020 examples, covering 77 characters drawn from Chinese novels and scripts. Evaluation relies on thirteen metrics organized into four dimensions, and the benchmark includes a character-based reward model (CharacterRM) that correlates with human judgments more strongly than general-purpose judges such as GPT-4. The project also releases detailed annotation documents, supports consensus scoring across annotators, and supports reproducing evaluation scores with multiple open-source models.
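To make the consensus-scoring idea concrete, below is a minimal Python sketch of how per-metric scores from several annotators might be aggregated into a single consensus value per metric. The file layout and field names (a JSON list of records with an "annotations" mapping from metric name to a list of annotator scores) are assumptions for illustration only, not CharacterEval's actual schema.

```python
# Sketch: aggregate annotator scores into per-metric consensus values.
# Schema (record list, "annotations" field) is hypothetical, not CharacterEval's own format.
import json
from collections import defaultdict
from statistics import mean


def consensus_scores(path: str) -> dict[str, float]:
    """Average annotator scores per example, then average across examples for each metric."""
    with open(path, encoding="utf-8") as f:
        records = json.load(f)  # hypothetical: a list of annotated examples

    per_metric = defaultdict(list)
    for rec in records:
        for metric, scores in rec["annotations"].items():  # hypothetical field name
            per_metric[metric].append(mean(scores))        # consensus for one example

    return {metric: mean(values) for metric, values in per_metric.items()}


if __name__ == "__main__":
    # Hypothetical file name; replace with the actual annotation export.
    print(consensus_scores("charactereval_annotations.json"))
```

Averaging annotator scores per example before averaging across the dataset keeps each dialogue weighted equally regardless of how many annotators rated it; other aggregation rules (e.g., majority voting on discrete scores) could be swapped in at the same point.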