
CharacterEval

Role-Playing Conversational Agent Evaluation Benchmark with Chinese Multi-Turn Dialogues

Product Description

CharacterEval provides a dataset of 1,785 multi-turn dialogues and 23,020 examples covering 77 characters drawn from Chinese literature. It evaluates role-playing agents with a multidimensional assessment of thirteen metrics and includes a character-based reward model that correlates more strongly with human judgments than other models, including GPT-4. The project ships detailed annotation documents, supports consensus scoring across annotators, and enables replication of the evaluation scores with several open-source models.
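To illustrate what consensus scoring over per-metric annotator ratings can look like, here is a minimal Python sketch. The record layout, field names, and character entry below are hypothetical illustrations, not CharacterEval's actual file format; consult the project's annotation documents for the real schema.

```python
from statistics import mean

# Hypothetical example record: per-metric scores from several annotators.
# CharacterEval's real data files may use a different structure.
example = {
    "dialogue_id": 1,
    "character": "Zhu Bajie",  # illustrative character name
    "scores": {
        "fluency": [4, 5, 4],
        "consistency": [5, 5, 4],
    },
}

def consensus_scores(record):
    """Average each metric's annotator scores into one consensus value."""
    return {metric: mean(vals) for metric, vals in record["scores"].items()}

print(consensus_scores(example))
```

Averaging is only one way to aggregate annotators; majority vote or discarding outlier ratings are common alternatives when annotator agreement is low.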
Project Details