
lm-evaluation-harness

Comprehensive Platform for Evaluating Generative Language Model Performance

Product Description

The framework provides a unified testing ground for generative language models, supporting a broad array of evaluation tasks. Recent additions include the Open LLM Leaderboard task set and support for multimodal inputs and API-based models, improving customization and efficiency. It integrates over 60 benchmarks and supports a range of models, including GPT-NeoX and Megatron-DeepSpeed, with efficient inference via vLLM. The tool is widely used in research and within organizations such as NVIDIA and Cohere.
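As a rough illustration of how an evaluation run looks, the sketch below uses the library's Python entry point (simple_evaluate); the model identifier, task name, and batch size are illustrative placeholders, not recommendations.

# Minimal sketch of an evaluation run via the lm_eval Python API
# (illustrative model and task; adjust to your environment).
import lm_eval

# Evaluate a Hugging Face model on a single benchmark task.
results = lm_eval.simple_evaluate(
    model="hf",                                      # Hugging Face backend; a vLLM backend is also available
    model_args="pretrained=EleutherAI/pythia-160m",  # any HF model identifier
    tasks=["hellaswag"],                             # one of the supported benchmark tasks
    num_fewshot=0,
    batch_size=8,
)

# Per-task metrics (e.g. accuracy) are returned under "results".
print(results["results"]["hellaswag"])

The same run can also be launched from the command line with the lm_eval CLI, passing the model backend, model arguments, and task list as flags.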
Project Details