instruct-eval
InstructEval is a platform designed to evaluate instruction-tuned LLMs including Alpaca and Flan-T5, using benchmarks like MMLU and BBH. It supports many HuggingFace Transformer models, allows qualitative comparisons, and assesses generalization on tough tasks. With user-friendly scripts and detailed leaderboards, InstructEval shows model strengths. Additional datasets like Red-Eval and IMPACT enhance safety and writing assessments, providing researchers with in-depth performance insights.