# Model Evaluation
## opencompass
OpenCompass is an open platform for evaluating large language models. It supports 20+ HuggingFace and API models and ships 70+ datasets totaling roughly 400,000 questions. Distributed evaluation lets it assess billion-parameter models within hours, and it supports multiple paradigms, including zero-shot and few-shot learning. Its modular design makes it straightforward to add new models and datasets, and it offers API-based and accelerated evaluation with different backends. Together with its companion tools CompassKit, CompassHub, and CompassRank, it aims to build a fair, open, and reproducible benchmarking ecosystem.
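As a rough sketch of how an evaluation is wired up: OpenCompass describes runs as Python config files that pull in model and dataset fragments via mmengine's `read_base`. The specific fragment paths below (`siqa_gen`, `winograd_ppl`, `hf_opt_125m`) are assumptions modeled on the demo configs in the repository and may differ between releases.

```python
# Minimal OpenCompass config sketch; typically launched with
# `python run.py <this_config>.py`. Fragment module paths are assumptions
# based on the repo's demo configs and may change between versions.
from mmengine.config import read_base

with read_base():
    # Dataset fragments: each exposes a list of dataset configs.
    from .datasets.siqa.siqa_gen import siqa_datasets
    from .datasets.winograd.winograd_ppl import winograd_datasets
    # Model fragment: a small HuggingFace model for a quick smoke test.
    from .models.opt.hf_opt_125m import opt125m

# OpenCompass reads these two top-level lists to build the evaluation matrix.
datasets = [*siqa_datasets, *winograd_datasets]
models = [opt125m]
```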
## fairness-indicators
Fairness Indicators is a TensorFlow-based library for computing fairness metrics for binary and multiclass classifiers. It supports analyzing data distributions, evaluating model performance sliced across defined groups of users, and drilling into individual slices in detail. Integrating with TensorFlow Data Validation, TensorFlow Model Analysis, and the What-If Tool, it provides a thorough evaluation framework, and its case studies and examples show how to investigate fairness concerns over time for ethical AI integration.
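To illustrate the sliced-evaluation workflow it builds on: the sketch below configures TensorFlow Model Analysis with the `FairnessIndicators` metric and a per-group slicing spec. The model and data paths, the `label` key, and the `gender` slicing feature are hypothetical placeholders, not values prescribed by the library.

```python
# Sketch: compute fairness metrics sliced by a sensitive feature using
# TensorFlow Model Analysis (TFMA). Paths, the label key, and the `gender`
# feature are hypothetical placeholders for your own model and data.
import tensorflow_model_analysis as tfma

eval_config = tfma.EvalConfig(
    model_specs=[tfma.ModelSpec(label_key='label')],
    metrics_specs=[
        tfma.MetricsSpec(metrics=[
            # FairnessIndicators reports rates (e.g. FPR, FNR) at each threshold.
            tfma.MetricConfig(
                class_name='FairnessIndicators',
                config='{"thresholds": [0.25, 0.5, 0.75]}'),
        ]),
    ],
    slicing_specs=[
        tfma.SlicingSpec(),                         # overall metrics
        tfma.SlicingSpec(feature_keys=['gender']),  # per-group slices
    ],
)

eval_result = tfma.run_model_analysis(
    eval_shared_model=tfma.default_eval_shared_model(
        eval_saved_model_path='path/to/saved_model',
        eval_config=eval_config),
    eval_config=eval_config,
    data_location='path/to/eval_data.tfrecord',
    output_path='path/to/output',
)
```

The resulting `eval_result` can then be rendered in a notebook with the Fairness Indicators widget to compare metrics across slices.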
Feedback Email: [email protected]