simple-evals
This repository provides a lightweight library for transparent evaluation of language models, with an emphasis on zero-shot, chain-of-thought prompting. It includes benchmark results for models such as GPT-4 on evals like MMLU and HumanEval. The evals use simple, realistic instructions rather than heavily engineered prompts, so scores more closely reflect real-world performance. The repository is not actively maintained beyond occasional updates such as bug fixes or the addition of new models and evals. Sampler interfaces are provided for both the OpenAI and Anthropic APIs, so the same eval can be run against models from either provider.
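As a rough sketch of how a sampler and an eval fit together, the snippet below wires an OpenAI chat sampler to an MMLU eval. Class names, import paths, and parameters follow the layout of the upstream openai/simple-evals code but should be treated as assumptions here, not documented API; the repository's own entry point is its command-line script.

```python
# Illustrative sketch only: class names and paths are assumed from the
# upstream openai/simple-evals layout and may differ in practice.
# Requires OPENAI_API_KEY to be set in the environment.
from sampler.chat_completion_sampler import ChatCompletionSampler  # assumed import path
from mmlu_eval import MMLUEval  # assumed import path

# A sampler wraps one model behind a common completion interface.
sampler = ChatCompletionSampler(model="gpt-4-turbo")

# An eval takes a sampler and returns aggregate results.
mmlu = MMLUEval(num_examples=50)  # small subset for a quick, cheaper run
result = mmlu(sampler)

# The result object is assumed to expose an overall score plus per-metric values.
print(result.score, result.metrics)
```

Swapping in an Anthropic-backed sampler in place of the OpenAI one (with ANTHROPIC_API_KEY set) would leave the rest of the flow unchanged, which is the point of the shared sampler interface.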