ML-Bench

Evaluation Framework for Large Language Models and Machine Learning Agents

Product Description

This framework evaluates large language models and machine learning agents on repository-level code, comprising two components: ML-LLM-Bench and ML-Agent-Bench. Key functionality includes environment setup scripts, data preparation tools, model fine-tuning recipes, and API-calling guides. It supports the assessment of open-source models, aids in preparing training and test datasets, and provides a Docker environment for streamlined operation.
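The description above amounts to an evaluation loop over repository-level tasks. The Python sketch below illustrates the general shape of such a harness; it is a minimal illustration under assumptions, not ML-Bench's actual API. The task schema, the `generate_code` callback, and the string-match check (standing in for execution-based checking in a Docker sandbox) are all hypothetical placeholders.

from dataclasses import dataclass
from typing import Callable

@dataclass
class RepoTask:
    # Hypothetical task record; ML-Bench's real schema may differ.
    repo: str          # repository the task is drawn from
    instruction: str   # natural-language task description
    reference: str     # reference command or script for checking output

def evaluate(tasks: list[RepoTask],
             generate_code: Callable[[str], str]) -> float:
    """Run a model over repository-level tasks and report the pass rate.

    `generate_code` stands in for any LLM call; execution-based checking
    is reduced here to a string comparison for illustration.
    """
    passed = 0
    for task in tasks:
        prediction = generate_code(task.instruction)
        if prediction.strip() == task.reference.strip():
            passed += 1
    return passed / len(tasks) if tasks else 0.0

if __name__ == "__main__":
    demo = [RepoTask("example/repo", "train the model for 3 epochs",
                     "python train.py --epochs 3")]
    # A trivial stand-in "model" that always emits the same command.
    score = evaluate(demo, lambda _: "python train.py --epochs 3")
    print(f"pass rate: {score:.2f}")

In practice the string comparison would be replaced by running the generated command inside the benchmark's containerized environment and checking its outcome, which is what the Docker setup mentioned above is for.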
Project Details