llm-colosseum

Real-time Evaluation of LLM Capabilities Using Street Fighter III Tournaments

Product Description

Experience real-time evaluation of large language models (LLMs) through simulated combat in Street Fighter III. This framework assesses models such as GPT-3.5 and Mistral 7B on speed, intelligence, adaptability, resilience, and creative strategy. With more than 342 matches played, it offers insight into how LLMs understand and act within a contextual gaming environment. Explore the Elo ranking, and see how to customize and test your own models in this assessment arena.
Project Details