llmperf-leaderboard
This leaderboard evaluates the performance, reliability, and efficiency of LLM inference providers. Key metrics such as output token throughput and time to first token (TTFT) are analyzed to help developers and users make informed decisions about model integrations, and the transparent results and reproducible configurations support the optimization of streaming applications such as chatbots. Results may vary with system load and provider traffic; the published data is current as of December 19, 2023.
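
To make the two headline metrics concrete, here is a minimal sketch of how TTFT and output token throughput can be measured for a single streamed request. It assumes the provider exposes an OpenAI-compatible streaming endpoint; the `BASE_URL` and `MODEL` values are placeholders, not configurations from this leaderboard, and the exact metric definitions used by the benchmark harness may differ.

```python
import time
from openai import OpenAI  # pip install openai

# Placeholder endpoint and model name -- substitute the provider under test.
BASE_URL = "https://api.example-provider.com/v1"
MODEL = "meta-llama/Llama-2-70b-chat-hf"

client = OpenAI(base_url=BASE_URL, api_key="YOUR_API_KEY")

def measure_request(prompt: str) -> tuple[float, float]:
    """Return (time_to_first_token_s, output_tokens_per_s) for one streamed request."""
    start = time.perf_counter()
    first_token_time = None
    num_chunks = 0

    stream = client.chat.completions.create(
        model=MODEL,
        messages=[{"role": "user", "content": prompt}],
        stream=True,
    )
    for chunk in stream:
        if chunk.choices and chunk.choices[0].delta.content:
            if first_token_time is None:
                # TTFT: delay from request start to the first content chunk.
                first_token_time = time.perf_counter()
            num_chunks += 1
    end = time.perf_counter()

    ttft = first_token_time - start
    # Approximation: treats each streamed chunk as one token; a real harness
    # would re-tokenize the full output for an exact count, and may divide by
    # end-to-end time rather than generation time.
    throughput = num_chunks / (end - first_token_time)
    return ttft, throughput

ttft, tps = measure_request("Write a haiku about benchmarking.")
print(f"TTFT: {ttft:.3f}s, output throughput: {tps:.1f} tokens/s")
```

A full benchmark would repeat this over many concurrent requests and report aggregate statistics (e.g. median and p95), since single-request numbers are noisy under varying provider load.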