paddler
Paddler is a stateful load balancer tailored for llama.cpp that distributes requests according to the slots available on each server. Features include flexible autoscaling, AWS compatibility, and real-time monitoring via a built-in dashboard. Paddler aims to optimize server performance and cost-effectiveness.
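
To make the idea of slot-based distribution concrete, here is a minimal sketch of how a stateful balancer might choose a target. It is not Paddler's actual implementation: the `Upstream` class, its field names, and the `pick_upstream` and `dispatch` functions are hypothetical, and the "most idle slots wins" policy is an assumption made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Upstream:
    """Hypothetical view of one llama.cpp instance as tracked by a balancer."""
    name: str
    slots_idle: int
    slots_processing: int = 0


def pick_upstream(upstreams: List[Upstream]) -> Optional[Upstream]:
    """Return the upstream with the most idle slots, or None if none are free."""
    available = [u for u in upstreams if u.slots_idle > 0]
    if not available:
        return None
    # On a tie, max() keeps the first candidate in list order.
    return max(available, key=lambda u: u.slots_idle)


def dispatch(upstreams: List[Upstream]) -> Optional[str]:
    """Reserve one slot on the chosen upstream and return its name."""
    target = pick_upstream(upstreams)
    if target is None:
        return None  # every slot is busy; a real balancer might queue or reject
    # Keeping these counters between requests is what makes the balancer stateful.
    target.slots_idle -= 1
    target.slots_processing += 1
    return target.name


if __name__ == "__main__":
    pool = [Upstream("a", slots_idle=1), Upstream("b", slots_idle=3)]
    print([dispatch(pool) for _ in range(5)])
```

Unlike round-robin, a policy of this kind depends on the balancer knowing how many slots each instance currently has free, which is why the state has to be tracked per upstream.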