willow-inference-server
Willow Inference Server (WIS) is a self-hosted, CUDA-optimized inference server for ASR (speech-to-text), TTS (text-to-speech), and LLM tasks. It runs on affordable GPUs such as the GTX 1060 and Tesla P4, and can keep multiple models loaded simultaneously with low VRAM usage. It provides real-time speech recognition, custom TTS voices, and LLaMA-based language functions, delivering strong performance even on modest hardware. WIS supports REST, WebRTC, and WebSocket transports for broad integration across applications.
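As a rough sketch of what REST integration might look like, the snippet below builds a POST request that sends raw audio bytes to a speech-recognition endpoint. The port, the `/api/asr` path, and the `model` query parameter are illustrative assumptions, not the documented WIS API; consult the WIS documentation for the actual endpoints and parameters.

```python
import urllib.request

def build_asr_request(base_url: str, audio: bytes,
                      model: str = "whisper") -> urllib.request.Request:
    """Build a POST request carrying raw audio to an ASR endpoint.

    NOTE: the '/api/asr' path and 'model' parameter are hypothetical,
    used here only to illustrate the request shape.
    """
    url = f"{base_url}/api/asr?model={model}"
    return urllib.request.Request(
        url,
        data=audio,  # raw audio payload in the request body
        headers={"Content-Type": "application/octet-stream"},
        method="POST",
    )

# Example: point at a locally hosted server (port is an assumption)
req = build_asr_request("http://localhost:19000", b"\x00" * 16)
```

Sending the request (e.g. with `urllib.request.urlopen(req)`) would return the transcription in whatever response format the server defines.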