willow-inference-server

Efficient and Versatile Inference Server for Language and Speech Tasks

Product Description

Willow Inference Server (WIS) enables efficient, self-hosted language processing for ASR, TTS, and LLM tasks with CUDA optimization, supporting affordable GPUs such as the GTX 1060 and Tesla P4. It loads multiple models simultaneously with low VRAM demand, and it delivers real-time speech recognition, custom TTS voices, and LLaMA-based functions with high performance even on lower-end GPUs. WIS exposes REST, WebRTC, and WebSocket interfaces for broad integration across applications.
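As a minimal sketch of how a client might talk to a self-hosted WIS instance over its REST interface: the host, port, endpoint paths, and parameter names below are illustrative assumptions, not the confirmed WIS API; consult the project documentation for the exact routes.

```python
# Hedged sketch of calling a self-hosted WIS instance over REST.
# The URL, endpoint paths, and parameter names are assumptions for
# illustration only; check the WIS docs for the actual API surface.
import requests

WIS_URL = "https://localhost:19000"  # hypothetical local WIS deployment

def synthesize(text: str, out_path: str = "speech.wav") -> None:
    """Request TTS audio for `text` and save the returned bytes to disk."""
    resp = requests.get(
        f"{WIS_URL}/api/tts",
        params={"text": text},
        verify=False,  # self-hosted instances often use self-signed certs
        timeout=30,
    )
    resp.raise_for_status()
    with open(out_path, "wb") as f:
        f.write(resp.content)

def transcribe(wav_path: str) -> str:
    """POST an audio file for ASR and return the transcript text."""
    with open(wav_path, "rb") as f:
        resp = requests.post(
            f"{WIS_URL}/api/asr",
            files={"audio": f},
            verify=False,
            timeout=60,
        )
    resp.raise_for_status()
    return resp.text

if __name__ == "__main__":
    synthesize("Hello from Willow Inference Server.")
    print(transcribe("speech.wav"))
```

The same round trip could also run over WebRTC or WebSockets for lower-latency streaming; REST is shown here only because it is the simplest transport to demonstrate.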
Project Details