QuiLLMan: Voice Chat with Moshi
QuiLLMan is a complete voice chat application built around a speech-to-speech language model. Continuous bidirectional streaming carries audio to and from the model, which keeps the conversation responsive.
How QuiLLMan Works
The backend of QuiLLMan runs Kyutai Lab's Moshi model, which continuously listens to the user, plans a response, and speaks it. Moshi relies on the Mimi streaming encoder/decoder to maintain an unbroken stream of audio in both directions, and on a speech-text foundation model to decide when to respond.
QuiLLMan streams audio over a bidirectional websocket and compresses it with the Opus audio codec for transmission over the network. Together these keep response times low enough to approach the natural rhythm of human conversation.
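The bidirectional pattern described above can be sketched with plain asyncio. This is a minimal simulation, not QuiLLMan's actual client or server: two in-memory queues stand in for the websocket, byte strings stand in for Opus frames, and the names send_audio, receive_audio, echo_server, and chat are all hypothetical. The point it illustrates is that sending and receiving run concurrently, so replies can arrive while the user is still speaking.

```python
import asyncio

END = None  # sentinel marking end of stream (illustrative convention)


async def send_audio(frames, outbound: asyncio.Queue) -> None:
    """Push captured frames toward the server as soon as they are available."""
    for frame in frames:
        await outbound.put(frame)
        await asyncio.sleep(0)  # yield so the other tasks can run between sends
    await outbound.put(END)


async def echo_server(outbound: asyncio.Queue, inbound: asyncio.Queue) -> None:
    """Stand-in for the model: answers each frame as soon as it arrives."""
    while (frame := await outbound.get()) is not END:
        await inbound.put(b"reply:" + frame)
    await inbound.put(END)


async def receive_audio(inbound: asyncio.Queue, played: list) -> None:
    """Consume response frames while sending is still in progress."""
    while (frame := await inbound.get()) is not END:
        played.append(frame)


async def chat(frames) -> list:
    """Run sender, server stand-in, and receiver concurrently."""
    outbound, inbound = asyncio.Queue(), asyncio.Queue()
    played: list = []
    await asyncio.gather(
        send_audio(frames, outbound),
        echo_server(outbound, inbound),
        receive_audio(inbound, played),
    )
    return played


if __name__ == "__main__":
    print(asyncio.run(chat([b"f0", b"f1", b"f2"])))
```

In a real client the two queues would be replaced by the send and receive sides of a single websocket connection, with the same task structure.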
For those interested in experiencing QuiLLMan firsthand, a live demo is available here.
A Launchpad for Innovation
QuiLLMan is meant to work both as a standalone application and as a starting point for your own language-model-based projects. Contributions from developers who want to experiment with voice chat applications are welcome.
Setting Up QuiLLMan Locally
For those looking to explore QuiLLMan's capabilities, setting up a local development environment is straightforward:
Requirements
- Install modal in your current Python environment (pip install modal).
- Sign up for a Modal account and configure it (modal setup).
- Generate and set up a Modal token (modal token new).
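Before serving the app, the prerequisites above can be sanity-checked from Python. The following is a rough sketch using only the standard library; check_modal_setup is a hypothetical helper, and the token check rests on the assumption that Modal reads credentials from the MODAL_TOKEN_ID and MODAL_TOKEN_SECRET environment variables or from a ~/.modal.toml file. It only reports what it finds and does not contact Modal.

```python
import importlib.util
import os
import shutil
from pathlib import Path


def check_modal_setup() -> dict:
    """Report which prerequisites appear to be satisfied on this machine."""
    # Assumption: credentials may be supplied through these environment variables.
    has_env_token = bool(
        os.environ.get("MODAL_TOKEN_ID") and os.environ.get("MODAL_TOKEN_SECRET")
    )
    # Assumption: the Modal CLI stores credentials in ~/.modal.toml.
    has_config_file = (Path.home() / ".modal.toml").is_file()
    return {
        "package_installed": importlib.util.find_spec("modal") is not None,
        "cli_on_path": shutil.which("modal") is not None,
        "token_found": has_env_token or has_config_file,
    }


if __name__ == "__main__":
    for name, ok in check_modal_setup().items():
        print(f"{name}: {'ok' if ok else 'missing'}")
```

A "missing" entry points back to the corresponding command in the list above.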
Inference Module Development
QuiLLMan's Moshi server is a Modal class module that loads the models and maintains streaming state, with a FastAPI HTTP server that exposes a websocket interface over the internet. To start it in development mode, run:
modal serve src.moshi
Monitor the terminal output for a websocket connection URL and note that any project file updates apply automatically. To stop the app, press Ctrl+C.
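If the URL printed in the terminal is an https:// address, a websocket client needs the equivalent wss:// form. The helper below is a small sketch of that conversion using the standard library. The function name to_websocket_url, the example hostname, and the /ws endpoint path are illustrative assumptions; check the server code for the actual route.

```python
from urllib.parse import urlsplit, urlunsplit


def to_websocket_url(http_url: str, path: str = "/ws") -> str:
    """Convert an http(s) URL into the matching ws(s) URL with the given path."""
    parts = urlsplit(http_url)
    # Map http -> ws and https -> wss; leave any other scheme untouched.
    scheme = {"http": "ws", "https": "wss"}.get(parts.scheme, parts.scheme)
    return urlunsplit((scheme, parts.netloc, path, "", ""))


if __name__ == "__main__":
    # Hypothetical URL; use the one printed by modal serve instead.
    print(to_websocket_url("https://example--quillman-moshi-web-dev.modal.run"))
```

A URL that already uses the ws or wss scheme keeps its scheme, and only the path is replaced.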
Testing the Websocket Connection
From another terminal, you can directly test the websocket connection with tests/moshi_client.py. First, install the necessary dependencies:
python -m venv venv
source venv/bin/activate
pip install -r requirements/requirements-dev.txt
Then run the terminal client and start talking:
python tests/moshi_client.py
Ensure your microphone and speakers are functional.
Frontend and HTTP Server Development
The application's HTTP server is defined in src/app.py, which uses FastAPI to serve the static frontend files. To start a development server, run:
modal serve src.app
Since src/app.py imports src/moshi.py, this command also launches the Moshi websocket server. Changes to project files are applied automatically, but you may need to clear your browser cache to see frontend changes.
Deploying on Modal
Once development is complete, you can deploy the app to Modal:
modal deploy src.app
This deployment includes both the frontend server and the Moshi websocket server. Leaving the app deployed costs nothing while it is idle, because Modal apps are serverless and scale to zero when not in use.
QuiLLMan serves both as a demonstration of real-time voice chat and as a base for developers building their own voice chat applications.