Local AI Voice Chat
Local AI Voice Chat is an innovative project that brings real-time AI conversations to your PC, all executed locally. With a customizable AI personality and voice, it lets users hold engaging, interactive dialogues with artificial intelligence.
About the Project
Local AI Voice Chat integrates advanced technology to provide a fast and dynamic voice chatbot experience. By leveraging the powerful Zephyr 7B language model along with real-time speech-to-text and text-to-speech libraries, it creates an interactive environment where users can communicate with AI effortlessly.
Technical Framework
The project utilizes a robust tech stack to operate efficiently (a minimal sketch of how these pieces fit together follows the list):

- llama_cpp: This library provides an interface for llama-based language models, specifically integrating with Zephyr 7B to process AI-driven conversations.
- RealtimeSTT: Equipped with faster_whisper, this library delivers real-time speech-to-text transcription, converting spoken words into text swiftly.
- RealtimeTTS: Using Coqui XTTS, it performs real-time text-to-speech synthesis, enabling natural-sounding AI voice responses.
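The snippet below is a minimal sketch of the resulting listen, generate, speak loop, not the project's actual code (see ai_voicetalk_local.py for that); the GGUF filename, Zephyr prompt template, and generation parameters are assumptions.

```python
# Minimal sketch of the listen -> generate -> speak loop; not the project's
# actual ai_voicetalk_local.py. The GGUF filename, Zephyr prompt template,
# and generation parameters below are assumptions.
from llama_cpp import Llama
from RealtimeSTT import AudioToTextRecorder
from RealtimeTTS import TextToAudioStream, CoquiEngine

llm = Llama(model_path="zephyr-7b-beta.Q5_K_M.gguf", n_gpu_layers=-1)
recorder = AudioToTextRecorder()           # microphone -> text
stream = TextToAudioStream(CoquiEngine())  # text -> speakers

while True:
    user_text = recorder.text()  # blocks until the utterance ends
    result = llm(
        f"<|user|>\n{user_text}</s>\n<|assistant|>\n",
        max_tokens=256,
        stop=["</s>"],
    )
    stream.feed(result["choices"][0]["text"])
    stream.play()  # synthesize and play the reply
```

Each recorder.text() call blocks until the speaker falls silent, so the loop naturally alternates between listening and responding.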
Development Status
Currently, Local AI Voice Chat is in an experimental alpha stage, which means it may not yet offer the stability needed for production use. While impressive for its developmental phase, users might encounter occasional glitches, particularly with the XTTS model. However, it serves as an initial step towards creating a local real-time voice chatbot.
Updates and Bug Fixes
Recent updates include:
- Transitioned to the Coqui XTTS 2.0 model.
- Resolved issues with RealtimeTTS, particularly with the download process of the Coqui model.
System Requirements
To operate Local AI Voice Chat in real-time, users will need a GPU with approximately 8 GB of VRAM.
For NVIDIA Users
- NVIDIA CUDA Toolkit 11.8 is required. It can be downloaded from the NVIDIA CUDA Toolkit Archive.
- NVIDIA cuDNN 8.7.0 for CUDA 11.x is also necessary and is available from the NVIDIA cuDNN Archive (a quick verification snippet follows this list).
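Once both are installed, one way to confirm they are visible from Python is via PyTorch, which the Coqui TTS stack depends on; this check is an addition here, not part of the project:

```python
# Sanity check: confirm CUDA 11.8 and cuDNN 8.7 are visible from Python.
# Assumes PyTorch is installed (the Coqui TTS stack depends on it).
import torch

print(torch.cuda.is_available())       # True if a CUDA GPU is usable
print(torch.version.cuda)              # expect "11.8"
print(torch.backends.cudnn.version())  # expect 8700 for cuDNN 8.7.0
```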
For AMD Users
- ROCm v5.7.1: This can be downloaded from the AMD ROCm Hub. Follow their instructions for installation.
- FFmpeg: This software is necessary for various operations regardless of GPU vendor and can be installed through your operating system's package manager (a quick availability check follows this list).
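After installing FFmpeg, the following snippet verifies that the binary is reachable; it is a convenience check added here, not project code:

```python
# Verify that the ffmpeg binary is on the PATH.
import shutil

ffmpeg_path = shutil.which("ffmpeg")
print(ffmpeg_path or "ffmpeg not found - install it via your package manager")
```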
Installation Process
- Start by cloning the repository or downloading the source code.
- Install llama.cpp. For AMD users, ensure an environment variable called LLAMA_HIPBLAS is set to on before proceeding with installation.
- Install the real-time speech and text libraries using pip: RealtimeSTT and RealtimeTTS.
- Download the Zephyr model from Hugging Face and configure the model path in creation_params.json (see the sketch after this list).
- If dependency conflicts appear, specific library versions may be required to resolve them.
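As a rough illustration of the configuration step, this sketch reads creation_params.json and hands its contents to llama_cpp. The assumption that the file holds keyword arguments such as model_path is mine; check the file shipped with the repository for the actual schema.

```python
# Sketch: read creation_params.json and construct the llama_cpp model.
# Assumes the file holds Llama keyword arguments such as "model_path";
# check the file shipped with the repository for the actual schema.
import json
from llama_cpp import Llama

with open("creation_params.json", "r", encoding="utf-8") as f:
    creation_params = json.load(f)

llm = Llama(**creation_params)
```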
Running the Application
Simply run python ai_voicetalk_local.py to start the application.
Customization Options
- AI Personality: Modify chat_params.json to adjust the conversation dynamics and alter the AI's personality.
- AI Voice: Change the AI voice by switching the reference WAV file in ai_voicetalk_local.py.
- Speech End Detection: If the transcription ends too early, adjust the post_speech_silence_duration parameter (see the sketch after this list).
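For the speech-end tuning in particular, RealtimeSTT exposes post_speech_silence_duration as a recorder parameter; the value below is illustrative, not a recommendation.

```python
# Sketch: raise post_speech_silence_duration so the recorder waits longer
# for silence before treating an utterance as finished (value illustrative).
from RealtimeSTT import AudioToTextRecorder

recorder = AudioToTextRecorder(post_speech_silence_duration=1.0)
print(recorder.text())  # blocks until the speaker finishes
```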
Contributing
Contributions and suggestions to improve this project are welcome. Interested developers can open a pull request with their proposed changes.
License
Local AI Voice Chat is licensed under the Coqui Public Model License 1.0.0, permitting only non-commercial usage of the model and its outputs.
Contact
For further inquiries or support, please contact Kolja Beigel at [email protected].
This project offers an exciting glimpse into the future of local AI-driven communication, emphasizing user privacy and customizable interaction in real-time.