Local AI Voice Chat
Local AI Voice Chat is an innovative project that brings real-time AI conversations to your PC, all executed locally. With a customizable AI personality and voice, it lets users hold engaging, interactive dialogues with artificial intelligence.
About the Project
Local AI Voice Chat integrates advanced technology to provide a fast and dynamic voice chatbot experience. By leveraging the powerful Zephyr 7B language model along with real-time speech-to-text and text-to-speech libraries, it creates an interactive environment where users can communicate with AI effortlessly.
Technical Framework
The project utilizes a robust tech stack to operate efficiently (a minimal sketch of how these pieces fit together follows the list):

- llama_cpp: This library provides an interface for llama-based language models, specifically integrating with Zephyr 7B to process AI-driven conversations.
- RealtimeSTT: Equipped with faster_whisper, this library delivers real-time speech-to-text transcription, converting spoken words into text swiftly.
- RealtimeTTS: Using Coqui XTTS, it performs real-time text-to-speech synthesis, enabling natural-sounding AI voice responses.
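The snippet below is a minimal sketch of the resulting listen, generate, speak loop, not the project's actual code (see ai_voicetalk_local.py for that); the GGUF filename, Zephyr prompt template, and generation parameters are assumptions.

```python
# Minimal sketch of the listen -> generate -> speak loop; not the project's
# actual ai_voicetalk_local.py. The GGUF filename, Zephyr prompt template,
# and generation parameters below are assumptions.
from llama_cpp import Llama
from RealtimeSTT import AudioToTextRecorder
from RealtimeTTS import TextToAudioStream, CoquiEngine

llm = Llama(model_path="zephyr-7b-beta.Q5_K_M.gguf", n_gpu_layers=-1)
recorder = AudioToTextRecorder()           # microphone -> text
stream = TextToAudioStream(CoquiEngine())  # text -> speakers

while True:
    user_text = recorder.text()  # blocks until the utterance ends
    result = llm(
        f"<|user|>\n{user_text}</s>\n<|assistant|>\n",
        max_tokens=256,
        stop=["</s>"],
    )
    stream.feed(result["choices"][0]["text"])
    stream.play()  # synthesize and play the reply
```

Each recorder.text() call blocks until the speaker falls silent, so the loop naturally alternates between listening and responding.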
Development Status
Currently, Local AI Voice Chat is in an experimental alpha stage, which means it may not yet offer the stability needed for production use. While impressive for its developmental phase, users might encounter occasional glitches, particularly with the XTTS model. However, it serves as an initial step towards creating a local real-time voice chatbot.
Updates and Bug Fixes
Recent updates include:
- Transitioned to the Coqui XTTS 2.0 model.
- Resolved issues with RealtimeTTS, particularly with the download process of the Coqui model.
System Requirements
To operate Local AI Voice Chat in real-time, users will need a GPU with approximately 8 GB of VRAM.
For NVIDIA Users
- NVIDIA CUDA Toolkit 11.8 is required. It can be downloaded from the NVIDIA CUDA Toolkit Archive.
- NVIDIA cuDNN 8.7.0 for CUDA 11.x is also necessary and is available from the NVIDIA cuDNN Archive (a quick verification snippet follows this list).
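Once both are installed, one way to confirm they are visible from Python is via PyTorch, which the Coqui TTS stack depends on; this check is an addition here, not part of the project:

```python
# Sanity check: confirm CUDA 11.8 and cuDNN 8.7 are visible from Python.
# Assumes PyTorch is installed (the Coqui TTS stack depends on it).
import torch

print(torch.cuda.is_available())       # True if a CUDA GPU is usable
print(torch.version.cuda)              # expect "11.8"
print(torch.backends.cudnn.version())  # expect 8700 for cuDNN 8.7.0
```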
For AMD Users
- ROCm v5.7.1: This can be downloaded from the AMD ROCm Hub. Follow their instructions for installation.
- FFmpeg: This software is necessary for various operations regardless of GPU vendor and can be installed through your operating system's package manager (a quick availability check follows this list).
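After installing FFmpeg, the following snippet verifies that the binary is reachable; it is a convenience check added here, not project code:

```python
# Verify that the ffmpeg binary is on the PATH.
import shutil

ffmpeg_path = shutil.which("ffmpeg")
print(ffmpeg_path or "ffmpeg not found - install it via your package manager")
```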
Installation Process
- Start by cloning the repository or downloading the source code.
- Install llama.cpp. For AMD users, ensure an environment variable called LLAMA_HIPBLAS is set to on before proceeding with installation.
- Install the real-time speech and text libraries using pip: RealtimeSTT and RealtimeTTS.
- Download the Zephyr model from Hugging Face and configure the model path in creation_params.json (see the sketch after this list).
- If dependency conflicts appear, specific library versions may be required to resolve them.
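As a rough illustration of the configuration step, this sketch reads creation_params.json and hands its contents to llama_cpp. The assumption that the file holds keyword arguments such as model_path is mine; check the file shipped with the repository for the actual schema.

```python
# Sketch: read creation_params.json and construct the llama_cpp model.
# Assumes the file holds Llama keyword arguments such as "model_path";
# check the file shipped with the repository for the actual schema.
import json
from llama_cpp import Llama

with open("creation_params.json", "r", encoding="utf-8") as f:
    creation_params = json.load(f)

llm = Llama(**creation_params)
```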
Running the Application
Simply run python ai_voicetalk_local.py to start the application.
Customization Options
- AI Personality: Modify chat_params.json to adjust the conversation dynamics and alter the AI's personality.
- AI Voice: Change the AI voice by switching the reference WAV file in ai_voicetalk_local.py.
- Speech End Detection: If the transcription ends too early, adjust the post_speech_silence_duration parameter (see the sketch after this list).
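For the speech-end tuning in particular, RealtimeSTT exposes post_speech_silence_duration as a recorder parameter; the value below is illustrative, not a recommendation.

```python
# Sketch: raise post_speech_silence_duration so the recorder waits longer
# for silence before treating an utterance as finished (value illustrative).
from RealtimeSTT import AudioToTextRecorder

recorder = AudioToTextRecorder(post_speech_silence_duration=1.0)
print(recorder.text())  # blocks until the speaker finishes
```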
Contributing
Contributions and suggestions to improve this project are welcome. Interested developers can open a pull request with their proposed changes.
License
Local AI Voice Chat is licensed under the Coqui Public Model License 1.0.0, permitting only non-commercial usage of the model and its outputs.
Contact
For further inquiries or support, please contact Kolja Beigel at [email protected].
This project offers an exciting glimpse into the future of local AI-driven communication, emphasizing user privacy and customizable interaction in real-time.