ChatTTS-ui - Simple Local Interface for Multilingual Text-to-Speech Synthesis

Introduction to ChatTTS-ui Project

ChatTTS-ui is a web interface and API for converting text into speech. It supports English and Chinese, including text that mixes digits and letters.

Origins and Essentials

ChatTTS-ui originated from the ChatTTS project. Since version 0.96, ffmpeg must be installed before deploying from source. Timbre files from earlier versions (csv and pt) are obsolete, so timbre values must be generated again.

Sponsorship and Functionality

This project is sponsored by 302.AI, a marketplace offering diverse AI solutions globally. 302.AI describes itself as feature-rich and easy to navigate, charges on a pay-as-you-go basis with no monthly fees, and separates management from use, which makes it accessible to individuals and businesses alike.

Interface and Features

In the web interface, users enter text, which ChatTTS then synthesizes into speech.

[Screenshot: ChatTTS-ui interface]

The tool handles text that mixes characters, digits, and control symbols.

Deployment Options

Windows Pre-packaged Version

Download the package from the Releases page.
Extract it and run app.exe to get started.
Note: Security software may falsely flag app.exe as a virus. If that happens, deploying from source is recommended.
Systems with NVIDIA GPUs (with over 4GB VRAM) can benefit from CUDA 11.8+ GPU acceleration.

Manual Model Download

Models are initially downloaded from huggingface.co or GitHub to the asset directory. In case of unstable networks, users can manually download and extract models, then place the pt files in the asset directory.

Linux Container Deployment

ChatTTS-ui can be deployed in a Docker container on Linux, in either a CPU or a GPU configuration. Follow these steps:

Clone the project repository: git clone https://github.com/jianchang512/ChatTTS-ui.git chat-tts-ui
Start the container with Docker Compose, using either the GPU or the CPU configuration.
Open the web interface in a browser at IP:9966, where IP is the address of the host.

Updating requires pulling the latest code from the main branch and rebuilding the Docker image.

Source Code Deployment

The project can be deployed from source on Linux, macOS, and Windows:

Linux: Requires Python 3.9-3.11 and ffmpeg, plus the corresponding CUDA or ROCm drivers for GPU acceleration.
macOS: Similar setup to Linux, with additional dependencies such as libsndfile.
Windows: Involves setting up a virtual environment and installing the components needed for GPU acceleration.
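
Before installing from source, it can help to confirm the two prerequisites named above: a Python version in the 3.9-3.11 range and an ffmpeg binary on PATH. The following is a minimal sketch of such a check, not part of the project itself. The function name check_environment and its parameters are hypothetical and exist only to make the check easy to exercise.

```python
import shutil
import sys


def check_environment(version_info=sys.version_info, which=shutil.which):
    """Return a list of problems found; an empty list means both checks passed.

    Hypothetical helper: the version range and the ffmpeg requirement come
    from the deployment notes above, the rest is illustrative.
    """
    problems = []

    # Compare only (major, minor) against the supported 3.9-3.11 range.
    major_minor = tuple(version_info[:2])
    if not ((3, 9) <= major_minor <= (3, 11)):
        problems.append(
            "Python %d.%d is outside the supported 3.9-3.11 range" % major_minor
        )

    # shutil.which returns None when the executable is not on PATH.
    if which("ffmpeg") is None:
        problems.append("ffmpeg was not found on PATH")

    return problems
```

Calling check_environment() with no arguments inspects the running interpreter and the current PATH. It does not check GPU drivers or libsndfile.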

Using ChatTTS-ui API

The API allows programmatic access to text-to-speech conversions with configurable parameters like voice, prompt, temperature, and more.

Request Method: POST
Endpoint: http://127.0.0.1:9966/tts
Parameters: text, voice, prompt, temperature, and further settings that customize the output.

Successful API calls return JSON data containing paths to the generated audio files.
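
As an illustration of the request described above, here is a minimal sketch that uses only the Python standard library. The endpoint and the parameter names text, voice, prompt, and temperature come from this section. Everything else is an assumption: the form-encoded body, the placeholder values for voice and temperature, and the helper names build_tts_request and synthesize. The structure of the returned JSON is not specified here, so the sketch returns it without interpreting it.

```python
import json
import urllib.parse
import urllib.request

TTS_ENDPOINT = "http://127.0.0.1:9966/tts"  # endpoint given in this section


def build_tts_request(text, voice="2222", prompt="", temperature=0.3):
    """Build a POST request for the /tts endpoint.

    Assumptions: the server accepts a form-encoded body, and the default
    values for voice and temperature are placeholders for illustration.
    """
    fields = {
        "text": text,
        "voice": voice,
        "prompt": prompt,
        "temperature": temperature,
    }
    body = urllib.parse.urlencode(fields).encode("utf-8")
    return urllib.request.Request(TTS_ENDPOINT, data=body, method="POST")


def synthesize(text, **options):
    """Send the request and return the decoded JSON response as-is."""
    request = build_tts_request(text, **options)
    with urllib.request.urlopen(request, timeout=120) as response:
        return json.loads(response.read().decode("utf-8"))
```

With a ChatTTS-ui instance running locally on port 9966, calling synthesize("Hello, this is a test.") should return the parsed JSON, which according to the description above contains the paths of the generated audio files.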

Integration with pyVideoTrans

ChatTTS-ui also integrates with pyVideoTrans, starting with pyVideoTrans version 1.82. Users select ChatTTS in the settings menu to convert subtitles into speech.

Conclusion

ChatTTS-ui is an open-source project facilitating text-to-speech synthesis through an easy-to-use web interface and API. Its versatility in deployment and integration potential makes it an attractive solution for both individuals looking to transform text into speech and developers seeking integration with other tools.