ChatTTS-Forge

Introduction to ChatTTS-Forge

Overview

ChatTTS-Forge, now also known as Speech-AI-Forge, is a project built around Text-to-Speech (TTS) generation models. It provides an API server and a Gradio-based web interface for generating and manipulating speech.

How to Experience and Deploy

There are several ways to explore and deploy Speech-AI-Forge:

Online Experience: Try the features directly on Hugging Face Spaces.
One-Click Start: Launch the project in Google Colab with a single click.
Container Deployment: See the Docker Deployment section.
Local Deployment: See the Installation and Running section for environment setup.

Installation and Running

To get started, install the required dependencies, then launch the web UI:

python webui.py

Web UI Features

The web interface provides the following features:

TTS Features:
- Speaker Switch: Choose from multiple built-in voices or upload a custom voice file.
- Style Control: Select from multiple styles to control the tone of the speech.
- Long Text Support: Process long texts by automatically splitting them into segments.
- Refiner: Supports ChatTTS's text refinement, which rewrites the input text before synthesis.
- Splitter and Adjuster: Customize text-splitting settings and adjust speed, pitch, and volume.
- Enhancer: Improve the quality of TTS output with an enhancement model.
- Generation History: Keep the last three outputs for easy comparison.
- Multiple Models: Use various TTS models, such as ChatTTS, CosyVoice, FishSpeech, and GPT-SoVITS.
SSML: Provides fine-grained control over synthesis through XML-like markup, so users can script detailed audio narratives, and can convert subtitles into SSML.
Voice Features:
- Builder: Create new voices from a seed or reference audio.
- Test: Try out voices with uploaded voice files.
- ChatTTS Tools: Generate random voices and blend different voices.
ASR Features:
- Automatic speech recognition with models such as Whisper.
Tools: Post-processing options for clipping, adjusting, and enhancing audio files.
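
Long Text Support and the Splitter both depend on breaking input into segments short enough for a TTS model. The Python sketch below shows one simple way to do that; it is a hypothetical illustration, not ChatTTS-Forge's own splitter, and `split_long_text` is an invented name. It cuts after Chinese sentence-ending punctuation, or after English punctuation that is followed by whitespace, packs sentences into chunks of at most `max_len` characters, and hard-splits any single sentence that is longer than that.

```python
import re
from typing import List

# Hypothetical boundary rule (not Forge's actual splitter): cut after CJK
# terminators, or after English terminators that are followed by whitespace.
_BOUNDARY = re.compile(r"(?<=[。！？；])|(?<=[.!?;])(?=\s)")


def split_long_text(text: str, max_len: int = 100) -> List[str]:
    """Split text into chunks of at most max_len characters,
    preferring sentence boundaries."""
    if max_len <= 0:
        raise ValueError("max_len must be positive")
    # Zero-width split: the pieces concatenate back to the original text.
    pieces = [p for p in _BOUNDARY.split(text) if p]
    chunks: List[str] = []
    current = ""
    for piece in pieces:
        if len(current) + len(piece) <= max_len:
            current += piece  # the sentence still fits in the current chunk
            continue
        if current.strip():
            chunks.append(current.strip())  # flush the full chunk
        current = piece.lstrip()
        # Hard-split a single sentence that alone exceeds max_len.
        while len(current) > max_len:
            head, current = current[:max_len], current[max_len:]
            if head.strip():
                chunks.append(head.strip())
    if current.strip():
        chunks.append(current.strip())
    return chunks


if __name__ == "__main__":
    sample = "第一句话。第二句话！Third sentence. Fourth one? " * 3
    for chunk in split_long_text(sample, max_len=20):
        print(chunk)
```

Requiring whitespace after an English period keeps decimals such as 3.14 intact; a production splitter would also need to handle abbreviations and quotation marks.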
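
SSML input can also be generated programmatically. The Python sketch below builds a small multi-speaker document with the standard library's `xml.etree.ElementTree`. The markup it emits is an assumption modeled on the project's SSML examples: a `speak` root with `version="0.1"`, `voice` elements carrying `spk` and `style` attributes, and `break` elements whose `time` is in milliseconds. Check these names against the project's SSML documentation. `build_ssml` is a hypothetical helper, and the speaker and style values are examples only; substitute ones available in your installation.

```python
import xml.etree.ElementTree as ET
from typing import Iterable, Tuple

# (speaker, style, text). The attribute names used below are ASSUMPTIONS
# modeled on the project's SSML examples, not a verified schema.
Segment = Tuple[str, str, str]


def build_ssml(segments: Iterable[Segment], pause_ms: int = 500) -> str:
    """Build an SSML-like document with one <voice> per segment and a
    <break> of pause_ms milliseconds between consecutive segments."""
    root = ET.Element("speak", {"version": "0.1"})
    for i, (speaker, style, text) in enumerate(segments):
        if i > 0:
            # Pause between segments (assumed 'time' attribute, in ms).
            ET.SubElement(root, "break", {"time": str(pause_ms)})
        voice = ET.SubElement(root, "voice", {"spk": speaker, "style": style})
        voice.text = text  # ElementTree escapes &, <, > automatically
    # encoding="unicode" returns a str instead of bytes.
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(build_ssml([
        ("Bob", "narration-relaxed", "Once upon a time..."),
        ("female2", "angry", "Who left the door open?"),
    ]))
```

Building the tree with ElementTree rather than string formatting guarantees well-formed output, even when the text contains characters such as `&` or `<`.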

API Server - `launch.py`

If you need higher API throughput or do not need the web UI, ChatTTS-Forge can be launched as a pure API service:

python launch.py

Once the server is running, the API documentation is available at http://localhost:7870/docs. To list the script's parameters, run:

python launch.py -h

[Figure: API documentation]
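
Because the server publishes interactive documentation at /docs, its endpoints can also be discovered programmatically. The Python sketch below assumes the server is built on FastAPI, whose default machine-readable schema is served at /openapi.json; `fetch_openapi_schema` and `list_endpoints` are hypothetical helpers that use only the standard library.

```python
import json
import urllib.request
from typing import Dict, List

# ASSUMPTION: a FastAPI-based server exposes its schema at /openapi.json
# (FastAPI's default) next to the /docs page.
BASE_URL = "http://localhost:7870"
HTTP_METHODS = {"get", "post", "put", "patch", "delete", "head", "options"}


def fetch_openapi_schema(base_url: str = BASE_URL, timeout: float = 10.0) -> Dict:
    """Download the OpenAPI schema from a running server."""
    with urllib.request.urlopen(f"{base_url}/openapi.json", timeout=timeout) as resp:
        return json.load(resp)


def list_endpoints(schema: Dict) -> List[str]:
    """Return 'METHOD /path' strings for every operation in the schema."""
    endpoints = []
    for path, operations in sorted(schema.get("paths", {}).items()):
        for method in sorted(operations):
            # Skip path-level keys such as "parameters" or "summary".
            if method in HTTP_METHODS:
                endpoints.append(f"{method.upper()} {path}")
    return endpoints


if __name__ == "__main__":
    # Requires `python launch.py` to be running locally.
    for line in list_endpoints(fetch_openapi_schema()):
        print(line)
```

Keeping the parsing in `list_endpoints` separate from the network call makes it easy to reuse on a saved copy of the schema.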

Integrating with SillyTavern

ChatTTS-Forge can be connected to SillyTavern through its /v1/xtts_v2 series of APIs; configure SillyTavern's TTS settings as shown in the following example:

[Figure: SillyTavern integration example]
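
To see what such an integration sends, the Python sketch below prepares (but does not send) a synthesis request against the xtts_v2-compatible API. The endpoint path (`/v1/xtts_v2/tts_to_audio`) and the JSON fields (`text`, `speaker_wav`, `language`) are assumptions based on the XTTS v2 API server protocol that SillyTavern's XTTSv2 provider uses; verify the exact path, including any trailing slash, at http://localhost:7870/docs. `build_tts_request` is a hypothetical helper.

```python
import json
import urllib.request

# ASSUMPTIONS: base path, endpoint name, and field names mirror the XTTS v2
# API server protocol; confirm them against the server's /docs page.
XTTS_BASE = "http://localhost:7870/v1/xtts_v2"


def build_tts_request(text: str, speaker: str, language: str = "en") -> urllib.request.Request:
    """Prepare a JSON POST request asking the server to synthesize `text`."""
    body = json.dumps(
        {"text": text, "speaker_wav": speaker, "language": language}
    ).encode("utf-8")
    return urllib.request.Request(
        f"{XTTS_BASE}/tts_to_audio",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    req = build_tts_request("Hello there", speaker="female2")
    # Sending requires a running server:
    # with urllib.request.urlopen(req) as resp:
    #     audio_bytes = resp.read()
    print(req.get_method(), req.full_url)
```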

Demos

Audio demos showcase style control and long-text generation.

Docker Deployment

Prebuilt Docker images for Speech-AI-Forge are still in development, so for now the image must be built manually. The required models can be downloaded with the scripts in the /scripts directory.

Roadmap

The project aims to support a broad range of models, including ChatTTS, FishSpeech, CosyVoice, and FireRedTTS. Support for Whisper and SenseVoice for ASR, voice cloning models such as OpenVoice, and enhancers such as ResembleEnhance is being explored.

Model Downloads and FAQs

The Speech-AI-Forge project offers detailed guides for downloading models and answers frequently asked questions on topics such as voice cloning, model training, and inference-speed optimization.

References

Further documentation and discussions can be found on the project's GitHub repository.

Overall, ChatTTS-Forge is a comprehensive toolkit for TTS and related speech applications, combining a broad feature set with flexible deployment options.