ChatTTS_colab - Enhanced Text-to-Speech Solution with Easy Deployment and Long-Form Audio Capability

ChatTTS_colab Project Introduction

ChatTTS_colab makes text-to-speech easy to deploy and use, with no complex installation required. Built on the ChatTTS framework, it adds streaming output, voice customization, long audio generation, and multi-role narration.

Key Features

Streamlined Execution via Colab: Users can experience the ease of running this project directly in their browser by simply clicking the "Open in Colab" button. This eliminates the hassle of intricate environment setups.
Voice Cloning Feature: Generate a variety of candidate voices, listen to them, and keep the ones you prefer for later use.
Long Audio Generation: Produce extended audio from lengthy text without handling the text piece by piece yourself.
Character Processing: Automatically handles numbers and punctuation that would otherwise cause errors during speech synthesis, improving the clarity and correctness of the generated audio.
Role-based Narration: Narrates text involving multiple characters with a distinct voice for each, and can generate the script with a large language model.
Streaming Output Support: Experience real-time playback as audio is generated, negating the need to wait for entire clips to finish processing before listening.
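
As a rough illustration of the Character Processing item above, the sketch below spells out digits and collapses repeated punctuation before text reaches a synthesizer. The function names and the rules themselves are assumptions made for illustration; this README does not show the project's actual normalization logic.

```python
import re

# Hypothetical digit-to-word table, used only for this illustration.
DIGIT_WORDS = ["zero", "one", "two", "three", "four",
               "five", "six", "seven", "eight", "nine"]


def spell_digits(text: str) -> str:
    """Replace every run of digits with the digits spelled out one by one."""
    def replace(match: re.Match) -> str:
        return " ".join(DIGIT_WORDS[int(d)] for d in match.group())
    return re.sub(r"\d+", replace, text)


def collapse_punctuation(text: str) -> str:
    """Reduce runs of the same punctuation mark to a single mark."""
    return re.sub(r"([,.!?，。！？])\1+", r"\1", text)


def normalize(text: str) -> str:
    """Apply both illustrative cleanup steps in order."""
    return collapse_punctuation(spell_digits(text))
```

A real normalizer would also need to read numbers as quantities (for example "42" as "forty-two") and handle language-specific rules, which this sketch leaves out.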

Demonstration of Features

Streaming Output

Audio plays back continuously while it is still being generated, so listening can start before the full clip is ready.
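
One common way to get this behavior is a generator that hands over each audio chunk as soon as it is ready. The sketch below is an assumption about the general pattern, not the project's code: `stream_audio` and `fake_synthesize` are hypothetical names, and the stand-in synthesizer returns placeholder bytes instead of real audio.

```python
from typing import Callable, Iterable, Iterator


def stream_audio(sentences: Iterable[str],
                 synthesize: Callable[[str], bytes]) -> Iterator[bytes]:
    """Yield one audio chunk per sentence as soon as it has been synthesized."""
    for sentence in sentences:
        yield synthesize(sentence)


def fake_synthesize(sentence: str) -> bytes:
    """Stand-in for a real TTS call: one zero byte per input character."""
    return b"\x00" * len(sentence)


if __name__ == "__main__":
    for chunk in stream_audio(["Hello.", "How are you?"], fake_synthesize):
        # A real player would start playing this chunk immediately.
        print(f"received {len(chunk)} bytes")
```

Because the generator is lazy, the second sentence is not synthesized until the consumer asks for the next chunk, which is what lets playback overlap with generation.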

Role-based Narration

Manage text that requires multiple voices with ease, suitable for dialogues or multi-character scripts.
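
To make the idea concrete, here is a minimal sketch of how a multi-character script could be split into per-role segments. The `Role: text` line format, the `narrator` fallback, and the integer voice seeds are all assumptions chosen for illustration; the README does not specify the script format the project expects.

```python
import re

# Assumed script format: "Role: spoken text" with a single-word role name.
LINE_PATTERN = re.compile(r"^(\w+)\s*[:：]\s*(.+)$")


def parse_script(script: str, default_role: str = "narrator") -> list[tuple[str, str]]:
    """Split a script into (role, text) pairs, one per non-empty line."""
    segments = []
    for raw_line in script.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        match = LINE_PATTERN.match(line)
        if match:
            segments.append((match.group(1), match.group(2).strip()))
        else:
            # Lines without a role prefix are treated as narration.
            segments.append((default_role, line))
    return segments


def assign_voices(segments: list[tuple[str, str]], first_seed: int = 1) -> dict[str, int]:
    """Give each role a distinct hypothetical voice seed, in order of first appearance."""
    voices: dict[str, int] = {}
    for role, _ in segments:
        if role not in voices:
            voices[role] = first_seed + len(voices)
    return voices
```

Each segment can then be synthesized with the voice assigned to its role. Role names containing spaces are not recognized by this pattern and fall through to the narrator.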

Voice Cloning

Users can try out a variety of voice samples and save their favorites for later use.

Long Audio Production

Generate long audio outputs for projects that need extended durations.
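
Long-form synthesis is often done by splitting the text into short chunks at sentence boundaries, synthesizing each chunk, and joining the results. The sketch below shows that general approach under stated assumptions: `split_text`, `synthesize_long`, and the 200-character limit are hypothetical and are not taken from the project.

```python
import re
from typing import Callable


def split_text(text: str, max_chars: int = 200) -> list[str]:
    """Group sentences into chunks of at most max_chars characters.

    A single sentence longer than max_chars is kept whole rather than cut.
    """
    sentences = re.findall(r"[^.!?。！？]+[.!?。！？]*", text)
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        sentence = sentence.strip()
        if not sentence:
            continue
        if current and len(current) + 1 + len(sentence) > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = f"{current} {sentence}" if current else sentence
    if current:
        chunks.append(current)
    return chunks


def synthesize_long(text: str, synthesize: Callable[[str], bytes],
                    max_chars: int = 200) -> bytes:
    """Synthesize each chunk with the given callable and join the audio bytes."""
    return b"".join(synthesize(chunk) for chunk in split_text(text, max_chars))
```

The simple sentence pattern also splits at decimal points such as "3.14", so a production splitter would need a more careful boundary rule.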

Getting Started

Running in Colab

Click the "Open In Colab" button to launch the Colab notebook.
Simply run all cells by selecting "Run all" in the interface.
After execution, find a URL in the log that looks like: https://**********.gradio.live
Access your project through the generated public URL.

Running on macOS

If not already installed, get Conda.
Open the terminal and create a new Conda environment:
```
conda create -n "ChatTTS_colab" python=3.11
```
Activate the newly created environment:
```
conda activate ChatTTS_colab
```

Clone the repository locally:
```
git clone git@github.com:6drf21e/ChatTTS_colab.git
```

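Fetch ChatTTS at the pinned commit and move its package directory into the project:
```
cd ChatTTS_colab
# Clone upstream ChatTTS and pin it to a specific commit
git clone -q https://github.com/2noise/ChatTTS
cd ChatTTS
git checkout -q e6412b1
cd ..
# Keep only the inner ChatTTS package directory
mv ChatTTS temp
mv temp/ChatTTS ./ChatTTS
rm -rf temp
```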

Install additional dependencies:
```
pip install -r requirements-macos.txt
```
Run the project and allow the automatic model downloads:
```
python webui_mix.py
```

Common Issues and Solutions

Model Download Failures: If downloading from Hugging Face fails, clear the cache and retry:
```
rm -rf ~/.cache/huggingface/hub/models--2Noise--ChatTTS
```
Slow Downloads: Speed up downloads by pointing Hugging Face at a mirror site:
```
export HF_ENDPOINT=https://hf-mirror.com
```
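In a notebook, where a shell `export` does not carry over to the Python process, the same variable can be set from Python. This is a minimal sketch: `use_hf_mirror` is a hypothetical helper, and it assumes it runs before any Hugging Face library is imported, since `huggingface_hub` reads `HF_ENDPOINT` when it is first loaded.

```python
import os


def use_hf_mirror(endpoint: str = "https://hf-mirror.com") -> str:
    """Set HF_ENDPOINT unless it is already set; return the value in effect."""
    return os.environ.setdefault("HF_ENDPOINT", endpoint)


if __name__ == "__main__":
    # Call this before importing any library that downloads from Hugging Face.
    print(use_hf_mirror())
```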

Contributors

The project is built with the help of its contributors, who are listed on GitHub.

License

ChatTTS_colab is released under the MIT License.