clone-voice - Multi-Language Voice Cloning with Simple Interface

Clone-Voice Project Overview

Clone-Voice is a user-friendly tool that clones human voices and converts text into speech in any selected voice. It can also convert an existing audio recording so that it is spoken in the selected voice. The project is built on the xtts_v2 model developed by coqui.ai and follows the Coqui Public Model License 1.0.0. Anyone using this project must comply with the license terms available at https://coqui.ai/cpml.txt.

Key Features of Clone-Voice

Versatile Language Support: Clone-Voice can generate speech in 16 languages, including Chinese, English, Japanese, Korean, French, German, and Italian.
User-Friendly Interface: The tool is straightforward to use, even without a powerful GPU. Users can download the pre-compiled version, execute the app.exe file, and access the web interface with simple clicks.
Audio Recording: Users can record audio directly from a microphone. For better synthesis quality, recordings should last between 5 and 20 seconds and be free of background noise.
Excellent Performance: While the English synthesis is notably high-quality, the tool also performs reasonably well in other languages, including Chinese.
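
The recording recommendation above (5 to 20 seconds) can be checked before a sample is uploaded. The following is a minimal sketch using only Python's standard wave module; it is not part of Clone-Voice, the function and constant names are hypothetical, and it handles uncompressed WAV input only.

```python
import wave

# Recommended sample length from the text above, in seconds.
MIN_SECONDS = 5.0
MAX_SECONDS = 20.0


def wav_duration_seconds(source):
    """Return the duration of an uncompressed WAV in seconds.

    `source` may be a file path or a binary file object.
    """
    with wave.open(source, "rb") as wav_file:
        frames = wav_file.getnframes()
        rate = wav_file.getframerate()
    return frames / float(rate)


def is_recommended_length(seconds):
    """True if the duration falls inside the recommended 5-20 second range."""
    return MIN_SECONDS <= seconds <= MAX_SECONDS
```

Calling is_recommended_length(wav_duration_seconds("sample.wav")) on a hypothetical sample.wav tells you whether the recording is worth uploading. Background noise cannot be judged this way and still needs to be checked by ear.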

How to Use Clone-Voice

For Windows users, the pre-compiled version of the software can be installed easily by following these steps:

Download and Extract: Access the Releases page and download the main file (1.7GB) and the model (3GB). Extract the files to a suitable directory, such as E:/clone-voice.
Launch the Application: Run app.exe, which automatically opens a web window. Read the console messages carefully and check them for errors.
Model Integration: Extract the downloaded model into the 'tts' folder in the software directory.
Conversion Processes:
- To convert text to speech, input text into the textbox or import an SRT file, then click "Start Now."
- For voice-to-voice conversion, add the audio file to convert (mp3/wav/flac), then select the voice to clone or upload a recording of that voice lasting 5 to 20 seconds. You can also record live audio in the browser.
CUDA Acceleration: If your machine is equipped with an Nvidia GPU and has the necessary CUDA environment, the application will automatically utilize CUDA for acceleration.
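
The text-to-speech step above accepts an SRT file as input. To illustrate what such a file contains, here is a minimal sketch of an SRT reader in Python. It is an assumption-laden illustration of the subtitle format, not the parser Clone-Voice uses, and parse_srt is a hypothetical name.

```python
import re

# Matches an SRT timing line such as "00:00:01,000 --> 00:00:03,500".
TIMESTAMP = re.compile(
    r"(\d{2}:\d{2}:\d{2}[,.]\d{3})\s*-->\s*(\d{2}:\d{2}:\d{2}[,.]\d{3})"
)


def parse_srt(text):
    """Return a list of (start, end, spoken_text) tuples from SRT content."""
    cues = []
    normalized = text.replace("\r\n", "\n").strip()
    # Cues are separated from each other by blank lines.
    for block in re.split(r"\n\s*\n", normalized):
        lines = [line.strip() for line in block.splitlines() if line.strip()]
        for index, line in enumerate(lines):
            match = TIMESTAMP.search(line)
            if match:
                # Everything after the timing line is the text to speak.
                spoken = " ".join(lines[index + 1:])
                if spoken:
                    cues.append((match.group(1), match.group(2), spoken))
                break
    return cues
```

Each tuple pairs a time range with the sentence to synthesize, which is the information a subtitle-driven dubbing workflow needs.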

Source Code Deployment

The Clone-Voice source code can also be deployed on Linux, Mac, and Windows. Requirements include Python 3.9-3.11, the Git command-line tools, and a stable proxy for downloading the large model files.
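
The Python and Git requirements above can be verified before cloning the repository. This is a minimal sketch using only the standard library; the function names are hypothetical, and it does not check the proxy.

```python
import shutil
import sys

# Supported interpreter range stated above (inclusive).
SUPPORTED_MIN = (3, 9)
SUPPORTED_MAX = (3, 11)


def python_version_supported(version=None):
    """True if the (major, minor) version lies within 3.9-3.11.

    Defaults to the interpreter running this script.
    """
    major_minor = tuple((version or sys.version_info)[:2])
    return SUPPORTED_MIN <= major_minor <= SUPPORTED_MAX


def git_available():
    """True if a `git` executable can be found on PATH."""
    return shutil.which("git") is not None
```

Running both checks first avoids discovering an unsupported interpreter partway through dependency installation.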

To get the source code up and running:

Install the necessary software, including Python and Git, and verify that a stable proxy is configured.
Pull the source code using Git and set up a virtual environment.
Install necessary dependencies and perform adjustments for CUDA acceleration if available.
Make the ffmpeg program available alongside the app.py file, which serves as the application's entry point.
Follow detailed instructions for downloading and initializing the model and updating configurations if needed.
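
After the steps above, it can help to confirm that ffmpeg is reachable and whether CUDA acceleration is usable. The sketch below assumes the installed dependencies include PyTorch (imported as torch) and falls back to reporting no CUDA when it is missing. The function names are hypothetical, and the PATH lookup will not detect an ffmpeg binary that was only copied next to app.py.

```python
import shutil


def cuda_available():
    """True if PyTorch is importable and reports a usable CUDA device."""
    try:
        import torch  # Assumption: PyTorch is among the installed dependencies.
    except ImportError:
        return False
    return bool(torch.cuda.is_available())


def ffmpeg_available():
    """True if an `ffmpeg` executable can be found on PATH."""
    return shutil.which("ffmpeg") is not None
```

If cuda_available() returns False on a machine with an Nvidia GPU, the likely causes are a CPU-only PyTorch build or a missing CUDA environment.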

Problems and Solutions

Users might encounter issues like slow start-up, conversion errors, or character limit warnings, which are generally solvable by following the instructions provided, such as adjusting proxy settings or modifying configuration files.

Sponsorship

Clone-Voice is sponsored by 302.AI, a platform with a wide array of AI tools, available on-demand without monthly fees. They offer solutions to fit various user needs, from beginners to developers, with an easy-to-use interface.

Clone-Voice exemplifies a powerful, versatile, and accessible text-to-speech solution, appealing for personal projects, learning, or research purposes.