Voice-Cloning-App - Efficient Human Voice Cloning Application Using Python and PyTorch

Voice Cloning App

Voice Cloning App is a Python and PyTorch-based application designed to make synthesizing human voices as straightforward as possible. Whether you are an experienced user or trying voice synthesis for the first time, the app provides a streamlined approach to creating realistic voice clones, backed by the features and documentation described below.

Key Features

Voice Cloning App includes the following features:

Automatic Dataset Generation: The app can build training datasets from subtitles and audiobooks, which widens the range of voices that can be cloned.
Additional Language Support: The app supports multiple languages, making it a versatile tool for global users.
Local & Remote Training: Users can choose to train voices either locally on their machines or remotely, providing flexibility based on their resources.
Easy Train Start/Stop: Training can be started and stopped easily, which makes the process more accessible for beginners.
Data Importing/Exporting: This feature ensures that data handling is efficient, whether users are bringing in new datasets or moving their creations elsewhere.
Multi GPU Support: For enhanced performance, the app supports the use of multiple GPUs.
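
As an illustration of what subtitle-based dataset generation involves, the sketch below parses SRT subtitle text into timed segments that could later be used to cut matching audio clips. It is a minimal example under stated assumptions, not the app's own implementation: the names `Segment` and `parse_srt` are hypothetical, and the audio cutting step is left out.

```python
import re
from dataclasses import dataclass

# SRT timestamps look like "00:00:01,000 --> 00:00:04,200"
TIMESTAMP = re.compile(
    r"(\d+):(\d{2}):(\d{2})[,.](\d{3})\s*-->\s*(\d+):(\d{2}):(\d{2})[,.](\d{3})"
)


@dataclass
class Segment:
    """One subtitle cue (hypothetical structure for this sketch)."""
    start: float  # seconds from the start of the audio
    end: float    # seconds from the start of the audio
    text: str     # transcript for the clip


def _seconds(hours, minutes, seconds, millis):
    """Convert the four timestamp fields to seconds."""
    return int(hours) * 3600 + int(minutes) * 60 + int(seconds) + int(millis) / 1000


def parse_srt(content):
    """Parse SRT text into a list of Segment objects, skipping cues with no text."""
    segments = []
    # Cues are separated by blank lines
    for block in re.split(r"\n\s*\n", content.strip()):
        lines = [line.strip() for line in block.splitlines() if line.strip()]
        for i, line in enumerate(lines):
            match = TIMESTAMP.match(line)
            if match:
                fields = match.groups()
                # Everything after the timestamp line is the cue text
                text = " ".join(lines[i + 1:])
                if text:
                    segments.append(
                        Segment(_seconds(*fields[:4]), _seconds(*fields[4:]), text)
                    )
                break
    return segments
```

Each resulting segment pairs a time range with its transcript, which is the basic unit a text-to-speech training set needs.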

System Requirements

To run the Voice Cloning App, users need:

A Windows 10 or Ubuntu 20.04+ operating system
At least 5GB of available disk space
Optionally, an NVIDIA GPU with at least 4GB of memory and driver version 456.38 or later
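
The disk and GPU requirements above can be checked with a short script. The following is a rough sketch, not part of the app: the function names are hypothetical, PyTorch is imported lazily so the script still runs without it, and the driver version is not checked.

```python
import platform
import shutil

MIN_FREE_DISK_GB = 5   # from the requirements list
MIN_GPU_MEMORY_GB = 4  # from the requirements list


def free_disk_gb(path="."):
    """Return free disk space at `path` in gigabytes."""
    return shutil.disk_usage(path).free / 1024**3


def gpu_memory_gb():
    """Return the memory of each visible CUDA GPU in gigabytes, or [] if none."""
    try:
        import torch  # optional dependency for this check
    except ImportError:
        return []
    if not torch.cuda.is_available():
        return []
    return [
        torch.cuda.get_device_properties(i).total_memory / 1024**3
        for i in range(torch.cuda.device_count())
    ]


def check_requirements(path="."):
    """Build a small report; the GPU is optional, so it is reported, not enforced."""
    gpus = gpu_memory_gb()
    return {
        "os": platform.system(),
        "disk_ok": free_disk_gb(path) >= MIN_FREE_DISK_GB,
        # Reported memory can be slightly below the nominal card size
        "gpu_ok": any(mem >= MIN_GPU_MEMORY_GB for mem in gpus),
        "gpu_count": len(gpus),
    }


if __name__ == "__main__":
    print(check_requirements())
```

A `gpu_count` greater than one indicates that more than one GPU is visible to PyTorch on the machine.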

Future Improvements

The developers aim to enhance the app continually. Planned future improvements include:

Support for TalkNet
GTA alignment for HiFi-GAN
Improved batch size estimation for more efficient processing
AMD GPU support for broader hardware compatibility

Manual Guides

For users needing assistance, several manual guides are available that cover various aspects of the app.

Additional Resources

For those who wish to explore more, additional resources and collaborations are available:

A remote training notebook to facilitate online voice cloning.
Platforms like uberduck.ai and Vocodes where users can try out existing voices.
Various Colab resources for tasks like YouTube data fetching and synthesizing in Colab.

Acknowledgements

The Voice Cloning App builds on several recognized projects, including a reworked version of NVIDIA's Tacotron2, as well as resources from DSAlign, Silero, DeepSpeech, and HiFi-GAN.

The development of this app has been supported by dedicated individuals and organizations, notably Dr. John Bustard at Queen's University Belfast for his guidance, and the team at uberduck.ai for their support. The developer community, including members of the VocalSynthesis subreddit, has also provided invaluable feedback and contributions. Their collective efforts help make Voice Cloning App a powerful tool for voice synthesis.