One-Shot-Voice-Cloning
The Unet-TTS project improves speaker and style transfer for one-shot voice cloning by combining a U-Net architecture with an AdaIN (Adaptive Instance Normalization) layer. This open-source project provides inference code and pre-trained models for generating diverse audio from text. It tackles the difficulty of out-of-domain style transfer using a neutral-emotion corpus. The project automates duration statistics and supports multi-speaker TTS with a pre-trained Content Encoder, and it includes a detailed setup guide. It runs on Linux with specific TensorFlow versions, and can be used through either Python scripts or Colab notebooks.
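The core idea behind the AdaIN layer is to re-normalize content features with the statistics of a style (reference speaker) embedding. A minimal NumPy sketch is shown below; the `adain` function name and the `(time, channels)` feature layout are illustrative assumptions, not the project's actual implementation.

```python
import numpy as np

def adain(content, style, eps=1e-5):
    """Adaptive Instance Normalization (sketch): normalize content
    features per channel, then rescale and shift them with the style
    features' per-channel mean and standard deviation.
    Both inputs are assumed to be shaped (time, channels)."""
    c_mean, c_std = content.mean(axis=0), content.std(axis=0) + eps
    s_mean, s_std = style.mean(axis=0), style.std(axis=0) + eps
    return s_std * (content - c_mean) / c_std + s_mean

# Toy check: the output's channel statistics follow the style features
content = np.random.randn(100, 8) * 2.0 + 1.0   # hypothetical content features
style = np.random.randn(50, 8) * 0.5 - 3.0      # hypothetical reference-speaker features
out = adain(content, style)
```

After this transform, the content's temporal structure is preserved while its per-channel mean and variance match the reference style, which is the mechanism that lets a single reference utterance steer the synthesized voice.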