ZMM-TTS
ZMM-TTS is a multilingual and multispeaker text-to-speech framework built on self-supervised discrete speech representations. It combines text-based and speech-based self-supervised learning models to improve speech naturalness and speaker similarity in high-resource languages, and it also performs zero-shot synthesis in low-resource languages with high intelligibility and speaker similarity, even without any training data from those languages. Pre-trained models are available for six languages, along with MM6, a dataset created for balanced multilingual training. Explore speech synthesis across languages with ZMM-TTS.
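To make the core idea concrete, here is a minimal sketch (not the ZMM-TTS API; all names are hypothetical) of how continuous self-supervised speech features can be turned into discrete representations by nearest-centroid lookup against a learned codebook, the quantization step such discrete-unit pipelines rely on:

```python
import numpy as np

def discretize(features: np.ndarray, codebook: np.ndarray) -> np.ndarray:
    """Map each feature frame (T, D) to the index of its nearest
    codebook entry (K, D), yielding a sequence of discrete unit IDs."""
    # Squared Euclidean distance between every frame and every centroid.
    dists = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    return dists.argmin(axis=1)  # shape (T,): one unit ID per frame

# Toy demo: 8 hypothetical units in a 4-dim feature space.
rng = np.random.default_rng(0)
codebook = rng.normal(size=(8, 4))
# Frames that lie near centroids 2, 2, 5, 7, plus small noise.
features = codebook[[2, 2, 5, 7]] + 0.01 * rng.normal(size=(4, 4))
units = discretize(features, codebook)
print(units.tolist())  # noisy frames collapse to discrete units: [2, 2, 5, 7]
```

In a real system the codebook would come from k-means clustering of features produced by a pretrained speech SSL encoder, and the resulting unit sequence serves as a compact intermediate target for synthesis.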