HierSpeechpp

Efficient Zero-shot Speech Synthesis Using Hierarchical Variational Inference

Product Description

HierSpeech++ uses hierarchical variational inference for zero-shot speech synthesis, improving both robustness and expressiveness. It bridges the gap between semantic and acoustic representations, which improves naturalness and speaker similarity in text-to-speech (TTS) and voice conversion. The project includes a text-to-vec framework and an efficient speech super-resolution stage that upsamples audio from 16 kHz to 48 kHz. It is built on PyTorch and provides pre-trained models for further exploration. The project reports that it outperforms LLM-based and diffusion-based models and reaches human-level quality in zero-shot synthesis.
Project Details