# LJSpeech

## DiffSinger
DiffSinger provides a PyTorch implementation of singing voice synthesis built around a shallow diffusion mechanism and a pre-trained FastSpeech 2-style auxiliary decoder. It supports three model configurations (naive, auxiliary, and shallow) and offers TTS-style control over pitch, volume, and speaking rate. The repository ships pretrained models for single and batch inference and provides detailed guidance for preprocessing, synthesis, and training; multi-speaker training is still under development. A conceptual sketch of the shallow diffusion idea follows.
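The sketch below illustrates only the shallow diffusion concept as commonly described: rather than denoising from pure noise over all T steps, the coarse mel from the auxiliary decoder is forward-diffused to an intermediate step K and then denoised for just K steps. The schedule, step counts, and the `denoise_fn` placeholder are illustrative assumptions, not the repository's actual code or hyperparameters.

```python
# Toy sketch of shallow diffusion sampling (illustrative, not repo code).
import torch

T = 1000                                   # total diffusion steps
K = 100                                    # shallow start step (K << T), assumed value
betas = torch.linspace(1e-4, 0.02, T)      # linear noise schedule
alphas = 1.0 - betas
alphas_cum = torch.cumprod(alphas, dim=0)

def q_sample(x0, t, noise):
    """Forward-diffuse a clean sample x0 to step t."""
    return alphas_cum[t].sqrt() * x0 + (1.0 - alphas_cum[t]).sqrt() * noise

def p_step(x_t, t, eps_pred):
    """One DDPM reverse step given predicted noise eps_pred."""
    coef = betas[t] / (1.0 - alphas_cum[t]).sqrt()
    mean = (x_t - coef * eps_pred) / alphas[t].sqrt()
    if t > 0:
        mean = mean + betas[t].sqrt() * torch.randn_like(x_t)
    return mean

# Coarse mel from the auxiliary (FastSpeech 2-style) decoder; a stand-in here.
mel_aux = torch.zeros(1, 80, 200)

# Shallow start: diffuse the coarse mel to step K-1, then denoise K steps.
x = q_sample(mel_aux, K - 1, torch.randn_like(mel_aux))
denoise_fn = lambda x_t, t: torch.zeros_like(x_t)   # placeholder denoiser network
for t in reversed(range(K)):
    x = p_step(x, t, denoise_fn(x, t))
mel_refined = x
```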
## PortaSpeech
PortaSpeech delivers a PyTorch-based generative text-to-speech system notable for its compact model size and flexibility. Audio samples are provided for reference, and pretrained models support both single and batch inference. It offers TTS controllability, supports datasets such as LJSpeech, and includes concise preprocessing and training instructions. Vocoder options via HiFi-GAN and MelGAN enable high-quality waveform synthesis, and the repository also accommodates custom datasets and configurable alignment, with training progress monitored through TensorBoard. A sketch of the overall text-to-waveform pipeline appears below.
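The following sketch shows the two-stage pipeline such repositories use: text/phonemes to mel-spectrogram via the acoustic model, then mel to waveform via a neural vocoder (HiFi-GAN or MelGAN). The classes, control-scale arguments, and shapes below are stand-ins chosen to show the data flow, not the repository's actual modules, checkpoints, or API.

```python
# Hedged sketch of a two-stage TTS pipeline with a swappable vocoder stage.
import torch
import torch.nn as nn

class DummyAcousticModel(nn.Module):
    """Stand-in for the text-to-mel model with duration/pitch control scales."""
    def forward(self, phoneme_ids, d_control=1.0, p_control=1.0):
        batch, length = phoneme_ids.shape
        frames = int(length * 8 * d_control)       # fake length regulation
        return torch.randn(batch, 80, frames)      # [B, n_mels, T] mel frames

class DummyVocoder(nn.Module):
    """Stand-in for a HiFi-GAN/MelGAN-style vocoder: mel frames -> waveform."""
    hop_length = 256
    def forward(self, mel):
        batch, _, frames = mel.shape
        return torch.zeros(batch, frames * self.hop_length)  # waveform samples

acoustic, vocoder = DummyAcousticModel(), DummyVocoder()
phonemes = torch.randint(0, 100, (1, 32))          # fake phoneme ID sequence
with torch.no_grad():
    mel = acoustic(phonemes, d_control=1.1, p_control=0.9)  # slower, lower pitch
    wav = vocoder(mel)                             # final waveform tensor
```

Keeping the vocoder behind a single mel-to-waveform call is what makes swapping HiFi-GAN for MelGAN (or vice versa) a configuration change rather than a code change.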
Feedback Email: [email protected]