Chinese-FastSpeech2
This project uses an improved FastSpeech2 model for Chinese speech synthesis, with the goal of expressive, rhythmically natural pronunciation. To that end it enhances how prosody is represented and predicted. Recent updates add training code for the prosody model and data preprocessing for the Biaobei dataset. The architecture combines FastSpeech2 with HiFi-GAN and introduces a prosody vector, which results in three models: fastspeech_model, hifigan_model, and prosody_model. Text-to-speech prediction is available both from the command line and through an API. Community input and feedback are welcome.
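The way the three models fit together can be illustrated with a minimal sketch. It assumes a common arrangement in which the prosody model produces a prosody vector from text, the FastSpeech2 model turns text plus that vector into a mel-spectrogram, and the HiFi-GAN vocoder turns the mel-spectrogram into a waveform. The `TTSPipeline` class, the `synthesize` method, and the toy stub functions are hypothetical and are not taken from the project's code; only the three model names come from the description above.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical type aliases: the real models operate on tensors, not lists.
ProsodyFn = Callable[[str], List[float]]
AcousticFn = Callable[[str, List[float]], List[List[float]]]
VocoderFn = Callable[[List[List[float]]], List[float]]


@dataclass
class TTSPipeline:
    """Illustrative wiring of the three models (assumed data flow)."""
    prosody_model: ProsodyFn      # text -> prosody vector
    fastspeech_model: AcousticFn  # text + prosody vector -> mel frames
    hifigan_model: VocoderFn      # mel frames -> waveform samples

    def synthesize(self, text: str) -> List[float]:
        prosody_vector = self.prosody_model(text)
        mel = self.fastspeech_model(text, prosody_vector)
        return self.hifigan_model(mel)


# Toy stand-ins so the sketch runs without any trained weights.
def toy_prosody(text: str) -> List[float]:
    # A fixed-size vector whose value depends only on the text length.
    return [float(len(text))] * 4


def toy_acoustic(text: str, prosody: List[float]) -> List[List[float]]:
    # One fake mel frame per character, offset by the frame index.
    return [[p + i for p in prosody] for i in range(len(text))]


def toy_vocoder(mel: List[List[float]]) -> List[float]:
    # Two fake audio samples per frame, each the mean of the frame.
    return [sum(frame) / len(frame) for frame in mel for _ in range(2)]


pipeline = TTSPipeline(toy_prosody, toy_acoustic, toy_vocoder)
audio = pipeline.synthesize("你好")
```

In a real setup each stub would be replaced by a loaded model checkpoint. Keeping the three stages behind plain callables makes it possible to swap or retrain one of them, for example the prosody model, without touching the others.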