Project Overview of Chinese-FastSpeech2
Chinese-FastSpeech2 is an enhanced version of the FastSpeech2 model, adapted specifically for Chinese speech synthesis. It extends FastSpeech2 with prosody representation and prediction modules, which make the synthesized Chinese speech more lively and rhythmic. The model was trained on the Biaobei Standard Mandarin Female Voice dataset.
Updates as of April 2, 2023
- Inclusion of Prosody Training Code: The project now includes code for training prosody models, found under the BertProsody directory.
- Data Preprocessing Code for Prosody Training: A data preprocessing script tailored for the Biaobei dataset has been added, located at preprocessor/biaobei.py. Note that this script is currently unrefined but available for initial use.
Samples
The generated audio samples are available for reference to showcase the speech synthesis capabilities of the model.
Model Files
The main architecture of the project consists of FastSpeech2 combined with HifiGAN, enhanced by the inclusion of Chinese text prosody vectors at the input stage. As a result, the project comprises three models:
- fastspeech_model (File: 8000.pth.tar) → Place in output/ckpt/biaobei/
- hifigan_model (File: generator_universal.pth.tar) → Place in hifigan/
- prosody_model (File: best_model.pt) → Place in transformer/prosody_model/
The models can be downloaded from this link with the extraction code: qgpi.
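Before running synthesis, it can help to confirm that all three files ended up where the list above says they should. The following is a minimal sketch, not part of the project: the helper name missing_checkpoints and the EXPECTED_CHECKPOINTS table are illustrative, and the paths are assumed to be relative to the repository root.

```python
from pathlib import Path

# Checkpoint locations taken from the list above, assumed to be
# relative to the repository root. The dictionary keys are only labels.
EXPECTED_CHECKPOINTS = {
    "fastspeech_model": Path("output/ckpt/biaobei/8000.pth.tar"),
    "hifigan_model": Path("hifigan/generator_universal.pth.tar"),
    "prosody_model": Path("transformer/prosody_model/best_model.pt"),
}


def missing_checkpoints(repo_root="."):
    """Return the labels of checkpoints that are not at their expected path."""
    root = Path(repo_root)
    return [
        name
        for name, relative_path in EXPECTED_CHECKPOINTS.items()
        if not (root / relative_path).is_file()
    ]


if __name__ == "__main__":
    missing = missing_checkpoints()
    if missing:
        print("Missing checkpoints:", ", ".join(missing))
    else:
        print("All three checkpoints are in place.")
```

Run from the repository root, the script reports which of the three models still needs to be downloaded or moved.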
Prediction Methods
The project provides two methods for synthesizing speech:
- Interactive Synthesis: Running python synthesize_all.py lets users enter the text to synthesize on the command line; the output is written to a file named tmp.wav in the current working directory.
- API Call: Running tts_server.py launches a text-to-speech service that can be called over HTTP, as demonstrated in TestServer.py. The resulting audio file (tmp.wav) is also stored in the current working directory.
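As a rough illustration of the API route, the sketch below builds and sends an HTTP request using only the Python standard library. Everything specific in it is an assumption: the address and port, the /tts route, the JSON payload with a "text" field, and the helper names build_tts_request and synthesize are all hypothetical. The actual route and payload format are defined in tts_server.py, and TestServer.py remains the authoritative example.

```python
import json
import urllib.request

# Hypothetical address and route; check tts_server.py for the real values.
SERVER_URL = "http://127.0.0.1:5000/tts"


def build_tts_request(text, url=SERVER_URL):
    """Build a POST request carrying the text as UTF-8 encoded JSON.

    The {"text": ...} payload shape is an assumption for illustration only.
    """
    payload = json.dumps({"text": text}, ensure_ascii=False).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json; charset=utf-8"},
        method="POST",
    )


def synthesize(text, url=SERVER_URL, timeout=60):
    """Send the request and return the HTTP status code of the response."""
    request = build_tts_request(text, url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


if __name__ == "__main__":
    # Requires tts_server.py to be running and to accept this request format.
    print("HTTP status:", synthesize("你好，欢迎使用语音合成。"))
```

Keeping request construction separate from sending makes it easy to adjust the URL or payload once the real interface has been checked against TestServer.py.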
Training Process
For custom training, follow the detailed training procedure documented in the FastSpeech2 project. Chinese-FastSpeech2 adds several optimizations on top of the base FastSpeech2 method; for details on these improvements, see the blog post: Optimization of Chinese Speech Synthesis based on FastSpeech2.
Chinese-FastSpeech2 is a personal endeavor aimed at exploring advancements in speech synthesis. The project welcomes feedback, critiques, and productive exchanges to continue improving its offerings.