FCH-TTS Project Overview
The FCH-TTS project focuses on developing a parallel text-to-speech (TTS) synthesis system. It aims to produce high-quality, natural-sounding speech from text input using modern machine learning techniques, with an emphasis on efficiency and flexibility. Below is an overview of the project's key components and recent developments.
Recent Progress
The project has seen several important updates and developments:
- Wavegan Integration (2021/04/20): The wavegan branch was merged into the main branch, consolidating its functionality into the core codebase.
- Encoder Branch (2021/04/13): A dedicated encoder branch was created to develop the speech style transfer module, enabling more personalized and expressive synthesis.
- SoftDTW Support (2021/04/13): The softdtw branch enables training models with a SoftDTW loss, which offers more robust alignment (a minimal sketch appears after this list).
- Wavegan Vocoder Options (2021/04/09): The wavegan branch (now deleted) added support for several vocoding techniques, including PWG, MelGAN, and Multi-band MelGAN.
- Parallel Text-to-Mel and MelGAN (2021/04/05): Support was added for combining ParallelText2Mel with MelGAN vocoding, improving synthesis quality.
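To make the SoftDTW idea above concrete, here is a minimal, self-contained PyTorch sketch of a soft-DTW distance between two feature sequences. It is a generic reference implementation for illustration only and is not the code in the losses/ directory; the function names and the gamma default are assumptions.

    import torch

    def _soft_min(a, b, c, gamma):
        # Differentiable soft-minimum of three values via log-sum-exp.
        vals = torch.stack([a, b, c])
        return -gamma * torch.logsumexp(-vals / gamma, dim=0)

    def soft_dtw(x, y, gamma=0.1):
        # Soft-DTW distance between x (n, d) and y (m, d): quadratic-time
        # dynamic programming over a squared-Euclidean cost matrix, smoothed
        # by gamma so the whole recursion stays differentiable.
        n, m = x.size(0), y.size(0)
        cost = torch.cdist(x, y) ** 2                      # (n, m) pairwise costs
        inf = torch.tensor(float("inf"), dtype=x.dtype)
        r = [[inf] * (m + 1) for _ in range(n + 1)]        # DP table of scalar tensors
        r[0][0] = torch.zeros((), dtype=x.dtype)
        for i in range(1, n + 1):
            for j in range(1, m + 1):
                r[i][j] = cost[i - 1, j - 1] + _soft_min(
                    r[i - 1][j], r[i][j - 1], r[i - 1][j - 1], gamma
                )
        return r[n][m]

    # Example: align a predicted mel sequence against a target of different length.
    pred = torch.randn(50, 80, requires_grad=True)   # 50 frames, 80 mel bins
    target = torch.randn(60, 80)
    loss = soft_dtw(pred, target)
    loss.backward()                                  # gradients flow through the soft alignment

Because the alignment is soft, the loss tolerates small timing mismatches between the predicted and reference sequences, which is presumably what makes it attractive for training parallel TTS models.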
Project Structure
The FCH-TTS repository is organized as follows:
.
|--- config/ # Configuration files
|--- datasets/ # Data processing scripts
|--- encoder/ # Voice encoder scripts
|--- helpers/ # Utility scripts
|--- logdir/ # Directory for saving training logs
|--- losses/ # Loss function implementations
|--- models/ # Synthesis models
|--- pretrained/ # Pretrained models (e.g., LJSpeech dataset)
|--- samples/ # Sample synthesized outputs
|--- utils/ # Common utilities
|--- vocoder/ # Vocoder scripts
|--- wandb/ # Wandb logs
|--- *.py # Top-level scripts for dataset preparation, synthesis, training, etc.
Synthesis Examples
Various samples of synthesized speech are available to showcase the capabilities and quality of the FCH-TTS project.
Pretrained Models
The project provides pretrained models (e.g., trained on the LJSpeech dataset) for quick deployment and testing, which is especially useful for researchers and developers who want to experiment with synthesized speech without training from scratch.
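As a brief aside, checkpoints like those under pretrained/ would normally be restored with PyTorch's standard serialization API. The snippet below is only a hedged sketch: DummyText2Mel is a placeholder for the repository's actual model class, and the assumption that the .pth file holds a state_dict (possibly wrapped in a dict) is not confirmed by the repository.

    import torch
    import torch.nn as nn

    class DummyText2Mel(nn.Module):
        # Placeholder for the real acoustic model class under models/;
        # the actual class name and constructor arguments are not shown here.
        def __init__(self):
            super().__init__()
            self.proj = nn.Linear(256, 80)  # arbitrary layer so the stub has weights

    checkpoint_path = "./pretrained/ljspeech-parallel-epoch0100.pth"  # path from the quick-start example

    model = DummyText2Mel()
    state = torch.load(checkpoint_path, map_location="cpu")
    # Assumption: the file is either a bare state_dict or a dict that wraps one.
    state_dict = state["state_dict"] if isinstance(state, dict) and "state_dict" in state else state
    model.load_state_dict(state_dict, strict=False)   # strict=False only because this is a stub
    model.eval()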
Quick Start Guide
- Clone the Repository:
  $ git clone https://github.com/atomicoo/ParallelTTS.git
- Install Dependencies:
  $ conda create -n ParallelTTS python=3.7.9
  $ conda activate ParallelTTS
  $ pip install -r requirements.txt
- Synthesize Speech:
  $ python synthesize.py \
      --checkpoint ./pretrained/ljspeech-parallel-epoch0100.pth \
      --melgan_checkpoint ./pretrained/ljspeech-melgan-epoch3200.pth \
      --input_texts ./samples/english/synthesize.txt \
      --outputs_dir ./outputs/

Note: Use --config to specify configuration files for different languages.
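To give a mental model of what the synthesis step above does, the sketch below wires a text-to-mel stage into a vocoder stage in PyTorch. Every class, shape, and default here is a hypothetical stand-in chosen for illustration; the repository's actual modules under models/ and vocoder/, and the internals of synthesize.py, are not reproduced.

    import torch
    import torch.nn as nn

    class ToyText2Mel(nn.Module):
        # Hypothetical stand-in for a parallel text-to-mel model.
        def __init__(self, vocab_size=64, mel_bins=80, hidden=128):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, hidden)
            self.to_mel = nn.Linear(hidden, mel_bins)

        def forward(self, token_ids):                 # (batch, n_tokens)
            h = self.embed(token_ids)                 # (batch, n_tokens, hidden)
            # A real parallel model would expand h with predicted durations here;
            # this toy version maps one token to one mel frame.
            return self.to_mel(h).transpose(1, 2)     # (batch, mel_bins, n_frames)

    class ToyVocoder(nn.Module):
        # Hypothetical stand-in for a MelGAN-style mel-to-waveform generator.
        def __init__(self, mel_bins=80, hop_length=256):
            super().__init__()
            self.hop_length = hop_length
            self.net = nn.Conv1d(mel_bins, 1, kernel_size=1)

        def forward(self, mel):                       # (batch, mel_bins, n_frames)
            frames = self.net(mel)                    # (batch, 1, n_frames)
            # Repeat one value per frame hop_length times: a crude placeholder
            # for MelGAN's transposed-convolution upsampling stack.
            return frames.repeat_interleave(self.hop_length, dim=-1).squeeze(1)

    text2mel, vocoder = ToyText2Mel().eval(), ToyVocoder().eval()
    token_ids = torch.randint(0, 64, (1, 20))         # pretend phoneme IDs for one sentence
    with torch.no_grad():
        mel = text2mel(token_ids)                     # (1, 80, 20)
        audio = vocoder(mel)                          # (1, 20 * 256) waveform samples
    print(mel.shape, audio.shape)

In the project itself, synthesize.py plays this role, with the --checkpoint and --melgan_checkpoint files supplying the trained weights for the two stages.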
Training the System
- Prepare the Data:
  $ python prepare-dataset.py
- Train Alignment Model:
  $ python train-duration.py
- Extract Duration:
  $ python extract-duration.py
- Train Synthesis Model:
  $ python train-parallel.py
The complete process relies on well-prepared datasets, such as LJSpeech, LibriSpeech, JSUT, and others, to train and fine-tune the TTS models effectively; the sketch below shows how the durations produced in the Extract Duration step are used.
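To make the role of the duration steps concrete, here is a FastSpeech-style length regulator sketch: per-token durations, like those produced by extract-duration.py, are used to repeat each encoder frame so the expanded sequence matches the mel-spectrogram length, which is what allows the synthesis model to generate all frames in parallel rather than autoregressively. This is an illustrative sketch under those assumptions, not the repository's implementation.

    import torch

    def length_regulate(encoder_out, durations):
        # Expand token-level features to frame level by repeating each token.
        # encoder_out: (n_tokens, hidden) per-token encoder features
        # durations:   (n_tokens,) integer number of mel frames per token
        # returns:     (sum(durations), hidden) frame-level features
        return torch.repeat_interleave(encoder_out, durations, dim=0)

    # Example: 4 tokens expanded to 2 + 3 + 1 + 4 = 10 mel frames.
    encoder_out = torch.randn(4, 128)
    durations = torch.tensor([2, 3, 1, 4])
    frames = length_regulate(encoder_out, durations)
    print(frames.shape)  # torch.Size([10, 128])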
Performance and Challenges
- Synthesis speed benchmarks are reported for several hardware configurations to help gauge performance.
- Known issues include vocoder compatibility concerns, text-input handling for Mandarin, and challenges in applying models across languages.
Further Information
The project builds on methods and influences from related work in the TTS domain, including Tacotron and Deep Voice implementations, among others. Development is ongoing, with plans to improve speech quality evaluation, expand multi-language support, and refine voice style transfer. Collaboration and inquiries are welcome via the project contact details.
For more detailed technical information and updates, explore the project repository or contact the development team at [email protected].