glados-tts - Advanced Neural Network TTS Engine Supporting Multiple Speakers

GLaDOS Text-to-Speech (TTS) Voice Generator

The GLaDOS Text-to-Speech (TTS) Voice Generator is a neural-network-based TTS engine that converts written text into spoken audio in the voice of GLaDOS, the AI character from the Portal games. Below is an overview of how to run it, how its models were trained, and how to install it.

Playing Around with GLaDOS TTS

If you just want to try out the TTS, the generator works as a standalone program. Run the following command to start generating speech:

```
python3 glados-tts/glados.py
```

Remote Use

The GLaDOS TTS engine can also run as a remote service on a machine more powerful than the Raspberry Pi, which is useful when the Pi is too slow to synthesize speech itself. Start the server from the glados-tts directory:

```
python3 engine-remote.py
```

By default, the TTS engine listens on port 8124. To use it with the GLaDOS Voice Assistant, update the following variable in the assistant's settings.env file so that it points at the machine running the engine (192.168.1.3 is an example address):

```
TTS_ENGINE_API = http://192.168.1.3:8124/synthesize/
```
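A minimal client sketch, assuming (this is not stated above) that engine-remote.py accepts the text to synthesize as a URL-encoded path segment appended to the /synthesize/ endpoint and returns WAV audio in the response body. The helper names `build_tts_url` and `fetch_speech` and the default output filename are hypothetical.

```python
import urllib.parse
import urllib.request

# Value of TTS_ENGINE_API from settings.env; 192.168.1.3 is an example address.
TTS_ENGINE_API = "http://192.168.1.3:8124/synthesize/"


def build_tts_url(endpoint: str, text: str) -> str:
    """Append URL-encoded text to the endpoint.

    ASSUMPTION: the server reads the text from the URL path.
    """
    if not endpoint.endswith("/"):
        endpoint += "/"
    # safe="" also encodes "/" so the whole text stays one path segment.
    return endpoint + urllib.parse.quote(text, safe="")


def fetch_speech(text: str, out_path: str = "glados.wav", timeout: float = 30.0) -> str:
    """Request speech from the remote engine and save the response body to out_path."""
    url = build_tts_url(TTS_ENGINE_API, text)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        audio = response.read()  # assumed to be WAV data
    with open(out_path, "wb") as f:
        f.write(audio)
    return out_path
```

With the engine running, `fetch_speech("Hello, and again, welcome.")` would write whatever audio the server returns to glados.wav. Encoding the text with `safe=""` keeps slashes and question marks in a sentence from being misread as extra path segments or a query string.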

Training New Models

The new Tacotron and ForwardTacotron models were trained as multispeaker models on two datasets. The first is the LJSpeech dataset, which contains 13,100 lines. The second is a customized version of the Ellen McLain dataset, split into Portal 1 and Portal 2 voices, with punctuation and corrections added. Lines from the end of Portal 1 are grouped with the Portal 2 voice.
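To make the multispeaker setup concrete, the sketch below tags each training line with a speaker ID: one for LJSpeech and one for each of the two Portal voices. The `Utterance` record, the ID values, the `(source, audio_file, text)` row layout, and the function names are illustrative assumptions, not the project's actual data format.

```python
from collections import Counter
from dataclasses import dataclass

# Hypothetical speaker IDs: one per voice the multispeaker model learns.
SPEAKER_IDS = {
    "ljspeech": 0,
    "portal1": 1,
    "portal2": 2,
}


@dataclass(frozen=True)
class Utterance:
    audio_file: str
    text: str
    speaker_id: int


def label_utterances(rows):
    """Attach a speaker ID to each (source, audio_file, text) row.

    An unknown source raises KeyError, so mislabeled data is caught
    before training instead of being silently merged into a voice.
    """
    return [Utterance(audio_file, text, SPEAKER_IDS[source])
            for source, audio_file, text in rows]


def lines_per_speaker(utterances):
    """Count training lines per speaker ID."""
    return Counter(u.speaker_id for u in utterances)
```

In a multispeaker model the speaker ID typically selects a learned speaker embedding, so at inference time the same network can produce either the Portal 1 or the Portal 2 voice depending on which ID it is given.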

Old Model Training

The first version of the TTS used a regular Tacotron model. It was trained first on the LJSpeech dataset and then on a modified Ellen McLain dataset, from which all non-Portal 2 voice lines were removed and to which punctuation was added.

The Forward Tacotron model was trained on only about 600 voice lines, and the HiFiGAN vocoder was obtained by transfer learning from sample data. All models were then optimized and quantized to reduce their size and speed up inference.
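The section does not say which quantization scheme was used. As a generic illustration only, the sketch below applies symmetric per-tensor int8 quantization to a list of weights: the scale is chosen so the largest magnitude maps to 127, each weight is divided by the scale and rounded, and dequantizing multiplies back by the scale. The function names are hypothetical.

```python
def quantize_int8(weights):
    """Symmetric per-tensor int8 quantization: w is approximated by scale * q.

    Each q lies in [-127, 127], so it fits in a signed 8-bit integer.
    """
    max_abs = max((abs(w) for w in weights), default=0.0)
    # Fall back to scale 1.0 for an all-zero (or empty) tensor to avoid dividing by zero.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize_int8(q, scale):
    """Recover approximate float weights from the integer values and the scale."""
    return [v * scale for v in q]
```

Storing each weight in 8 bits instead of 32 cuts model size roughly fourfold, at the cost of a rounding error of at most half the scale per weight.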

Installation Instructions

To install the GLaDOS TTS engine on your machine, follow these steps:

1. Download the model files from Google Drive and extract them into the repository folder.
2. Install the required Python packages:
```
pip install -r requirements.txt
```

Once these steps are complete, the engine is ready to synthesize speech, either as a standalone program or as a remote service.