voice-builder - Simplified Tool for Creating Text-to-Speech Voices in Collaborative Research Environments

Voice Builder Project Introduction

Voice Builder is an open-source tool for creating text-to-speech (TTS) voices. It is not an official Google product. It gives anyone interested in TTS voice building a simple, collaborative environment for experimenting with voice training.

Purpose and Objectives

The primary aim of Voice Builder is to lower the barrier to creating new TTS voices and to foster TTS research. Because it offers an accessible platform for running voice experiments, it encourages interdisciplinary collaboration. It is especially useful for low-resource languages, where experimentation is needed to make the most of limited data.

Features of Voice Builder

Voice Builder aims for simplicity and flexibility through the following features:

Ease of Use: With only basic computer skills required, users can run voice training experiments and listen to synthesized voices.
Flexible Voice Creation: It supports the creation of custom TTS voices by experimenting with different configurations and data sets.
Encouraging Collaboration: By simplifying setup and experimentation, it fosters collaboration among researchers, linguists, and developers.

Installing and Setting Up Voice Builder

Prerequisites

Before deploying Voice Builder, ensure the following:

Google Cloud Platform (GCP) Setup: Create a GCP project with billing enabled.
Software Requirements: Install Docker, Node.js, and the Firebase command-line tools.
Service Activation: Enable the required GCP services, such as the App Engine API, Cloud Functions for Firebase, and the Genomics Pipeline API.
Optional Custom Data Exporter: If desired, set up a custom data exporter for advanced input data manipulation.
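Before deploying, it can help to confirm that the command-line tools are actually on your PATH. The following is a minimal sketch, not part of Voice Builder itself: the helper name `missing_prerequisites` is hypothetical, the executable names are the usual ones for each tool, and the Google Cloud SDK (`gcloud`) is included on the assumption that you will authenticate with GCP from the command line. It does not check billing or service activation.

```python
import shutil

# Display name -> executable name. These are the common executable names;
# adjust them if your installation differs. "gcloud" is an assumption based
# on the GCP authentication step, not an item from the prerequisite list.
REQUIRED_TOOLS = {
    "Docker": "docker",
    "Node.js": "node",
    "Firebase CLI": "firebase",
    "Google Cloud SDK": "gcloud",
}


def missing_prerequisites(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the display names of tools whose executable is not on PATH.

    `which` is injectable so the check can be tested without the tools installed.
    """
    return [name for name, exe in tools.items() if which(exe) is None]


if __name__ == "__main__":
    missing = missing_prerequisites()
    if missing:
        print("Missing prerequisites:", ", ".join(missing))
    else:
        print("All command-line prerequisites found.")
```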

Deployment Steps

With the prerequisites in place, deploy Voice Builder as follows:

Clone the Voice Builder repository.
Authenticate with Google Cloud and Firebase.
Update the configuration files with your project and service account details.
Run the deployment scripts to set up the cloud functions and the user interface components.
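The configuration step amounts to substituting your project and service account details into the repository's config files. The sketch below illustrates one way to do that safely, assuming a hypothetical `${NAME}` placeholder syntax; Voice Builder's real config files may use different keys and formats, so treat `fill_config` and `unfilled_placeholders` as illustrative helpers only.

```python
import re

# Hypothetical placeholder syntax "${NAME}"; the real config files may differ.
PLACEHOLDER = re.compile(r"\$\{([A-Z_]+)\}")


def unfilled_placeholders(template: str, values: dict) -> list:
    """Return placeholder names in `template` that have no entry in `values`."""
    found = {m.group(1) for m in PLACEHOLDER.finditer(template)}
    return sorted(found - set(values))


def fill_config(template: str, values: dict) -> str:
    """Substitute every ${NAME} placeholder, refusing to leave any unfilled."""
    missing = unfilled_placeholders(template, values)
    if missing:
        raise KeyError("no value provided for: " + ", ".join(missing))
    return PLACEHOLDER.sub(lambda m: str(values[m.group(1)]), template)
```

Failing on any unfilled placeholder, rather than silently leaving it in place, means a half-edited config is caught before the deployment scripts run.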

Creating a TTS Voice Example

Once Voice Builder is deployed, users create TTS voices through its web user interface. Using the built-in Festival and Merlin engines, you can build voices from example data sets: open the UI, select 'Create Voice', and follow the guided steps to create and test a new TTS voice.

Using a Custom Data Exporter (Optional)

Voice Builder supports an optional component, the Data Exporter, which transforms input files before they are passed to the TTS engine. This is useful for converting file formats or filtering data. Setting it up involves modifying the provided scripts so the exporter can access your data, then updating the configuration files to point to your Data Exporter service.
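As an illustration of the kind of filtering a Data Exporter might perform, the sketch below cleans a prompt list before training. The input format (one `utterance_id<TAB>transcript` line per prompt) and the function name `export_prompts` are hypothetical; Voice Builder does not prescribe this format or interface.

```python
def export_prompts(lines, available_audio_ids):
    """Keep well-formed prompts whose audio file exists, normalizing whitespace.

    Hypothetical input format: one "utterance_id<TAB>transcript" entry per line.
    """
    kept = []
    for line in lines:
        parts = line.rstrip("\n").split("\t", 1)
        if len(parts) != 2:
            continue  # malformed line: no tab separator
        utt_id = parts[0].strip()
        text = " ".join(parts[1].split())  # collapse runs of whitespace
        # Drop entries with an empty ID or transcript, or without matching audio.
        if utt_id and text and utt_id in available_audio_ids:
            kept.append(f"{utt_id}\t{text}")
    return kept
```

For example, `export_prompts(["utt1\tHello   world\n", "utt2\t\n"], {"utt1", "utt2"})` keeps only `"utt1\tHello world"`, because `utt2` has an empty transcript.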

Voice Builder Specification

When building a voice, Voice Builder generates a JSON specification describing the voice. It includes attributes such as the voice ID, lexicon path, sample rate, chosen TTS engine, and additional engine parameters. This specification is passed to the TTS engine and, if one is configured, to the Data Exporter.
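To make the shape of such a specification concrete, here is a minimal sketch that serializes the attributes listed above to JSON. The key names (`voice_id`, `lexicon_path`, `sample_rate`, `tts_engine`, `engine_params`) and the `VoiceSpec` class are assumptions chosen for illustration; the actual Voice Builder specification may name or structure these fields differently.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class VoiceSpec:
    # Illustrative field names only; check the Voice Builder specification
    # for the exact keys its functions and engines expect.
    voice_id: str
    lexicon_path: str
    sample_rate: int  # in Hz
    tts_engine: str  # e.g. "festival" or "merlin"
    engine_params: dict = field(default_factory=dict)

    def to_json(self) -> str:
        """Validate basic constraints and serialize to a JSON string."""
        if self.sample_rate <= 0:
            raise ValueError("sample_rate must be positive")
        return json.dumps(asdict(self), indent=2, sort_keys=True)
```

Keeping engine-specific settings in a single `engine_params` mapping lets the same outer structure serve both Festival and Merlin.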

Additional Resources

For those interested in exploring Voice Builder further or understanding its phonological capabilities, additional resources are available, including the JSON Phonology documentation.

By providing an accessible and flexible platform, Voice Builder supports TTS research and makes voice-building experiments feasible for a broader audience.