TensorVox - Explore Efficient Neural Speech Synthesis in Desktop Applications Using TensorFlow and PyTorch

Introduction to TensorVox

Overview

TensorVox is a desktop application designed to make neural speech synthesis more accessible. Its primary goal is a user-friendly, resource-efficient experience.

Technical Foundation

TensorVox is powered mainly by TensorFlowTTS, with additional support for Coqui-TTS and VITS models. The application is written in pure C++ with the Qt framework, which keeps it lightweight. It runs TensorFlow models through the TensorFlow C API and PyTorch models through LibTorch, so inference needs only a few essential DLLs rather than an extensive Python library installation.
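The split between the two runtimes can be pictured as a small dispatch step. The following is a minimal sketch, not code from the TensorVox source: the names `ModelRepo`, `Backend`, `backendFor`, and `backendName` are hypothetical. It assumes that Coqui-TTS models are loaded as TensorFlow models (the Supported Architectures section says they are converted from PyTorch to TensorFlow) and that VITS is loaded as a PyTorch model.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical names for illustration; not taken from the TensorVox source.
enum class ModelRepo { TensorFlowTTS, CoquiTTS, VITS };
enum class Backend { TensorFlowCApi, LibTorch };

// Picks the inference runtime for a model from a given repository.
// Assumption: Coqui-TTS models have already been converted to TensorFlow,
// and VITS is run as a PyTorch model.
Backend backendFor(ModelRepo repo) {
    switch (repo) {
        case ModelRepo::TensorFlowTTS:
        case ModelRepo::CoquiTTS:
            return Backend::TensorFlowCApi;
        case ModelRepo::VITS:
            return Backend::LibTorch;
    }
    throw std::invalid_argument("unknown model repository");
}

// Human-readable name of a runtime, e.g. for a status message.
std::string backendName(Backend backend) {
    return backend == Backend::TensorFlowCApi ? "TensorFlow C API" : "LibTorch";
}
```

Under these assumptions, `backendName(backendFor(ModelRepo::VITS))` yields the string "LibTorch". In a real application each branch would wrap the corresponding library's model-loading calls, which are omitted here.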

[Screenshot: TensorVox interface]

Getting Started

To begin using TensorVox, users can:

Access a comprehensive guide in Google Docs.
Obtain a copy from the available releases, extract the .zip file, and consult the Google Drive folder for models and installation instructions.

To use your own models, you must first train and export them.

Supported Architectures

TensorVox supports various model architectures across different repositories:

TensorFlowTTS: Includes FastSpeech2, Tacotron2 (both character and phoneme-based), and Multi-Band MelGAN. A Colab notebook is available to demonstrate exporting a char-based Tacotron2 model: Colab Link.
Coqui-TTS: Supports Tacotron2 (phoneme-based IPA) and Multi-Band MelGAN after converting from PyTorch to TensorFlow. An exporting notebook for the LJSpeech DDC model can be found here: Colab Link.
jaywalnut310/VITS: A fully end-to-end model using stressed IPA as phonemes. Export it using this notebook: Colab Link.
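The practical difference between these architectures is how many models run per utterance. The sketch below is illustrative only and uses hypothetical names (`Acoustic`, `stagesFor`) that do not come from the TensorVox source. It assumes that FastSpeech2 and Tacotron2 produce mel spectrograms that a Multi-Band MelGAN vocoder converts to audio, while VITS, being fully end-to-end, produces the waveform directly.

```cpp
#include <string>
#include <vector>

// Hypothetical type for illustration; not taken from the TensorVox source.
enum class Acoustic { FastSpeech2, Tacotron2, VITS };

// Lists the inference stages for a voice, in execution order.
// Assumption: FastSpeech2 and Tacotron2 are paired with a Multi-Band MelGAN
// vocoder, while VITS needs no separate vocoder.
std::vector<std::string> stagesFor(Acoustic model) {
    std::vector<std::string> stages{"text -> character or phoneme IDs"};
    switch (model) {
        case Acoustic::FastSpeech2:
            stages.push_back("FastSpeech2: IDs -> mel spectrogram");
            stages.push_back("Multi-Band MelGAN: mel spectrogram -> waveform");
            break;
        case Acoustic::Tacotron2:
            stages.push_back("Tacotron2: IDs -> mel spectrogram");
            stages.push_back("Multi-Band MelGAN: mel spectrogram -> waveform");
            break;
        case Acoustic::VITS:
            stages.push_back("VITS: IDs -> waveform");
            break;
    }
    return stages;
}
```

With this layout, a Tacotron2 voice lists three stages and a VITS voice lists two, which is why a VITS export consists of a single model rather than an acoustic model plus a vocoder.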

Out of the box, TensorVox supports English, German, and Spanish. Other languages can be added through the IPA, ARPA, or GlobalPhone phoneme sets.

Build Instructions

Currently, TensorVox supports only Windows 10 x64, although there are reports of it working on Windows 8.1. Building it requires Qt Creator and the MSVC 2017 (v141) compiler. A primed build, which uses the provided binary dependencies, involves:

Downloading necessary binary dependencies and includes.
Positioning the unzipped deps folder alongside the .pro and main source files.
Opening the project in Qt Creator to compile with the appropriate tools.

Acknowledgments and External Tools

TensorVox incorporates several external tools and libraries, including LibTorch, the TensorFlow C API, CppFlow, AudioFile, and others. Special thanks go to these projects for their contributions.

Contact and Support

For questions and discussions, users are encouraged to join the Discord server. Formal inquiries can be directed to the provided email: [email protected].

Licensing

TensorVox itself is MIT licensed, but the external models used with the application are subject to their own licensing terms. Users, particularly those in regions such as Vietnam, should review these terms on the TensorFlowTTS licensing page.