Introduction to TensorVox
Overview
TensorVox is a desktop application for neural speech synthesis. Its goal is to be user-friendly and light on resources, so that neural text-to-speech is accessible without a heavyweight setup.
Technical Foundation
TensorVox is built primarily on TensorFlowTTS, with additional support for models from Coqui-TTS and VITS. The application is written in pure C++ with the Qt framework, which keeps it lightweight. Inference runs through the TensorFlow C API for TensorFlow models and through LibTorch for PyTorch models, so no extensive Python library installation is needed, only a few essential DLLs.
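The split between the two inference runtimes can be pictured as a simple dispatch on the framework a model was exported for. The following is a minimal sketch under that assumption; the names `Framework`, `Runtime`, `runtimeFor`, and `runtimeName` are hypothetical and are not taken from the TensorVox source.

```cpp
#include <string>

// Hypothetical names for illustration; not taken from the TensorVox source.
enum class Framework { TensorFlow, PyTorch };
enum class Runtime { TensorFlowCApi, LibTorch };

// Choose the native inference runtime from the framework a model was exported for.
Runtime runtimeFor(Framework framework) {
    return framework == Framework::TensorFlow ? Runtime::TensorFlowCApi
                                              : Runtime::LibTorch;
}

// Human-readable runtime name, e.g. for a log line or a status message.
std::string runtimeName(Runtime runtime) {
    return runtime == Runtime::TensorFlowCApi ? "TensorFlow C API" : "LibTorch";
}
```

For example, `runtimeName(runtimeFor(Framework::PyTorch))` yields `"LibTorch"`. Keeping the choice in one function means the rest of an application would not need to know which native library sits behind a given model.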
Getting Started
To begin using TensorVox, users can:
- Access a comprehensive guide in Google Docs.
- Obtain a copy from the available releases, extract the .zip file, and consult the Google Drive folder for models and installation instructions.
To use their own models, users must first train and export them.
Supported Architectures
TensorVox supports various model architectures across different repositories:
- TensorFlowTTS: Includes FastSpeech2, Tacotron2 (both character and phoneme-based), and Multi-Band MelGAN. A Colab notebook is available to demonstrate exporting a char-based Tacotron2 model: Colab Link.
- Coqui-TTS: Supports Tacotron2 (phoneme-based IPA) and Multi-Band MelGAN after converting from PyTorch to TensorFlow. An exporting notebook for the LJSpeech DDC model can be found here: Colab Link.
- jaywalnut310/VITS: A fully end-to-end model using stressed IPA as phonemes. Export it using this notebook: Colab Link.
English, German, and Spanish are supported out of the box. Additional languages are possible through IPA, ARPA, or GlobalPhone phoneme sets.
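Phoneme-based models take integer IDs rather than raw text, so a phoneme string has to be mapped through a symbol table before inference. The sketch below shows one way this could look for a space-separated ARPA-style input. The `SymbolTable` alias, the `phonemesToIds` function, and the ID values in the usage note are hypothetical; real models ship their own symbol-to-ID mapping, and this is not a description of how TensorVox itself does it.

```cpp
#include <sstream>
#include <stdexcept>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical symbol table: each exported model defines its own symbol-to-ID mapping.
using SymbolTable = std::unordered_map<std::string, int>;

// Convert a whitespace-separated phoneme string into a sequence of model input IDs.
// Throws std::invalid_argument if a symbol is missing from the table.
std::vector<int> phonemesToIds(const std::string& phonemes, const SymbolTable& table) {
    std::vector<int> ids;
    std::istringstream stream(phonemes);
    std::string symbol;
    while (stream >> symbol) {
        auto it = table.find(symbol);
        if (it == table.end()) {
            throw std::invalid_argument("unknown phoneme: " + symbol);
        }
        ids.push_back(it->second);
    }
    return ids;
}
```

With a table such as `{{"HH", 1}, {"AH0", 2}, {"L", 3}, {"OW1", 4}}`, the call `phonemesToIds("HH AH0 L OW1", table)` returns the IDs 1, 2, 3, 4 in order. Failing loudly on an unknown symbol is a deliberate choice here, since silently dropping phonemes would produce audio that does not match the input.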
Build Instructions
TensorVox currently supports Windows 10 x64, and it has been reported to work on Windows 8.1. Building requires Qt Creator and the MSVC 2017 (v141) compiler. A build involves:
- Downloading the necessary binary dependencies and includes.
- Placing the unzipped deps folder alongside the .pro and main source files.
- Opening the project in Qt Creator and compiling with the compiler listed above.
Acknowledgments and External Tools
TensorVox uses several external tools and libraries, including LibTorch, the TensorFlow C API, CppFlow, and AudioFile, among others. Special thanks are extended to these projects for their contributions.
Contact and Support
For questions and discussions, users are encouraged to join the Discord server. Formal inquiries can be directed to the provided email: [email protected].
Licensing
TensorVox itself is MIT licensed, but external models used within the application are subject to their own licensing terms. Users, particularly those in Vietnam, should review these terms on the TensorFlowTTS licensing page.