wetts - Comprehensive End-to-End Text-to-Speech Toolkit for Diverse Platforms

Introduction to WeTTS

WeTTS is an end-to-end Text-to-Speech (TTS) toolkit built to be production-ready. It is designed so that a TTS system can move from development to deployment in production environments with little extra work.

Installation

Installing the Python Package

To install the WeTTS package, run:

pip install git+https://github.com/wenet-e2e/wetts.git

Once installed, WeTTS can be used from the command line:

wetts --text "今天天气怎么样" --wav output.wav

WeTTS can also be imported into Python code. The Python usage is not documented yet, so the example is left as a TODO:

import wetts

# TODO
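
Until the Python usage above is documented, one option is to wrap the command-line tool from Python. The sketch below is not the package's Python API. It only assumes that the `wetts` executable is on the PATH and accepts the `--text` and `--wav` flags shown in the command-line example. The helper names `build_wetts_command` and `synthesize` are hypothetical.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List


def build_wetts_command(text: str, wav_path: str) -> List[str]:
    """Build the argument list for the `wetts` CLI.

    Only the --text and --wav flags from the command-line example are used.
    """
    if not text.strip():
        raise ValueError("text must not be empty")
    return ["wetts", "--text", text, "--wav", str(wav_path)]


def synthesize(text: str, wav_path: str) -> Path:
    """Run the CLI and return the output path (hypothetical helper).

    Raises FileNotFoundError if `wetts` is not installed, and
    subprocess.CalledProcessError if the command exits with an error.
    """
    if shutil.which("wetts") is None:
        raise FileNotFoundError("`wetts` executable not found on PATH")
    subprocess.run(build_wetts_command(text, wav_path), check=True)
    return Path(wav_path)
```

Calling `synthesize("今天天气怎么样", "output.wav")` would then run the same command as the command-line example. Passing the arguments as a list avoids shell quoting problems with non-ASCII text.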

For Development & Deployment

To work on the codebase or deploy WeTTS, clone the repository and set up a dedicated environment with Anaconda or Miniconda:

git clone https://github.com/wenet-e2e/wetts.git
cd wetts
conda create -n wetts python=3.8 -y
conda activate wetts
pip install -r requirements.txt

Roadmap

WeTTS focuses on end-to-end, production-ready TTS that runs efficiently on-device. Key focuses include:

Backend: End-to-end models such as VITS.
Frontend: Text normalization and prosody models, including:
- WeTextProcessing
- Unified Mandarin TTS Front-end Based on Distilled BERT Model

Dataset

WeTTS plans to support a variety of open-source TTS datasets, including but not limited to:

Baker: A Chinese Standard Mandarin speech corpus.
AISHELL-3: A large-scale, high-fidelity multi-speaker Mandarin speech corpus.
Opencpop: A corpus for Mandarin singing voice synthesis.

Pretrained Models

WeTTS provides pretrained models as a starting point. These include models for the Baker dataset and for multilingual use, with both BERT and VITS models available.

Runtime

WeTTS supports several deployment targets, including:

x86 architecture
Android devices
Raspberry Pi and other on-device platforms

The commands below enable glog logging to stderr, build the ONNX Runtime runtime with CMake, and synthesize a sentence with the resulting tts_main binary:

export GLOG_logtostderr=1
export GLOG_v=2

cd runtime/onnxruntime
cmake -B build -DCMAKE_BUILD_TYPE=Release
cmake --build build
./build/bin/tts_main \
  --frontend_flags baker_bert_onnx/frontend.flags \
  --vits_flags multilingual_vits_v3_onnx/vits.flags \
  --sname baker \
  --text "hello我是小明。" \
  --wav_path audio.wav
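
For scripted runs, the invocation above can also be driven from Python. This is a sketch under assumptions: it presumes the binary has already been built at `./build/bin/tts_main` inside `runtime/onnxruntime`, and it reuses only the flags and environment variables from the commands above. The names `build_tts_main_invocation` and `run_tts_main` are hypothetical.

```python
import os
import subprocess
from typing import Dict, List, Tuple


def build_tts_main_invocation(
    text: str,
    wav_path: str,
    frontend_flags: str = "baker_bert_onnx/frontend.flags",
    vits_flags: str = "multilingual_vits_v3_onnx/vits.flags",
    sname: str = "baker",
    binary: str = "./build/bin/tts_main",
) -> Tuple[List[str], Dict[str, str]]:
    """Return (argv, env) mirroring the shell example above."""
    argv = [
        binary,
        "--frontend_flags", frontend_flags,
        "--vits_flags", vits_flags,
        "--sname", sname,
        "--text", text,
        "--wav_path", wav_path,
    ]
    # Copy the current environment and add the glog settings from the example.
    env = dict(os.environ, GLOG_logtostderr="1", GLOG_v="2")
    return argv, env


def run_tts_main(text: str, wav_path: str, cwd: str = "runtime/onnxruntime") -> None:
    """Run the prebuilt binary from the runtime directory.

    Relative paths (binary, flag files, wav_path) are resolved against cwd.
    Raises subprocess.CalledProcessError on a non-zero exit code.
    """
    argv, env = build_tts_main_invocation(text, wav_path)
    subprocess.run(argv, env=env, cwd=cwd, check=True)
```

Separating command construction from execution keeps the argument list easy to inspect or log before the binary is actually run.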

Discussion & Communication

Users can join discussions and find support through:

WeChat groups, aimed at Chinese-speaking users.
GitHub Issues, for asking questions and contributing to the project.

Acknowledgement

WeTTS acknowledges the contributions of other open-source projects, notably borrowing from VITS for its model implementation and referencing PaddleSpeech for lexicon generation.

In summary, WeTTS is a production-oriented, open-source TTS toolkit that targets multiple platforms and languages.