Introduction to WeTTS
WeTTS is an end-to-end Text-to-Speech (TTS) toolkit built to be production-ready. It aims to make TTS efficient to run in production environments and to keep the path from development to deployment short.
Installation
Installing the Python Package
Install the WeTTS package from the command line:
pip install git+https://github.com/wenet-e2e/wetts.git
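Once installed, WeTTS can be run from the command line. The example below synthesizes a Mandarin sentence ("How is the weather today?") and writes the audio to output.wav: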
wetts --text "今天天气怎么样" --wav output.wav
The package can also be imported from Python code. The Python usage example has not been written yet and is left as a placeholder:
import wetts
# TODO
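Since the Python example above is still a placeholder, one way to call WeTTS from Python in the meantime is to wrap the command-line tool. The sketch below is not a WeTTS API: the helper names `build_wetts_command` and `synthesize` are made up for illustration, and it assumes the `wetts` executable is on the PATH and accepts the `--text` and `--wav` options shown in the command-line example.

```python
import subprocess
from typing import List


def build_wetts_command(text: str, wav_path: str) -> List[str]:
    """Build the argument list for the `wetts` CLI (hypothetical helper)."""
    # Passing a list instead of a shell string avoids quoting problems
    # with non-ASCII text or spaces.
    return ["wetts", "--text", text, "--wav", wav_path]


def synthesize(text: str, wav_path: str) -> None:
    """Run the `wetts` CLI; raises CalledProcessError on a non-zero exit."""
    subprocess.run(build_wetts_command(text, wav_path), check=True)
```

Calling `synthesize("今天天气怎么样", "output.wav")` would then run the same command as the command-line example.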
For Development & Deployment
To work on the codebase or deploy WeTTS, clone the repository and set up a dedicated environment with Anaconda or Miniconda:
git clone https://github.com/wenet-e2e/wetts.git
cd wetts
conda create -n wetts python=3.8 -y
conda activate wetts
pip install -r requirements.txt
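After the install step, it can be useful to confirm that every package named in requirements.txt is actually present in the active environment. The following is a minimal sketch using only the Python standard library. The helper names are hypothetical, the parsing handles only simple `name` plus version-specifier lines, and it makes no assumption about what the WeTTS requirements.txt actually contains.

```python
import re
from importlib import metadata
from typing import Iterable, List

# Matches the distribution name at the start of a requirement line.
_NAME_RE = re.compile(r"^[A-Za-z0-9][A-Za-z0-9._-]*")


def parse_requirement_names(lines: Iterable[str]) -> List[str]:
    """Extract package names from simple requirements.txt lines."""
    names = []
    for raw in lines:
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        if not line or line.startswith("-") or "://" in line:
            continue  # skip blanks, pip options, and URL requirements
        match = _NAME_RE.match(line)
        if match:
            names.append(match.group(0))
    return names


def missing_packages(names: Iterable[str]) -> List[str]:
    """Return the names that are not installed in this environment."""
    missing = []
    for name in names:
        try:
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing
```

From the repository root, `missing_packages(parse_requirement_names(open("requirements.txt")))` returns an empty list when every listed package is installed.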
Roadmap
WeTTS focuses on end-to-end, production-oriented TTS that runs efficiently on-device. Key areas include:
- Backend: end-to-end models such as VITS.
- Frontend: text normalization and prosody models.
Dataset
WeTTS plans to support a variety of open-source TTS datasets, including:
- Baker: the Chinese Standard Mandarin Speech Corpus.
- AISHELL-3: a large-scale, high-fidelity multi-speaker Mandarin speech corpus.
- Opencpop: a dataset for Mandarin singing voice synthesis.
Pretrained Models
Pretrained models give users a starting point to build on. The available models cover the Baker dataset and multilingual use, and include BERT models for the frontend and VITS models for the backend.
Runtime
WeTTS targets several hardware platforms for deployment:
- x86 architecture
- Android devices
- Raspberry Pi and other on-device platforms
The commands below send verbose logs to stderr, build the ONNX Runtime binary, and synthesize a mixed English and Chinese sentence to audio.wav:
export GLOG_logtostderr=1
export GLOG_v=2
cd runtime/onnxruntime
cmake -B build -DCMAKE_BUILD_TYPE=Release
cmake --build build
./build/bin/tts_main \
--frontend_flags baker_bert_onnx/frontend.flags \
--vits_flags multilingual_vits_v3_onnx/vits.flags \
--sname baker \
--text "hello我是小明。" \
--wav_path audio.wav
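To synthesize several sentences with the same binary, the invocation above can be scripted. The sketch below assumes it is run from runtime/onnxruntime after the build has finished and that the model directories from the command above are present there. The helper names and the `audio_000.wav` output naming are illustrative assumptions; only the flags and environment variables already shown above are used.

```python
import os
import subprocess
from typing import Dict, Iterable, List, Mapping, Optional


def build_tts_main_command(
    text: str,
    wav_path: str,
    sname: str = "baker",
    binary: str = "./build/bin/tts_main",
    frontend_flags: str = "baker_bert_onnx/frontend.flags",
    vits_flags: str = "multilingual_vits_v3_onnx/vits.flags",
) -> List[str]:
    """Build the tts_main argument list (hypothetical helper)."""
    return [
        binary,
        "--frontend_flags", frontend_flags,
        "--vits_flags", vits_flags,
        "--sname", sname,
        "--text", text,
        "--wav_path", wav_path,
    ]


def glog_env(base: Optional[Mapping[str, str]] = None,
             verbosity: int = 2) -> Dict[str, str]:
    """Copy an environment and add the two GLOG settings used above."""
    env = dict(os.environ if base is None else base)
    env["GLOG_logtostderr"] = "1"
    env["GLOG_v"] = str(verbosity)
    return env


def synthesize_all(sentences: Iterable[str], out_dir: str = ".") -> None:
    """Run tts_main once per sentence, writing audio_000.wav, audio_001.wav, ..."""
    for i, text in enumerate(sentences):
        wav_path = os.path.join(out_dir, f"audio_{i:03d}.wav")
        subprocess.run(build_tts_main_command(text, wav_path),
                       env=glog_env(), check=True)
```

Starting one process per sentence reloads the models each time, so this is only convenient for small batches.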
Discussion & Communication
WeTTS encourages community discussion and support through:
- WeChat groups, aimed at Chinese-speaking users.
- GitHub Issues, for asking questions and contributing to the project.
Acknowledgement
WeTTS acknowledges the contributions of other open-source projects, notably borrowing from VITS for its model implementation and referencing PaddleSpeech for lexicon generation.
In summary, WeTTS is a production-oriented, open-source TTS toolkit that targets multiple platforms and languages.