# VITS

## vits
VITS is an end-to-end TTS method that improves upon traditional two-stage systems by combining variational inference with adversarial learning, yielding natural-sounding speech. A stochastic duration predictor lets it synthesize speech with varied rhythms and tones from the same text. In human evaluations on the LJ Speech dataset it achieves a mean opinion score (MOS) close to that of real human speech. An interactive demo with audio examples and pretrained models are available.
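The role of the stochastic duration predictor can be illustrated with a minimal, framework-free sketch (names and the log-normal distribution are illustrative, not the repository's API): sample a duration for each phoneme encoding, then expand the encodings to frame level.

```python
import math
import random

def sample_durations(phoneme_states, mean_log_dur=0.7, std_log_dur=0.3, rng=None):
    """Sample a per-phoneme duration (in frames) from a log-normal distribution.

    A stochastic predictor yields different rhythms on each call, unlike a
    deterministic regressor that always returns the mean duration.
    """
    rng = rng or random.Random()
    return [max(1, round(math.exp(rng.gauss(mean_log_dur, std_log_dur))))
            for _ in phoneme_states]

def expand_to_frames(phoneme_states, durations):
    """Repeat each phoneme encoding for its sampled number of frames."""
    frames = []
    for state, dur in zip(phoneme_states, durations):
        frames.extend([state] * dur)
    return frames

rng = random.Random(0)
states = ["h", "e", "l", "o"]  # stand-ins for encoder hidden states
durs = sample_durations(states, rng=rng)
frames = expand_to_frames(states, durs)
```

Re-running without a fixed seed gives a different rhythm each time, which is what "varied speech rhythms from the same text" means in practice.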
## wetts
WeTTS is a comprehensive end-to-end text-to-speech toolkit designed for production use. It builds on models such as VITS and integrates WeTextProcessing for text normalization and prosody control. It supports multiple open-source datasets, including Baker and AISHELL-3, and runs on a range of hardware from x86 to Android, giving developers a reliable foundation for building high-quality TTS applications.
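Text normalization, the job WeTextProcessing handles in this pipeline, means rewriting written forms into speakable ones before synthesis. A toy sketch of the idea (real systems use weighted finite-state transducers and cover dates, currency, measures, Chinese text, and more; this is not the WeTextProcessing API):

```python
import re

ONES = ["zero", "one", "two", "three", "four",
        "five", "six", "seven", "eight", "nine"]

def verbalize_digits(text):
    """Replace each digit with its spoken form, digit by digit."""
    spoken = re.sub(r"\d", lambda m: " " + ONES[int(m.group())] + " ", text)
    return " ".join(spoken.split())  # collapse the extra spaces

print(verbalize_digits("call 911"))  # -> call nine one one
```

Without this step, a TTS model sees characters like "911" that never occur in its phoneme training data, which is why normalization sits at the front of production pipelines.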
## piper
Piper provides high-quality neural text-to-speech output, with significant optimizations for the Raspberry Pi 4. It supports numerous languages, including English, Chinese, Arabic, and Spanish. It integrates into systems such as Home Assistant and NVDA, and it can run on various platforms from Python scripts or from the C++ source.
## Retrieval-based-Voice-Conversion-WebUI
The project provides a user-friendly voice conversion framework utilizing the VITS model, ensuring high-quality outputs even on lower-end GPUs. It addresses timbre leakage with top-1 feature replacement and enhances vocal pitch accuracy using the RMVPE algorithm. Suitable for quick training with minimal data, it efficiently supports voice conversion and model fusion. Additionally, it facilitates low-latency real-time processing compatible with ASIO hardware for precise voice modifications.
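The "top-1 feature replacement" that suppresses timbre leakage can be sketched abstractly: each frame's content feature is swapped for (or blended with) its nearest neighbor from an index built over the target speaker's training features, so residual source-speaker colouring is pushed toward the target. This is a conceptual, pure-Python sketch; the real project indexes HuBERT-style features with an approximate nearest-neighbor library.

```python
def nearest_neighbor(query, index):
    """Return the index entry closest to `query` (squared Euclidean distance)."""
    return min(index, key=lambda v: sum((q - x) ** 2 for q, x in zip(query, v)))

def retrieve_features(frames, index, blend=0.75):
    """Blend each frame with its top-1 match from the training-set index.

    blend=1.0 fully replaces the frame (strongest timbre suppression);
    blend=0.0 leaves the source feature untouched.
    """
    out = []
    for f in frames:
        nn = nearest_neighbor(f, index)
        out.append([blend * n + (1 - blend) * x for n, x in zip(nn, f)])
    return out

index = [[0.0, 0.0], [1.0, 1.0]]   # toy "training set" features
frames = [[0.1, 0.2], [0.9, 0.8]]  # toy source content features
mixed = retrieve_features(frames, index, blend=1.0)
```

The blend ratio is the usual user-facing knob: higher values sound more like the target speaker but can flatten expressive detail carried by the source features.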
## vits_chinese
This TTS project combines BERT and VITS to improve prosody and sound quality. It draws on techniques from Microsoft's NaturalSpeech to produce natural pauses and to reduce sound errors through its loss design, and it applies module-wise distillation to speed up inference, yielding high-quality audio well suited to experimentation and research. Note that the project is not intended for direct production use; it serves as a tool for exploring TTS technology.
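Module-wise distillation means training a smaller student module to reproduce a teacher module's intermediate outputs, rather than distilling only the final output. A deliberately tiny sketch of the idea, with a scalar linear "module" and plain SGD (illustrative only; the project distills neural network modules):

```python
def teacher_module(x):
    """Frozen teacher: the intermediate mapping the student must mimic."""
    return 3.0 * x + 1.0

def train_student(steps=500, lr=0.05):
    """Fit student parameters (w, b) to match the teacher on sampled inputs."""
    w, b = 0.0, 0.0
    inputs = [-1.0, -0.5, 0.0, 0.5, 1.0]
    for _ in range(steps):
        for x in inputs:
            err = (w * x + b) - teacher_module(x)  # distillation error
            w -= lr * err * x                      # gradient of 0.5 * err**2
            b -= lr * err
    return w, b

w, b = train_student()
```

Doing this per module keeps each student piece close to its teacher counterpart, which is what lets the distilled model run faster with little quality loss.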
## sound_dataset_tools2
The tool enables fast creation of voice datasets and exports training data for VITS and related projects. Its GUI supports both audio and subtitle-based imports and offers automatic audio segmentation with clipping prevention. Users can adjust audio configuration and audition clips to select quality data. Built on SQLite and PySide6, it can be run from compiled executables or from source.
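Automatic segmentation with clipping prevention typically means cutting at stretches of silence while keeping a little padding so speech onsets are not chopped. A hedged sketch of that approach over a toy amplitude envelope (illustrative of the technique, not this tool's code):

```python
def split_on_silence(samples, threshold=0.05, min_silence=3, pad=1):
    """Split `samples` into (start, end) segments separated by silence.

    A frame is silent when its absolute amplitude is below `threshold`;
    a gap needs at least `min_silence` consecutive silent frames.
    `pad` silent frames are kept on each side to avoid clipping speech.
    """
    segments, start, silent_run = [], None, 0
    for i, s in enumerate(samples):
        if abs(s) >= threshold:
            if start is None:
                start = max(0, i - pad)  # keep a little leading silence
            silent_run = 0
        elif start is not None:
            silent_run += 1
            if silent_run == min_silence:
                end = min(len(samples), i - min_silence + 1 + pad)
                segments.append((start, end))
                start, silent_run = None, 0
    if start is not None:
        segments.append((start, len(samples)))
    return segments

audio = [0.0, 0.5, 0.6, 0.0, 0.0, 0.0, 0.0, 0.7, 0.4, 0.0]
```

Real tools work on windowed energy rather than raw samples, but the threshold / minimum-gap / padding knobs are the same ones exposed in segmentation UIs.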
## whisper-vits-svc
This project converts singing voices end to end through variational inference and adversarial learning, building on the VITS model. It is aimed at deep learning beginners with basic Python and PyTorch knowledge and emphasizes hands-on practice. It supports multi-speaker training, creating new voices by mixing speakers, and handling sources with light accompaniment. Training requires at least 6 GB of VRAM, and the model delivers strong performance with features such as noise immunity and sound quality enhancement. Real-time voice conversion is not supported, but the project provides thorough instructions for training and inference, helping learners understand how the model operates.
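"Creating voices by mixing speakers" usually amounts to conditioning synthesis on a weighted average of speaker embeddings. A minimal sketch of that operation (names and the plain-list representation are illustrative; the project works with learned embedding tensors):

```python
def mix_speakers(embeddings, weights):
    """Blend speaker embeddings with weights normalized to sum to 1."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    norm = [w / total for w in weights]
    dim = len(embeddings[0])
    return [sum(w * emb[i] for w, emb in zip(norm, embeddings))
            for i in range(dim)]

alice = [1.0, 0.0, 2.0]  # toy speaker embeddings
bob   = [0.0, 1.0, 0.0]
blend = mix_speakers([alice, bob], weights=[3, 1])
```

Because the embedding space is continuous, intermediate points tend to sound like plausible "in-between" voices rather than artifacts, which is what makes mixing useful for creating distinctive new speakers.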
## Retrieval-based-Voice-Conversion
The framework uses VITS for efficient voice conversion and offers library, API, and CLI interfaces. It includes versatile setup options and features such as audio inference processing and model management, and it can be deployed via Docker or scripts for straightforward integration into voice-related applications.