Voice Conversion
Applio
Applio is a user-friendly voice conversion tool for artists, developers, and researchers, delivering high-performance, high-quality transformations. It supports extensive customization through plugins and runs on Windows, Linux, and macOS. The tool is released under the MIT license, permitting commercial use while encouraging ethical practice. Users can contribute to its development and explore in-depth features such as TensorBoard monitoring. Comprehensive documentation and community support are available on Discord.
Diff-HierVC
Diff-HierVC is an advanced voice conversion system that uses diffusion models to improve pitch accuracy and speaker adaptation. Built from two components, DiffPitch and DiffVoice, it achieves precise F0 generation and effective voice style transfer. The system incorporates a source-filter encoder and a data-driven prior over Mel-spectrograms to boost conversion quality. In zero-shot adaptation scenarios, it reports 0.83% CER and 3.29% EER, offering a versatile solution for voice conversion challenges across diverse datasets.
vits-simple-api
vits-simple-api provides text-to-speech and voice conversion with features such as automatic language recognition, multi-model support, and GPU acceleration. It includes advanced models such as HuBert-VITS and Bert-VITS2, and can be deployed through Docker or a virtual environment. A WebUI simplifies management, and the API supports SSML and customizable defaults, making it suitable for scalable applications.
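As a minimal sketch of how a client might call such a service, the snippet below builds a GET request URL for a speech-synthesis endpoint. The endpoint path `/voice/vits`, the default port, and the parameter names (`text`, `id`, `format`, `lang`) are assumptions based on typical vits-simple-api deployments; check your server's documentation before relying on them.

```python
from urllib.parse import urlencode, urljoin

def build_tts_request(base_url, text, speaker_id=0, fmt="wav", lang="auto"):
    """Construct a GET request URL for a VITS text-to-speech endpoint.

    NOTE: the endpoint path and parameter names are assumptions;
    they may differ in your deployment.
    """
    params = urlencode({"text": text, "id": speaker_id,
                        "format": fmt, "lang": lang})
    return urljoin(base_url, "/voice/vits") + "?" + params

# Example: a server running locally on the commonly used port 23456.
url = build_tts_request("http://127.0.0.1:23456", "hello", speaker_id=1)
print(url)
```

Fetching the resulting URL (with `requests` or `urllib.request`) would return the synthesized audio in the requested format, assuming the server exposes this route.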
HierSpeechpp
HierSpeech++ employs hierarchical variational inference to advance zero-shot speech synthesis, improving robustness and expressiveness. It bridges the gap between semantic and acoustic representations, significantly boosting naturalness and speaker similarity in both TTS and voice conversion. The project includes a text-to-vec framework and an efficient speech super-resolution stage that upsamples audio from 16 kHz to 48 kHz. Built on PyTorch, it provides pre-trained models for further exploration and achieves human-level synthesis quality, outperforming LLM-based and diffusion-based models.
Feedback Email: [email protected]