#Voice Conversion

GPT-SoVITS
GPT-SoVITS-WebUI is a web interface for voice conversion and multilingual text-to-speech. It provides zero-shot and few-shot speech synthesis in English, Japanese, Korean, Cantonese, and Chinese, with built-in tools for dataset preparation and text labeling. It can be deployed through Colab, Docker, or direct download, and supports Windows, Linux, and macOS.
awesome-speech-recognition-speech-synthesis-papers
This repository is a curated collection of key research papers in speech recognition and synthesis. It covers Text-to-Audio, Automatic Speech Recognition (ASR), Speaker Verification, Voice Conversion (VC), and Speech Synthesis (TTS), along with specialized topics such as Language Modelling, Confidence Estimates, and Music Modelling. The list includes both foundational works and recent advances, making it a reference for researchers and practitioners who want to trace how speech and audio processing techniques have evolved.
YourTTS
YourTTS is a zero-shot approach to TTS and voice conversion, built on VITS, that supports multiple speakers and languages. It suits low-resource languages because it can synthesize a voice from minimal input. Recent fixes improve training accuracy. Audio samples and demos are available.
voice-changer
This open-source project performs real-time voice conversion with AI models such as MMVC and RVC, and runs on Windows, Mac, Linux, and Google Colab. Features include Beatrice v2 support, crossfade adjustment, and network load offloading for demanding applications. Users can run pre-built binaries or set up their own environment with Docker or Anaconda.
CosyVoice
CosyVoice is a multilingual AI model for voice processing. Its features include repetition-aware sampling for stability, streaming inference, and cross-lingual voice conversion. It supports inference in zero-shot, SFT, and instruct modes. Pre-trained models are provided for text-to-speech and voice manipulation. Possible future additions include music generation and wider multilingual data support.
DragonianVoice
DragonianVoice is an ONNX-based framework that supports TTS, SVC, and SVS using models such as Tacotron2, VITS, and DiffSinger. It allows development in C, C++, and C#, and integrates with fish-speech via the ggml framework. Specialized branches such as MoeVoiceStudio extend its voice conversion and synthesis capabilities. The project relies on ONNXRuntime for CUDA-compatible model deployment. It runs offline, requires models to be converted to ONNX, and collects no user data.