#Voice Cloning

fish-speech
fish-speech is a text-to-speech system with zero-shot and few-shot capabilities for languages including English, Japanese, and Chinese. It reaches a real-time factor of 1:5 on an NVIDIA RTX 4060 and maintains low character and word error rates. It ships with a Gradio-based web UI and a PyQt6 interface, deploys on Windows, Linux, and macOS, and uses fish-tech acceleration.
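To make the real-time factor concrete, here is a minimal arithmetic sketch. It assumes the 1:5 ratio means one second of processing per five seconds of generated audio; the blurb does not spell this out, and the function name `synthesis_seconds` is hypothetical.

```python
def synthesis_seconds(audio_seconds: float,
                      processing_parts: float = 1.0,
                      audio_parts: float = 5.0) -> float:
    """Estimate processing time for a clip of the given length.

    Assumption: a real-time factor of processing_parts:audio_parts means
    processing_parts seconds of compute per audio_parts seconds of audio.
    """
    if audio_parts <= 0:
        raise ValueError("audio_parts must be positive")
    return audio_seconds * processing_parts / audio_parts


# Estimated compute time for a one-minute clip at a 1:5 real-time factor.
estimate = synthesis_seconds(60.0)
print(f"{estimate:.1f} s of processing for 60 s of audio")
```

Under that reading, any ratio below 1:1 means synthesis runs faster than playback.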
xtts2-ui
XTTS-2-UI provides a simple interface for cloning voices in 16 languages from text and a brief audio sample. The model tts_models/multilingual/multi-dataset/xtts_v2 is downloaded automatically on first use. The reference voice can be recorded or uploaded, and setup takes only a few steps. The application runs from the terminal or through Streamlit, and you must agree to the terms of service on first run.
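For orientation, the same model name can be loaded directly through the Coqui TTS Python package. This is a rough sketch, not necessarily how xtts2-ui calls the model internally: the helper `clone_voice` and the file names are hypothetical, and it assumes the `TTS` package is installed.

```python
MODEL_NAME = "tts_models/multilingual/multi-dataset/xtts_v2"


def clone_voice(text: str, speaker_wav: str,
                out_path: str = "output.wav", language: str = "en") -> str:
    """Synthesize `text` in the voice of the `speaker_wav` sample.

    Hypothetical helper: assumes the Coqui `TTS` package is installed.
    The model is fetched on first use, after a license prompt.
    """
    # Imported lazily so this module loads even without the TTS package.
    from TTS.api import TTS

    tts = TTS(MODEL_NAME)
    tts.tts_to_file(text=text, speaker_wav=speaker_wav,
                    language=language, file_path=out_path)
    return out_path
```

A call such as `clone_voice("Hello there.", "sample.wav")` would write `output.wav`, where `sample.wav` stands in for your own short reference recording.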