Introduction to CosyVoice_For_Windows
CosyVoice_For_Windows is an advanced speech synthesis toolkit that provides users with high-quality and versatile text-to-speech (TTS) capabilities. Designed for Windows, this toolkit enables users to convert text into lifelike speech using cutting-edge artificial intelligence models. CosyVoice is ideal for researchers, developers, and enthusiasts aiming to explore the capabilities of synthetic audio or integrate TTS technology into their applications.
Setup Requirements
For optimal performance, install Python 3.11, which this project targets for its performance improvements. If you are working with an NVIDIA GPU, also set up CUDA 12.6 and cuDNN 9.4 to accelerate model inference.
With these prerequisites in place:
- Install the project dependencies:
pip3 install -r requirements.txt
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu124
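After installing, you can quickly verify that the CUDA-enabled PyTorch build is active and your GPU is visible (a small sanity check, not part of the project itself):
import torch

# Confirm that the CUDA build of PyTorch is installed and a GPU is visible.
print("PyTorch version:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))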
Running the Service
CosyVoice can be run as a local service using Python, providing endpoints for text conversion to speech, subtitle generation, and audio output.
- To launch the API service, execute the following:
python3 api.py
- Access the output through:
- API URL:
http://localhost:9880/?text=YourTextHere&speaker=SpeakerName
- Subtitle File:
http://localhost:9880/file/output.srt
- Audio File:
http://localhost:9880/file/output.wav
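A minimal client sketch for the endpoints above, assuming the service returns the synthesized audio directly as WAV bytes; the response format and any parameters beyond text and speaker are assumptions, so check api.py for the authoritative details:
import requests

# Ask the local service to synthesize speech (response assumed to be raw WAV bytes).
params = {"text": "Hello from CosyVoice", "speaker": "SpeakerName"}
resp = requests.get("http://localhost:9880/", params=params)
resp.raise_for_status()

with open("api_output.wav", "wb") as f:
    f.write(resp.content)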
Installation Instructions
Clone the Repository
First, clone the CosyVoice_For_Windows repository and its submodules to your local machine:
git clone --recursive https://github.com/v3ucn/CosyVoice_For_Windows.git
cd CosyVoice_For_Windows
git submodule update --init --recursive
Set up a Conda environment specifically for CosyVoice:
- Install Conda from the Miniconda website.
- Create and activate the Conda environment:
conda create -n cosyvoice python=3.11
conda activate cosyvoice
pip install -r requirements.txt -i https://mirrors.aliyun.com/pypi/simple/ --trusted-host=mirrors.aliyun.com
Model Download
To fully utilize CosyVoice, download the pretrained models:
- Use the ModelScope SDK for downloading:
from modelscope import snapshot_download
snapshot_download('iic/CosyVoice-300M', local_dir='pretrained_models/CosyVoice-300M')
snapshot_download('iic/CosyVoice-300M-SFT', local_dir='pretrained_models/CosyVoice-300M-SFT')
snapshot_download('iic/CosyVoice-300M-Instruct', local_dir='pretrained_models/CosyVoice-300M-Instruct')
snapshot_download('speech_tts/speech_kantts_ttsfrd', local_dir='pretrained_models/speech_kantts_ttsfrd')
- Alternatively, use Git to download the models, ensuring Git LFS is installed:
mkdir -p pretrained_models
git clone https://www.modelscope.cn/iic/CosyVoice-300M.git pretrained_models/CosyVoice-300M
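If Git LFS has not been initialized on your machine yet, enable it before cloning; the remaining models follow the same pattern (the repository paths below are inferred from the ModelScope identifiers above, so verify them on modelscope.cn):
git lfs install
git clone https://www.modelscope.cn/iic/CosyVoice-300M-SFT.git pretrained_models/CosyVoice-300M-SFT
git clone https://www.modelscope.cn/iic/CosyVoice-300M-Instruct.git pretrained_models/CosyVoice-300M-Instruct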
Usage Examples
CosyVoice offers several ways to perform speech synthesis:
- Zero-Shot Inference: Clones a voice from a short reference audio prompt without any additional model training.
- SFT Inference: Utilizes fine-tuned models for more tailored speech synthesis.
- Cross-Lingual Inference: Capable of synthesizing speech across different languages.
- Instruct Inference: Customizes speech output with specific instructions in dialogue.
All these use cases can be explored by driving CosyVoice from Python scripts, leveraging the torchaudio library to save the output audio, as in the sketch below.
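A minimal SFT-mode sketch, modeled on the upstream CosyVoice examples; the import path, speaker name, and 22050 Hz sample rate are assumptions carried over from upstream and may differ in this fork:
import torchaudio
from cosyvoice.cli.cosyvoice import CosyVoice  # import path assumed from upstream CosyVoice

# Load a fine-tuned (SFT) model downloaded earlier.
cosyvoice = CosyVoice('pretrained_models/CosyVoice-300M-SFT')

# List the built-in speakers, then synthesize a short sentence with one of them.
print(cosyvoice.list_avaliable_spks())
output = cosyvoice.inference_sft('Hello, this is a CosyVoice test.', '中文女')

# Save the generated waveform to disk.
torchaudio.save('sft_output.wav', output['tts_speech'], 22050)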
Web and Advanced Usage
CosyVoice includes a web interface for easy access to its functionalities. It supports all of the inference modes described above: SFT, zero-shot, cross-lingual, and instruct. To start the web UI:
python3 webui.py --port 9886 --model_dir ./pretrained_models/CosyVoice-300M
For those interested in customizing or deploying CosyVoice as a service, advanced training and inference scripts are available. Additionally, gRPC can optionally be used to deploy models as scalable services.
Support and Acknowledgments
CosyVoice’s development builds on contributions from several open-source projects, including FunASR and FunCodec, giving users a robust toolkit for exploring speech synthesis.
Join discussions or seek assistance through the GitHub Issues page or the official chat group for community support.