Project Introduction: Retrieval-based Voice Conversion
The Retrieval-based Voice Conversion (RVC) project is a simple, efficient framework for voice conversion, based on VITS. It gives users an accessible way to transform audio recordings. The project is still under development and is packaged as a library and API that you can integrate into your own projects.
Installation and Usage
Standard Setup
To use this framework, first create a dedicated project directory. It will hold two folders:
- assets folder: contains all the models needed for both inference and training.
- result folder: holds the results produced by the training process.
Initiate this setup by running:
rvc init
This command creates the assets folder and a .env file in your working directory. Make sure the directory is either empty or does not already contain an assets folder.
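The precondition above can be checked before running rvc init. The following is a minimal sketch, assuming you want to guard the call from a script; the helper name is_safe_to_init is hypothetical and is not part of the rvc package.

```python
from pathlib import Path


def is_safe_to_init(project_dir: Path) -> bool:
    """Return True if `rvc init` can be run in project_dir.

    Hypothetical helper (not part of rvc). It mirrors the stated
    precondition: the directory is empty or has no assets folder.
    An empty directory has no assets folder, so one check covers both.
    """
    if not project_dir.is_dir():
        return False
    return not (project_dir / "assets").exists()
```

A script could call is_safe_to_init(Path.cwd()) and only then invoke rvc init.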
Custom Setup
Users who have already downloaded models, or who want to change their configuration, need to edit the .env file. If this file doesn't exist, create it with:
rvc env create
Once ready, you can proceed to download a model using:
rvc dlmodel
or specify a download directory:
rvc dlmodel {download_dir}
Finally, update the .env file with the model location, and you're all set for voice conversion tasks!
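Editing the .env file can also be scripted. The sketch below uses only the standard library to set one KEY=value entry; the function name set_env_value and the key MODEL_DIR used in the usage note are hypothetical placeholders. Check the .env file in your project for the variable names it actually uses.

```python
from pathlib import Path


def set_env_value(env_path: Path, key: str, value: str) -> None:
    """Set KEY=value in a .env file, replacing an existing entry if present.

    Hypothetical helper (not part of rvc). Comment lines and other
    keys are left untouched.
    """
    lines = env_path.read_text().splitlines() if env_path.exists() else []
    new_line = f"{key}={value}"
    replaced = False
    for i, line in enumerate(lines):
        # Compare only the text before the first "=" with the key.
        if "=" in line and line.split("=", 1)[0].strip() == key:
            lines[i] = new_line
            replaced = True
    if not replaced:
        lines.append(new_line)
    env_path.write_text("\n".join(lines) + "\n")
```

For example, set_env_value(Path(".env"), "MODEL_DIR", "assets/weights") would write or update a MODEL_DIR entry, where MODEL_DIR stands in for whichever variable your .env uses for the model location.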
Library Usage
To perform audio inference through the library, here’s a basic Python script:
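from pathlib import Path

from dotenv import load_dotenv
from scipy.io import wavfile

from rvc.modules.vc.modules import VC


def main():
    vc = VC()
    # Load the model weights.
    vc.get_vc("{model.pth}")
    # Run inference; returns the target sample rate, the converted audio,
    # and timing information.
    tgt_sr, audio_opt, times, _ = vc.vc_inference(
        1, Path("{InputAudio}")
    )
    # Write the converted audio to a WAV file.
    wavfile.write("{OutputAudio}", tgt_sr, audio_opt)


if __name__ == "__main__":
    # Load settings from the .env file before running inference.
    load_dotenv("{envPath}")
    main()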
Command-Line Interface (CLI) Usage
The system allows for command-line audio inference with the following command:
rvc infer -m {model.pth} -i {input.wav} -o {output.wav}
Options
- -m: Model path (required)
- -i: Input audio path (required)
- -o: Output audio path (required)
- Additional options allow for customization of speaker/singer ID, pitch correction, filtering, and more.
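The same command can be driven from Python when converting files in a script. This is a minimal sketch that uses only the three required options listed above; the function names build_infer_command and run_infer are hypothetical, and the sketch assumes the rvc executable is on your PATH.

```python
import subprocess
from pathlib import Path


def build_infer_command(model: Path, src: Path, dst: Path) -> list[str]:
    """Build the argument list for `rvc infer` with the required options."""
    return ["rvc", "infer", "-m", str(model), "-i", str(src), "-o", str(dst)]


def run_infer(model: Path, src: Path, dst: Path) -> None:
    """Run `rvc infer`; raises CalledProcessError on a non-zero exit code."""
    subprocess.run(build_infer_command(model, src, dst), check=True)
```

Passing the arguments as a list avoids shell quoting problems with paths that contain spaces.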
API Usage
Start the API server with:
rvc-api
or using Poetry:
poetry run poe rvc-api
Perform audio inference as a blob with:
curl -X 'POST' \
'http://127.0.0.1:8000/inference?res_type=blob' \
-H 'accept: application/json' \
-H 'Content-Type: multipart/form-data' \
-F 'modelpath={model.pth}' \
-F 'input={input audio path}'
Or receive the result in JSON, including timing details:
curl -X 'POST' \
'http://127.0.0.1:8000/inference?res_type=json' \
-H 'accept: application/json' \
-H 'Content-Type: multipart/form-data' \
-F 'modelpath={model.pth}' \
-F 'input={input audio path}'
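The curl calls above can be reproduced from Python with the standard library. This is a sketch under the assumption that the server accepts modelpath and input as plain multipart form fields, as the curl examples suggest; the helper names build_multipart and build_inference_request are hypothetical, and the response format is not described here, so the sketch does not parse it.

```python
import urllib.parse
import urllib.request
import uuid


def build_multipart(fields: dict[str, str]) -> tuple[bytes, str]:
    """Encode plain text fields as multipart/form-data, like `curl -F`."""
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f"{value}\r\n"
        )
    parts.append(f"--{boundary}--\r\n")
    body = "".join(parts).encode("utf-8")
    return body, f"multipart/form-data; boundary={boundary}"


def build_inference_request(
    model_path: str,
    input_path: str,
    res_type: str = "blob",
    base_url: str = "http://127.0.0.1:8000",
) -> urllib.request.Request:
    """Build a POST request for /inference with res_type=blob or json."""
    query = urllib.parse.urlencode({"res_type": res_type})
    body, content_type = build_multipart(
        {"modelpath": model_path, "input": input_path}
    )
    return urllib.request.Request(
        f"{base_url}/inference?{query}",
        data=body,
        headers={"Content-Type": content_type, "Accept": "application/json"},
        method="POST",
    )


# Usage (requires a running API server):
#     req = build_inference_request("{model.pth}", "{input audio path}")
#     with urllib.request.urlopen(req) as resp:
#         payload = resp.read()
```

Building the request separately from sending it makes it possible to inspect the URL and body without a running server.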
Docker Usage
Set up and run with the provided Docker script:
./docker-run.sh
Alternatively, manually build and run with the following commands:
- Build the Docker image:
docker build -t "rvc" .
- Run the container:
docker run -it \
  -p 8000:8000 \
  -v "${PWD}/assets/weights:/weights:ro" \
  -v "${PWD}/assets/indices:/indices:ro" \
  -v "${PWD}/assets/audios:/audios:ro" \
  "rvc"
This assumes your model weights, feature indices, and input audio files are stored under assets in the current directory (assets/weights, assets/indices, and assets/audios).
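Before starting the container, it can help to confirm that the three mounted directories exist. The following is a minimal sketch based only on the volume paths in the docker run command; the function name missing_asset_dirs is hypothetical.

```python
from pathlib import Path

# Subdirectories of assets/ that the docker run command mounts read-only.
EXPECTED_SUBDIRS = ("weights", "indices", "audios")


def missing_asset_dirs(project_dir: Path) -> list[str]:
    """Return the names of expected assets subdirectories that are absent."""
    assets = project_dir / "assets"
    return [name for name in EXPECTED_SUBDIRS if not (assets / name).is_dir()]
```

An empty list from missing_asset_dirs(Path.cwd()) means all three mount sources are in place.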
The Retrieval-based Voice Conversion project is a versatile, easy-to-use tool for exploring voice conversion, whether through the library, the CLI, the API, or Docker.