Retrieval-based-Voice-Conversion - User-Friendly Framework for Seamless Voice Conversion

Project Introduction: Retrieval-based Voice Conversion

Retrieval-based Voice Conversion is a simple and efficient voice conversion framework built on VITS. It converts the voice in an audio recording into the voice of a trained model. The project is still under development and can be used as a Python library, from the command line, or through an HTTP API.

Installation and Usage

Standard Setup

To get started, create a dedicated project directory. It holds two folders:

assets folder: all models needed for inference and training.
result folder: the output produced by training.

Initialize the project by running:

```
rvc init
```

This creates the assets folder and a .env file in the working directory. Run it in an empty directory, or at least in one that does not already contain an assets folder.
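If you script the setup, the precondition above can be checked before calling the CLI. This is an illustrative sketch, not part of the rvc package: `init_project` is a hypothetical helper, and its `run` parameter exists only so the real command can be swapped out in tests.

```python
import subprocess
from pathlib import Path
from typing import Callable


def init_project(project_dir: Path, run: Callable = subprocess.run) -> Path:
    """Create project_dir if needed, then run `rvc init` inside it.

    Hypothetical helper: refuses to continue when an `assets` folder
    already exists, mirroring the rule stated for `rvc init`.
    """
    project_dir = Path(project_dir)
    project_dir.mkdir(parents=True, exist_ok=True)
    if (project_dir / "assets").exists():
        raise FileExistsError(f"{project_dir} already contains an assets folder")
    # cwd makes rvc init create assets/ and .env inside project_dir.
    run(["rvc", "init"], cwd=project_dir, check=True)
    return project_dir
```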

Custom Setup

If you have already downloaded models or want to change the configuration, edit the .env file. If the file does not exist, create it with:

```
rvc env create
```

Next, download a model:

```
rvc dlmodel
```

or download it into a specific directory:

```
rvc dlmodel {download_dir}
```

Finally, set the model location in the .env file. You are then ready to run voice conversion.
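Before running inference, it can help to confirm that the paths in .env actually exist. The sketch below uses a deliberately minimal `KEY = VALUE` reader (python-dotenv, used in the library example, supports more syntax). The helper names are hypothetical, and resolving relative paths against the folder holding .env is an assumption that matches the standard `rvc init` layout.

```python
from pathlib import Path


def read_env_file(env_path: Path) -> dict[str, str]:
    """Parse simple `KEY = VALUE` lines, skipping blank lines and # comments."""
    values: dict[str, str] = {}
    for raw in Path(env_path).read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and quote characters.
        values[key.strip()] = value.strip().strip("\"'")
    return values


def missing_paths(env_path: Path, keys: list[str]) -> list[str]:
    """Return the keys that are unset or point to a path that does not exist.

    Assumption: relative values are resolved against the folder holding
    the .env file, which is the working directory after `rvc init`.
    """
    env_path = Path(env_path)
    values = read_env_file(env_path)
    missing = []
    for key in keys:
        value = values.get(key)
        if not value or not (env_path.parent / value).exists():
            missing.append(key)
    return missing
```

For example, `missing_paths(Path(".env"), ["model_dir"])` lists which of the given keys still need fixing; `model_dir` is a hypothetical key name, so pass the names your generated .env actually uses.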

Library Usage

To run inference from Python, use a script like the following, replacing each placeholder in curly braces with your own paths:

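```python
from pathlib import Path

from dotenv import load_dotenv
from scipy.io import wavfile

from rvc.modules.vc.modules import VC


def main():
    vc = VC()
    # Load the voice model (.pth) to convert into.
    vc.get_vc("{model.pth}")
    # Arguments: speaker/singer ID, then the path of the audio to convert.
    # Returns the target sample rate, the converted audio, and timing info.
    tgt_sr, audio_opt, times, _ = vc.vc_inference(
        1, Path("{InputAudio}")
    )
    wavfile.write("{OutputAudio}", tgt_sr, audio_opt)


if __name__ == "__main__":
    # Load the .env settings (model locations, etc.) before running inference.
    load_dotenv("{envPath}")
    main()
```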
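To convert a whole folder, the same calls can run in a loop. In this sketch, `convert_folder` and `write_wav_int16` are hypothetical helpers. Inference is passed in as a function so the loop can be tried without a model, and output is written with the standard-library `wave` module on the assumption that the converted audio is mono 16-bit PCM; if that does not hold for your setup, keep using `wavfile.write` as in the script.

```python
import sys
import wave
from array import array
from pathlib import Path
from typing import Callable, Sequence

# Hypothetical contract: take an input path, return (sample_rate, int16 samples).
InferFn = Callable[[Path], tuple[int, Sequence[int]]]


def write_wav_int16(path: Path, sample_rate: int, samples: Sequence[int]) -> None:
    """Write mono 16-bit PCM samples to a WAV file."""
    data = array("h", samples)
    if sys.byteorder == "big":
        data.byteswap()  # WAV stores samples little-endian
    with wave.open(str(path), "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit
        wav.setframerate(sample_rate)
        wav.writeframes(data.tobytes())


def convert_folder(input_dir: Path, output_dir: Path, infer: InferFn) -> list[Path]:
    """Run `infer` on every .wav file in input_dir, keeping the file names."""
    output_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for src in sorted(input_dir.glob("*.wav")):
        sample_rate, samples = infer(src)
        dst = output_dir / src.name
        write_wav_int16(dst, sample_rate, samples)
        written.append(dst)
    return written
```

With the library, `infer` could wrap `vc.vc_inference(1, path)` and return its first two values, `tgt_sr` and `audio_opt`.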

Command-Line Interface (CLI) Usage

Run inference from the command line with:

```
rvc infer -m {model.pth} -i {input.wav} -o {output.wav}
```

Options

-m: Model path (required)
-i: Input audio path (required)
-o: Output audio path (required)
Additional options set the speaker/singer ID, pitch correction, filtering, and more.
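When calling the CLI from another program, passing the arguments as a list avoids shell quoting problems with paths that contain spaces. This hypothetical helper uses only the three required options listed above; any additional options would need to be appended using their real flag names, which are not listed here.

```python
import subprocess
from pathlib import Path


def rvc_infer_command(model: Path, input_wav: Path, output_wav: Path) -> list[str]:
    """Build the argv for `rvc infer` with the required -m, -i and -o options."""
    return [
        "rvc", "infer",
        "-m", str(model),
        "-i", str(input_wav),
        "-o", str(output_wav),
    ]


def run_rvc_infer(model: Path, input_wav: Path, output_wav: Path) -> None:
    """Run the CLI; raises CalledProcessError if rvc exits with a non-zero status."""
    subprocess.run(rvc_infer_command(model, input_wav, output_wav), check=True)
```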

API Usage

Start the API server with:

```
rvc-api
```

or, using Poetry:

```
poetry run poe rvc-api
```
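A script that starts the server and immediately sends requests can race its startup. One simple guard is to poll the port until it accepts TCP connections. This sketch assumes the address used in this section's curl examples, 127.0.0.1:8000; `wait_for_port` is a hypothetical helper, and an open port only shows that the server is listening, not that it is ready to serve inference.

```python
import socket
import time


def wait_for_port(host: str = "127.0.0.1", port: int = 8000, timeout: float = 30.0) -> bool:
    """Return True once host:port accepts a TCP connection, False after `timeout` seconds."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=1.0):
                return True
        except OSError:
            time.sleep(0.2)  # not up yet; retry shortly
    return False
```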

To receive the converted audio as a binary blob:

```
curl -X 'POST' \
  'http://127.0.0.1:8000/inference?res_type=blob' \
  -H 'accept: application/json' \
  -H 'Content-Type: multipart/form-data' \
  -F 'modelpath={model.pth}' \
  -F 'input={input audio path}'
```

To receive the result as JSON instead, including timing details:

```
curl -X 'POST' \
  'http://127.0.0.1:8000/inference?res_type=json' \
  -H 'accept: application/json' \
  -H 'Content-Type: multipart/form-data' \
  -F 'modelpath={model.pth}' \
  -F 'input={input audio path}'
```
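The same requests can be sent from Python with only the standard library. This sketch mirrors the curl commands field for field: `modelpath` and `input` are sent as plain text form fields (curl's `-F name=value` without `@`), so the paths are passed as strings rather than uploaded as files. `build_multipart` and `rvc_inference` are hypothetical helper names.

```python
import json
import urllib.parse
import urllib.request
import uuid
from typing import Any


def build_multipart(fields: dict[str, str]) -> tuple[bytes, str]:
    """Encode plain text fields as multipart/form-data, like curl's -F name=value."""
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f"{value}\r\n"
        )
    parts.append(f"--{boundary}--\r\n")
    return "".join(parts).encode("utf-8"), f"multipart/form-data; boundary={boundary}"


def rvc_inference(
    modelpath: str,
    input_path: str,
    res_type: str = "blob",
    base_url: str = "http://127.0.0.1:8000",
) -> Any:
    """POST to /inference; return raw bytes for "blob" or parsed JSON for "json"."""
    body, content_type = build_multipart({"modelpath": modelpath, "input": input_path})
    query = urllib.parse.urlencode({"res_type": res_type})
    request = urllib.request.Request(
        f"{base_url}/inference?{query}",
        data=body,
        headers={"accept": "application/json", "Content-Type": content_type},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        payload = response.read()
    return json.loads(payload) if res_type == "json" else payload
```

For example, `rvc_inference("{model.pth}", "{input audio path}", res_type="json")` would return the parsed JSON body from a running server.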

Docker Usage

The quickest way to set up and run the container is the provided script:

```
./docker-run.sh
```

Alternatively, build and run the image manually:

Build the Docker image:
```
docker build -t "rvc" .
```

Run the container:

```
docker run -it \
  -p 8000:8000 \
  -v "${PWD}/assets/weights:/weights:ro" \
  -v "${PWD}/assets/indices:/indices:ro" \
  -v "${PWD}/assets/audios:/audios:ro" \
  "rvc"
```

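This assumes your model weights, feature indices, and input audio files are in assets/weights, assets/indices, and assets/audios under the current directory. Each folder is mounted read-only (:ro) at /weights, /indices, and /audios inside the container, and container port 8000 is published on host port 8000.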
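To launch the container from a script, or against an assets directory somewhere else, the command above can be generated. This is a sketch with a hypothetical `docker_run_command` helper; it keeps the container-side port at 8000 as in the original command and makes only the host port configurable.

```python
from pathlib import Path


def docker_run_command(assets_dir: Path, image: str = "rvc", host_port: int = 8000) -> list[str]:
    """Rebuild the docker run command for an arbitrary assets directory."""
    # Bind mounts with -v need an absolute host path (hence ${PWD} in the shell version).
    assets_dir = Path(assets_dir).absolute()
    cmd = ["docker", "run", "-it", "-p", f"{host_port}:8000"]
    for sub in ("weights", "indices", "audios"):
        # Mount each host subfolder read-only (":ro") at /<sub> in the container.
        cmd += ["-v", f"{assets_dir / sub}:/{sub}:ro"]
    cmd.append(image)
    return cmd
```

The list can be passed to `subprocess.run(..., check=True)`. Note that `-it` needs an interactive terminal, so a non-interactive job may have to drop it.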

Retrieval-based Voice Conversion offers voice conversion through a Python library, a CLI, an HTTP API, and a Docker image, so it fits into scripts, services, and containerized deployments alike.