Vocal Remover Project Introduction
Overview
Vocal Remover is a deep-learning-based tool designed to separate the vocals and the instrumentals of your favorite songs. This open-source software harnesses advanced machine learning techniques to give users the ability to isolate each component, making it a great asset for music producers, DJs, and enthusiasts who want to remix or analyze music tracks.
Installation Guide
Acquiring Vocal Remover
To get started with Vocal Remover, users can easily download the latest version directly from the project's GitHub Releases page.
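As a sketch of an alternative way to fetch the code, assuming the upstream repository is github.com/tsurumeso/vocal-remover (an assumption, since the Releases page is not linked here explicitly), the project can also be cloned with Git:
git clone https://github.com/tsurumeso/vocal-remover.git
cd vocal-remover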
Installing PyTorch
Vocal Remover requires PyTorch, a prominent machine learning library. Installing PyTorch is straightforward, and users can find detailed, platform-specific instructions on the PyTorch get-started page.
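The exact install command depends on the operating system, package manager, and CUDA setup, so the selector on the get-started page should be treated as authoritative; as a minimal sketch, a plain pip install typically looks like:
pip install torch
Replace this with the command the PyTorch selector generates for your specific platform and CUDA version.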
Installing Additional Packages
Once Vocal Remover is downloaded, users need to install the remaining packages it depends on. Navigate to the project directory in a terminal and run:
pip install -r requirements.txt
This installs all of the listed dependencies and completes the setup.
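Optionally, to keep the project's dependencies isolated from the system Python, the same step can be run inside a virtual environment; a minimal sketch (the directory name venv is illustrative):
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt
On Windows, activate the environment with venv\Scripts\activate instead of the source command.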
Usage Instructions
Users can separate a track by running the inference script on an audio file. The separated tracks are written out as two files: *_Instruments.wav for the instrumental and *_Vocals.wav for the vocals.
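For example, running the tool on a file named song.wav would, following this naming scheme, produce song_Instruments.wav and song_Vocals.wav (the exact output directory depends on the inference script's settings).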
Running on CPU
For a simple separation task using the CPU, the following command is used:
python inference.py --input path/to/an/audio/file
Running on GPU
To leverage a GPU for faster processing, users should run:
python inference.py --input path/to/an/audio/file --gpu 0
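Before reaching for the --gpu flag, it can help to confirm that PyTorch actually sees a CUDA device; a quick check using standard PyTorch calls, independent of this project:
python -c "import torch; print(torch.cuda.is_available(), torch.cuda.device_count())"
If this prints False, the --gpu option will not be able to use CUDA.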
Advanced Options
Vocal Remover includes advanced options to enhance the separation quality:
- Test-Time-Augmentation (TTA): This option applies data augmentation at inference time, which can improve separation quality at the cost of longer processing. Enable it with:
python inference.py --input path/to/an/audio/file --tta --gpu 0
- Post-Processing: This experimental feature masks instrumental parts based on the volume of the vocals to improve separation quality; a rough conceptual sketch of the idea appears after this list. Use it with caution, as it may cause issues:
python inference.py --input path/to/an/audio/file --postprocess --gpu 0
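As a loose conceptual illustration of volume-based masking, and not the project's actual post-processing code (the per-frame threshold and the naive mix-minus-vocals estimate are assumptions made only for this sketch):
import numpy as np

def mask_instruments_by_vocal_volume(mix_mag, vocal_mag, threshold=0.1):
    # mix_mag and vocal_mag are magnitude spectrograms of shape (freq_bins, frames).
    inst_mag = np.maximum(mix_mag - vocal_mag, 0.0)    # naive instrumental estimate
    frame_vocal_level = vocal_mag.mean(axis=0)         # per-frame vocal loudness
    quiet = frame_vocal_level < threshold * frame_vocal_level.max()
    inst_mag[:, quiet] = mix_mag[:, quiet]             # vocals inaudible -> keep the full mix
    return inst_mag
The real feature operates on the model's output and may behave differently, which is why the project flags it as experimental.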
Training Your Own Models
For users interested in training custom models, Vocal Remover can also be trained from scratch on a dataset of their own.
Dataset Preparation
Users should organize their dataset with instrumentals and mixtures in separate directories, as in the layout below (a small pairing-check sketch follows the layout):
path/to/dataset/
  +- instruments/
  |    +- 01_foo_inst.wav
  |    +- 02_bar_inst.mp3
  +- mixtures/
       +- 01_foo_mix.wav
       +- 02_bar_mix.mp3
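Because training assumes every instrumental file has a matching mixture, a quick sanity check can catch pairing mistakes before a long run; a minimal sketch, assuming files are paired by the shared prefix as in the layout above (the _inst/_mix suffix convention is taken from the example filenames, not from the project's data loader):
import os

def check_pairs(dataset_dir):
    # Collect shared prefixes such as "01_foo" from "01_foo_inst.wav".
    def prefixes(subdir, suffix):
        folder = os.path.join(dataset_dir, subdir)
        stems = (os.path.splitext(name)[0] for name in os.listdir(folder))
        return {stem[:-len(suffix)] for stem in stems if stem.endswith(suffix)}

    inst = prefixes('instruments', '_inst')
    mix = prefixes('mixtures', '_mix')
    print('instrumentals without a mixture:', sorted(inst - mix))
    print('mixtures without an instrumental:', sorted(mix - inst))

check_pairs('path/to/dataset')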
Model Training
Training a model involves running train.py with the dataset path and the desired training parameters:
python train.py --dataset path/to/dataset --mixup_rate 0.5 --reduction_rate 0.5 --gpu 0
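The --mixup_rate flag points at mixup-style data augmentation, in which pairs of training examples are blended together; the following is a generic illustration of that idea rather than the project's actual training loop (check the training script's help output for the precise meaning of --mixup_rate and --reduction_rate):
import numpy as np

def mixup(x_a, y_a, x_b, y_b, alpha=1.0):
    # Blend two training examples and their targets with a Beta-distributed weight.
    lam = np.random.beta(alpha, alpha)
    return lam * x_a + (1.0 - lam) * x_b, lam * y_a + (1.0 - lam) * y_b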
Academic References
Vocal Remover's underlying technology builds on published research, including work by Jansson, Takahashi, Choi, and Liutkus. These papers explore deep-learning architectures for audio source separation and form the backbone of this tool.
By providing a straightforward and accessible tool, Vocal Remover expands the possibilities of music manipulation for a broad audience of hobbyists and professionals alike.