voicefixer_main - Speech Restoration Framework for Historical and Degraded Audio

VoiceFixer Project Overview

VoiceFixer is a framework for general speech restoration, with a particular focus on severely degraded and historical speech. It uses machine learning to improve recordings that have been distorted by noise, clipping, reverberation, or low resolution, including recordings where several of these distortions occur together.

Materials

VoiceFixer is described in a preprint on arXiv (2109.13731). A demo page showcases its output alongside comparisons with other speech restoration methods. For practical use, VoiceFixer is also packaged as a pip package, and the datasets used for training and testing are publicly available.

Usage

Setting Up the Environment

To begin experimenting with VoiceFixer, users need to set up their environment:

```
# Download dataset and prepare running environment
git clone https://github.com/haoheliu/voicefixer_main.git
cd voicefixer_main
source init.sh
```

VoiceFixer for General Speech Restoration

VoiceFixer can be trained for general speech restoration with several models. One of them is VF_UNet, which uses a UNet for the analysis stage.

Training: Users can start training VoiceFixer by providing a configuration file to the training script:
```
python3 train_gsr_voicefixer.py -c config/vctk_base_voicefixer_unet.json
```
Training process details, including logs, checkpoints, and validations, can be found in the logs directory.
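
Because a training run can take a long time, it may be worth checking the configuration file before launching it. The sketch below assumes the config is plain JSON (as the `.json` extension suggests) and uses Python's standard `json.tool` module to validate it. `check_config` is a hypothetical helper for illustration, not part of the repository.

```shell
# check_config: hypothetical helper, not part of voicefixer_main.
# Returns 0 if the file exists and parses as JSON, 1 otherwise.
check_config() {
  config="$1"
  if [ ! -f "$config" ]; then
    echo "config not found: $config" >&2
    return 1
  fi
  # python3 -m json.tool exits non-zero when the input is not valid JSON
  if ! python3 -m json.tool "$config" > /dev/null 2>&1; then
    echo "config is not valid JSON: $config" >&2
    return 1
  fi
  return 0
}

# Possible usage: start training only if the check passes
# check_config config/vctk_base_voicefixer_unet.json && \
#   python3 train_gsr_voicefixer.py -c config/vctk_base_voicefixer_unet.json
```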
Evaluation: After training, automatic evaluation can be conducted on all or selected test sets. Users can specify settings using a configuration file and checkpoint file:
```
python3 eval_gsr_voicefixer.py --config <path-to-the-config-file> --ckpt <path-to-the-checkpoint>
```
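
The evaluation command above takes two paths, and a typo in either one only surfaces once the script starts. As a minimal sketch, a small wrapper can check both files first. `run_eval` and the `DRY_RUN` variable are assumptions introduced here for illustration; only the script name and the `--config` and `--ckpt` flags come from the command above.

```shell
# run_eval: hypothetical wrapper, not part of voicefixer_main.
# Set DRY_RUN=1 to print the command instead of executing it.
run_eval() {
  config="$1"
  ckpt="$2"
  if [ ! -f "$config" ]; then
    echo "config not found: $config" >&2
    return 1
  fi
  if [ ! -f "$ckpt" ]; then
    echo "checkpoint not found: $ckpt" >&2
    return 1
  fi
  if [ "${DRY_RUN:-0}" = "1" ]; then
    # Preview only: show what would be run
    echo "python3 eval_gsr_voicefixer.py --config $config --ckpt $ckpt"
  else
    python3 eval_gsr_voicefixer.py --config "$config" --ckpt "$ckpt"
  fi
}
```

Calling `DRY_RUN=1 run_eval <path-to-the-config-file> <path-to-the-checkpoint>` prints the command without running it, which is a cheap way to confirm the paths before a full evaluation.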

ResUNet for General Speech Restoration

ResUNet offers an alternative approach for general speech restoration.

Training: Similar to VoiceFixer, users can train ResUNet with a specific configuration file:
```
python3 train_gsr_voicefixer.py -c config/vctk_base_voicefixer_unet.json
```
Evaluation: Evaluation works the same way as for VoiceFixer. Additional arguments (elided below) select the test sets to use and attach a description to the run:
```
python3 eval_ssr_unet.py --config <path-to-the-config-file> --ckpt <path-to-the-checkpoint> ...
```

ResUNet for Single Task Speech Restoration

ResUNet can also be trained on a single restoration task at a time: denoising, dereverberation, super-resolution, or declipping.

Training: Each task requires a dedicated configuration file to train the model:
```
python3 train_ssr_unet.py -c config/vctk_base_ssr_unet_denoising.json
...
```
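
Since each task has its own configuration file, several tasks can be trained one after another with a short loop. The sketch below is one possible way to do this. `train_tasks` and the `TRAIN_CMD` override are hypothetical names introduced for illustration, and the config paths are whatever you pass in (only the denoising config above is named in this document).

```shell
# train_tasks: hypothetical helper, not part of voicefixer_main.
# Trains one model per config file, in order, and stops at the first failure.
# TRAIN_CMD can be overridden (for example with "echo") to preview the calls.
train_tasks() {
  for config in "$@"; do
    echo "training with $config" >&2
    # Left unquoted on purpose so the default splits into command and script
    ${TRAIN_CMD:-python3 train_ssr_unet.py} -c "$config" || return 1
  done
}

# Possible usage:
# train_tasks config/vctk_base_ssr_unet_denoising.json
```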
Evaluation: Evaluation follows the same pattern as for general speech restoration: the evaluation script takes a configuration file and a checkpoint for the task being tested.

Citation

To reference VoiceFixer, use the following citation:

 @misc{liu2021voicefixer,   
     title={VoiceFixer: Toward General Speech Restoration With Neural Vocoder},   
     author={Haohe Liu and Qiuqiang Kong and Qiao Tian and Yan Zhao and DeLiang Wang and Chuanzeng Huang and Yuxuan Wang},  
     year={2021},  
     eprint={2109.13731},  
     archivePrefix={arXiv},  
     primaryClass={cs.SD}  
 }

In summary, VoiceFixer covers both general speech restoration and single-task restoration (denoising, dereverberation, super-resolution, and declipping), with training and evaluation scripts for each setting.