rnnoise - RNN-Based Real-Time Noise Suppression for Speech Enhancement

Introduction to RNNoise

RNNoise is a noise suppression library that combines classical digital signal processing (DSP) with deep learning, using a recurrent neural network (RNN). This hybrid approach targets real-time, full-band speech enhancement. The algorithm is described in the paper "A Hybrid DSP/Deep Learning Approach to Real-Time Full-Band Speech Enhancement" by J.-M. Valin.

Getting Started

An interactive demo of version 0.1 is available online. To build the library from source, run the autogen.sh script, then configure, then make. Running make install afterwards is optional. For best performance on x86, build for an architecture that supports AVX2 (or at least SSE4.1), for example by setting -march= in CFLAGS.

The neural network model files are not part of the source tree because of their size. The autogen.sh script downloads them automatically from the Xiph.Org servers.

Usage

RNNoise is meant to be used as a library, but a simple command-line tool is provided as an example. The tool reads a raw (headerless) 16-bit mono PCM file sampled at 48 kHz that contains noisy speech, and writes a denoised file. It is invoked as follows:

./examples/rnnoise_demo <noisy speech> <output denoised>

Both the input and the output files are in raw PCM format, not WAV.
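
Since the tool only handles raw PCM, a WAV recording has to be stripped of its header before processing, and the denoised output needs a header added back before most players will open it. The following is a minimal sketch of both conversions using only the Python standard library. The function names are hypothetical helpers, not part of RNNoise, and the byte-order handling assumes the tool reads samples in the host's native byte order.

```python
import sys
import wave
from array import array


def wav_to_raw_pcm(wav_path, raw_path):
    """Convert a 16-bit, 48 kHz, mono WAV file to headerless raw PCM."""
    with wave.open(wav_path, "rb") as wav:
        # Refuse anything that does not match the format described above.
        if wav.getnchannels() != 1:
            raise ValueError("expected mono audio")
        if wav.getsampwidth() != 2:
            raise ValueError("expected 16-bit samples")
        if wav.getframerate() != 48000:
            raise ValueError("expected a 48 kHz sample rate")
        data = wav.readframes(wav.getnframes())

    samples = array("h")
    samples.frombytes(data)
    # WAV stores samples little-endian. Assumption: the tool expects the
    # host's byte order, so swap on big-endian machines.
    if sys.byteorder == "big":
        samples.byteswap()

    with open(raw_path, "wb") as raw:
        samples.tofile(raw)
    return len(samples)


def raw_pcm_to_wav(raw_path, wav_path):
    """Wrap headerless 16-bit, 48 kHz, mono PCM in a WAV container."""
    samples = array("h")
    with open(raw_path, "rb") as raw:
        samples.frombytes(raw.read())
    # Convert from host byte order back to the little-endian order of WAV.
    if sys.byteorder == "big":
        samples.byteswap()

    with wave.open(wav_path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(48000)
        wav.writeframes(samples.tobytes())
    return len(samples)
```

A typical round trip would call wav_to_raw_pcm() on the recording, run rnnoise_demo on the resulting file, and then call raw_pcm_to_wav() on the output. Recordings at other sample rates or with more than one channel need resampling or downmixing first, which this sketch does not attempt.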

Training Your Own Models

RNNoise models are trained using publicly available speech and noise datasets. Users need clean speech and noise data, both sampled at 48 kHz in 16-bit PCM format. The process starts by mixing speech and noise to simulate various real-life conditions. Training data features are then generated using the dump_features executable. Users can additionally simulate acoustic conditions such as reverberation by incorporating room impulse response (RIR) data files.
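
To make the mixing and reverberation steps concrete, here is a small illustrative sketch in plain Python. It is not the code used by dump_features; the function names and the choice of a fixed target signal-to-noise ratio (SNR) are assumptions made for the example. It shows the two ideas involved: scaling noise so the mix reaches a chosen SNR, and convolving a signal with a room impulse response.

```python
import math


def rms(signal):
    """Root-mean-square level of a sequence of samples."""
    return math.sqrt(sum(x * x for x in signal) / len(signal))


def mix_at_snr(speech, noise, snr_db):
    """Scale noise so the speech-to-noise ratio equals snr_db, then add."""
    if len(noise) < len(speech):
        raise ValueError("noise must be at least as long as speech")
    noise = noise[:len(speech)]
    speech_rms, noise_rms = rms(speech), rms(noise)
    if speech_rms == 0 or noise_rms == 0:
        raise ValueError("speech and noise must not be silent")
    # SNR(dB) = 20*log10(speech_rms / (gain * noise_rms)); solve for gain.
    gain = speech_rms / (noise_rms * 10 ** (snr_db / 20))
    return [s + gain * n for s, n in zip(speech, noise)]


def apply_rir(signal, rir):
    """Convolve a signal with a room impulse response (direct form)."""
    out = [0.0] * (len(signal) + len(rir) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(rir):
            out[i + j] += s * h
    return out


def to_int16(signal):
    """Round and clip float samples to the 16-bit PCM range."""
    return [max(-32768, min(32767, round(x))) for x in signal]
```

The direct convolution is quadratic in the signal length and only practical for short examples; real pipelines normally use an FFT-based convolution. Varying snr_db and the impulse response across the dataset is one way to cover a range of acoustic conditions.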

Once feature extraction is complete, a Python script trains the neural network model. The script saves the model weights in PyTorch format. These weights must then be converted into a C-compatible format before they can be integrated with RNNoise.
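
As an illustration of what converting weights into a C-compatible format can look like, the sketch below renders a flat list of floats as a C array definition. This is a hypothetical helper written for this example, not the conversion script shipped with RNNoise, and the real output layout may differ (for instance, it may quantize the weights). With PyTorch, a flat list like this could be obtained from a tensor with tensor.flatten().tolist().

```python
import math
import re


def weights_to_c_array(name, values, per_line=8):
    """Render a flat list of floats as a C `static const float` array."""
    if not re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", name):
        raise ValueError("name must be a valid C identifier")
    if not values:
        raise ValueError("values must not be empty")
    if not all(math.isfinite(v) for v in values):
        raise ValueError("values must be finite")

    lines = []
    for i in range(0, len(values), per_line):
        chunk = values[i:i + per_line]
        # The 'f' suffix makes each literal a C float rather than a double.
        lines.append("    " + ", ".join(f"{float(v)!r}f" for v in chunk))
    body = ",\n".join(lines)
    return f"static const float {name}[{len(values)}] = {{\n{body}\n}};\n"
```

Calling weights_to_c_array("dense_bias", [0.5, -1.0, 2.25]) returns the text of a three-element array named dense_bias, which could be written to a .c file and compiled with the rest of a project.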

Loadable Models

Later versions of RNNoise use a machine-endian binary format for models, which allows a model to be loaded from a file at runtime. This makes model management and deployment more flexible, but it requires care: the model object and the file it was loaded from have to remain valid for as long as RNNoise is actively using them. For those looking to reduce download size or memory footprint, a smaller "little" model is available.
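
The practical consequence of a machine-endian format is that a model file is tied to the byte order of the machine that wrote it. The sketch below demonstrates this with a hypothetical file that holds nothing but float32 values. It does not reproduce the actual RNNoise model layout; the function names and the flat layout are assumptions for the example.

```python
import struct
import sys


def write_blob(path, weights):
    """Write float32 values in the host's native byte order ('=' prefix)."""
    with open(path, "wb") as f:
        f.write(struct.pack(f"={len(weights)}f", *weights))


def read_blob(path, byteorder="="):
    """Read float32 values; byteorder is a struct prefix: '=', '<' or '>'."""
    with open(path, "rb") as f:
        data = f.read()
    count = len(data) // 4
    return list(struct.unpack(f"{byteorder}{count}f", data[:count * 4]))
```

Reading the file back with the native prefix returns the original values, while reading it with the opposite byte order (">" on a little-endian host, "<" on a big-endian one, as reported by sys.byteorder) yields unrelated numbers. A file in such a format therefore has to be produced on, or converted for, a machine with the same byte order as the one that loads it.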

Conclusion

RNNoise combines the strengths of DSP and deep learning to provide real-time noise suppression for audio applications. Whether you are a developer integrating noise suppression into an application or a researcher customizing and training models, RNNoise provides the tools and flexibility to meet those needs.