RAVE: Realtime Audio Variational autoEncoder
Introduction
RAVE, short for Realtime Audio Variational autoEncoder, is a variational autoencoder for fast, high-quality neural audio synthesis, created by Antoine Caillon and Philippe Esling. It targets real-time audio processing and synthesis, and the approach is described in a research paper available on arXiv.
The creators ask users who include RAVE in musical performances or installations to cite either the repository or the associated paper. A community for discussion and support is available on RAVE's Discord server.
Availability and Resources
RAVE VST
RAVE's VST plugin is available in beta for Windows, macOS, and Linux on the IRCAM Forum webpage. Issues can be reported through the forum's discussion page.
Tutorials
To help users get started, written tutorials are available, with video versions planned. They cover neural synthesis in digital audio workstations (DAWs) and in Max 8, as well as training RAVE models on custom datasets.
Installation and Setup
RAVE is installable via Python's package manager with the command pip install acids-rave. It is recommended to install torch and torchaudio beforehand to ensure compatibility with your system configuration. Additionally, ffmpeg must be installed, which can be done with Conda.
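Concretely, the setup described above can be sketched as the following commands; the torch install line should match your platform and CUDA version as chosen on the pytorch.org selector (the CPU-only build is shown here as an example):

```shell
# Install torch and torchaudio first, picking the build that matches
# your system (CPU-only wheels shown; see pytorch.org for GPU builds).
pip install torch torchaudio --index-url https://download.pytorch.org/whl/cpu

# Install ffmpeg, for example through Conda.
conda install ffmpeg

# Install RAVE itself.
pip install acids-rave
```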
Using RAVE
Dataset Preparation
RAVE supports two methods for dataset preparation: regular and lazy. The lazy approach is useful for large audio files, which RAVE reads directly during training at the cost of higher CPU usage.
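As a sketch, both preparation modes go through the rave preprocess command; the paths below are placeholders for your own audio folder and dataset location:

```shell
# Regular preprocessing: audio is resampled and written out as a
# preprocessed dataset folder (--channels sets the channel count).
rave preprocess --input_path /path/to/audio --output_path /path/to/dataset --channels 1

# Lazy preprocessing: large files are indexed and read on the fly
# during training (higher CPU usage), enabled with the --lazy flag.
rave preprocess --input_path /path/to/audio --output_path /path/to/dataset --channels 1 --lazy
```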
Training
Training involves selecting a configuration to suit the task. RAVE supports several architectures, from the original model to the improved v2 version, plus specialized configurations for specific audio processing objectives. Users may also apply data augmentations to improve model generalization when training data is limited.
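A training run over the preprocessed dataset can be sketched as follows; the run name and paths are placeholders, and flag details should be checked against rave train --help:

```shell
# Train a v2 model on the preprocessed dataset; --name identifies the run.
rave train --config v2 --db_path /path/to/dataset --out_path /path/to/runs --name my_model --channels 1

# Augmentations can be stacked by repeating --augment (e.g. muting and
# compression) to help generalization on small datasets.
rave train --config v2 --db_path /path/to/dataset --out_path /path/to/runs --name my_model --channels 1 --augment mute --augment compress
```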
Export
After training, RAVE models can be exported as torchscript files. When exporting, enabling the streaming mode with a specific flag optimizes the model for real-time processing.
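The export step described above can be sketched with the rave export command; the run path is a placeholder pointing at a finished training run:

```shell
# Export the trained run to a torchscript (.ts) file.
rave export --run /path/to/runs/my_model

# Adding --streaming enables cached convolutions so the exported model
# can process live audio in chunks without chunking artifacts.
rave export --run /path/to/runs/my_model --streaming
```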
Advanced Features
Prior Integration
For discrete models, a prior can additionally be trained over the latent space using the msprior library. However, since msprior remains experimental, the earlier prior implementation has been reintegrated into recent versions of RAVE for convenience.
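As an illustration only, prior training on top of a finished run looks roughly like the commands below; the exact subcommand and flag names are assumptions and should be verified against the rave and msprior documentation:

```shell
# Train the reintegrated prior on top of a finished RAVE run
# (command shape assumed; check `rave train_prior --help`).
rave train_prior --model /path/to/runs/my_model --db_path /path/to/dataset --out_path /path/to/priors

# Re-export so the prior ships inside the torchscript file
# (the --prior flag here is an assumption to verify).
rave export --run /path/to/runs/my_model --prior /path/to/priors --streaming
```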
Real-time and Offline Usage
RAVE runs in real time via the nn~ object for Max/MSP and PureData. This enables applications such as style transfer, where a live audio stream is resynthesized with the character of the model's training material.
Offline usage options include batch operations for processing multiple audio files, helping users manage large datasets efficiently.
Community and Support
The project maintains active discussions through GitHub for users to share experiences, ask questions, and showcase works involving RAVE.
Demonstrations and Further Learning
Video demonstrations showcase RAVE's capabilities in Max/MSP and its potential for real-time audio manipulation on embedded platforms, giving users a concrete picture of practical applications.
This project has been made possible with the support of several funding programs, underlining its innovative contributions to the field of audio processing and synthesis.
RAVE stands as a testament to the power of audio neural networks, offering both researchers and musicians a versatile tool for creative exploration in sound design and performance.