AcademiCodec - Efficient Audio Codec Models for Research and TTS Applications

AcademiCodec: An Open Source Audio Codec Model for Academic Research

AcademiCodec is an open-source toolkit for training and evaluating audio codec models, aimed at academic research and development. Its repository is structured so that researchers and developers can explore existing codec models, extend them, and contribute their changes back.

Repository Structure

The project is organized into three main components:

academicodec: This is the core directory, containing common utilities, modules, quantization techniques, and various models including HiFi-Codec, Encodec, and SoundStream.
evaluation_metric: Houses the tools and metrics used to evaluate model performance.
egs: Contains example scripts and configurations for specific models like SoundStream and HiFi-Codec, along with scripts for testing and starting model evaluations.

Ongoing Project with Collaborative Spirit

As an ongoing project, AcademiCodec encourages contributions from researchers worldwide. The project originated in a university setting and gives the academic community a shared place to work on audio codec technology. The project's foundational paper is available via the link in the project repository.

Audio Codec Models and Challenges

Audio codec models are widely used to compress audio for communication and, more recently, to produce intermediate representations for audio generation tasks. Despite their widespread use, these models face two main challenges:

Training is difficult to reproduce, because training processes are not publicly available and they require extensive data and computational resources.
High-quality reconstruction depends on multiple codebooks, and every additional codebook adds tokens that a downstream generation model must predict.
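
The cost of extra codebooks can be illustrated with simple arithmetic. The sketch below assumes one token per codebook per latent frame; the 24 kHz sample rate, the encoder stride of 320 samples, and the 8-codebook baseline are illustrative values, not figures taken from this project.

```python
def tokens_per_second(sample_rate_hz: int, encoder_stride: int, num_codebooks: int) -> float:
    """Discrete tokens a generation model must predict per second of audio."""
    # One latent frame is produced for every `encoder_stride` input samples.
    frames_per_second = sample_rate_hz / encoder_stride
    # Each frame carries one token per codebook.
    return frames_per_second * num_codebooks


# Illustrative values only: 24 kHz audio and a total encoder stride of 320 samples.
for num_codebooks in (8, 4):
    rate = tokens_per_second(24_000, 320, num_codebooks)
    print(num_codebooks, "codebooks ->", rate, "tokens/s")
```

Under these assumptions, halving the number of codebooks halves the sequence length the generation model has to handle.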

Innovations of HiFi-Codec

AcademiCodec introduces HiFi-Codec, a codec based on Group-Residual Vector Quantization (GRVQ). HiFi-Codec addresses the challenges above by using only four codebooks, and it is reported to achieve better reconstruction performance than comparable codecs while requiring fewer resources. This makes it a good fit for audio generation applications that need an efficient intermediate representation.
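
To make the GRVQ idea concrete, here is a minimal sketch in plain Python. It assumes that the latent vector is split into equal-sized groups and that each group is quantized by its own small stack of residual quantizers. The 2-group, 2-stage layout is chosen only because it yields four codebooks; the random codebooks, the dimensions, and all function names are hypothetical and are not taken from the AcademiCodec code.

```python
import random


def nearest(codebook, vec):
    """Return (index, codeword) of the codeword closest to vec in squared L2 distance."""
    def dist(i):
        return sum((c - v) ** 2 for c, v in zip(codebook[i], vec))
    best = min(range(len(codebook)), key=dist)
    return best, codebook[best]


def rvq_encode(vec, codebooks):
    """Residual VQ: each stage quantizes the residual left by the previous stage."""
    residual = list(vec)
    quantized = [0.0] * len(vec)
    indices = []
    for codebook in codebooks:
        idx, code = nearest(codebook, residual)
        indices.append(idx)
        quantized = [q + c for q, c in zip(quantized, code)]
        residual = [r - c for r, c in zip(residual, code)]
    return indices, quantized


def grvq_encode(vec, grouped_codebooks):
    """Group-residual VQ: split vec into equal groups and run RVQ on each group."""
    size = len(vec) // len(grouped_codebooks)
    all_indices, reconstruction = [], []
    for g, codebooks in enumerate(grouped_codebooks):
        chunk = vec[g * size:(g + 1) * size]
        indices, quantized = rvq_encode(chunk, codebooks)
        all_indices.extend(indices)
        reconstruction.extend(quantized)
    return all_indices, reconstruction


# Hypothetical sizes: an 8-dim latent, 2 groups x 2 residual stages = 4 codebooks.
DIM, GROUPS, STAGES, CODEBOOK_SIZE = 8, 2, 2, 16
rng = random.Random(0)
grouped_codebooks = [
    [[[rng.gauss(0, 1) for _ in range(DIM // GROUPS)] for _ in range(CODEBOOK_SIZE)]
     for _ in range(STAGES)]
    for _ in range(GROUPS)
]
latent = [rng.gauss(0, 1) for _ in range(DIM)]
indices, reconstruction = grvq_encode(latent, grouped_codebooks)
print("token indices for this frame:", indices)
```

Each latent frame is represented by one index per codebook, so this layout emits four tokens per frame. A trained codec would learn the codebooks jointly with the encoder and decoder rather than sampling them at random.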

Recent Developments and Releases

April 16, 2023: Initial release of training code for the Encodec and SoundStream models, along with pre-trained models at 24 kHz and 16 kHz.
May 5, 2023: Release of HiFi-Codec's training code.
June 2, 2023: Addition of an inference notebook for HiFi-Codec, intended to support future training of models such as VALL-E and SoundStorm.
June 13, 2023: Major code structure refactor.

Technical Requirements

To explore AcademiCodec's capabilities, ensure your setup includes:

PyTorch version >= 1.13.0
Python version >= 3.8
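
A quick way to confirm these minimums before running anything is sketched below. The helper names are hypothetical, and the script only inspects installed package metadata, so it does not need to import PyTorch itself.

```python
import re
import sys
from importlib import metadata

MIN_PYTHON = (3, 8)
MIN_TORCH = (1, 13, 0)


def parse_version(text: str) -> tuple:
    """Turn '2.1.0+cu118' into (2, 1, 0); any suffix after the numbers is ignored."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", text)
    if match is None:
        raise ValueError(f"unrecognized version string: {text!r}")
    return tuple(int(part or 0) for part in match.groups())


def check_environment() -> list:
    """Return a list of problems; an empty list means the stated minimums are met."""
    problems = []
    if sys.version_info[:2] < MIN_PYTHON:
        found = f"{sys.version_info[0]}.{sys.version_info[1]}"
        problems.append(f"Python {found} is older than 3.8")
    try:
        torch_version = metadata.version("torch")
    except metadata.PackageNotFoundError:
        problems.append("PyTorch is not installed")
    else:
        if parse_version(torch_version) < MIN_TORCH:
            problems.append(f"PyTorch {torch_version} is older than 1.13.0")
    return problems


print(check_environment() or "environment meets the stated minimums")
```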

Training and Data Preparation

Users who want to train their own models should prepare their audio data at the sample rate the target model expects. For pre-trained models, guidance is available through Hugging Face.
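
One way to catch sample-rate mistakes before training is to scan the dataset headers. The sketch below uses only Python's standard `wave` module, so it assumes uncompressed PCM WAV files; the function names and the `data/train` path in the usage note are hypothetical, and the repository's own data preparation scripts may work differently.

```python
import wave
from pathlib import Path


def wav_sample_rate(path) -> int:
    """Read the sample rate from a PCM WAV header without loading the audio data."""
    with wave.open(str(path), "rb") as wav_file:
        return wav_file.getframerate()


def find_mismatched(root, expected_rate: int) -> list:
    """Return (path, rate) pairs for .wav files under root whose rate differs from expected_rate."""
    mismatched = []
    for path in sorted(Path(root).rglob("*.wav")):
        rate = wav_sample_rate(path)
        if rate != expected_rate:
            mismatched.append((path, rate))
    return mismatched
```

For example, `find_mismatched("data/train", 16_000)` would list every file that needs resampling before it is used with a 16 kHz model. Resampling itself requires an audio library or external tool and is outside the scope of this sketch.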

Understanding Model Variations

AcademiCodec's models differ mainly in their choice of discriminators and in the number of codebooks they use:

SoundStream and Encodec: These differ in their discriminators. SoundStream uses two types of discriminator, one operating on waveforms and one on spectrograms, whereas Encodec uses only an STFT discriminator.
HiFi-Codec: Uses fewer codebooks, which reduces the number of tokens needed to represent the audio.

Acknowledgements and Citations

AcademiCodec builds on knowledge and tools from other leading repositories and acknowledges the contributions of their developers. If this project supports your research, the team asks that you cite it.

Licensing

The project is shared under the MIT license, promoting open collaboration and accessibility.

In short, AcademiCodec is both an audio codec toolkit and a collaborative effort: it makes training and evaluating advanced audio codec models accessible to academic researchers.