FunCodec - Open-source toolkit for neural speech codec with robust audio processing features

FunCodec: An Overview

FunCodec is an open-source toolkit for neural speech codecs. It is designed to be fundamental, reproducible, and easy to integrate into other projects. The toolkit is under active development, and contributions and feedback from the community are welcome.

Latest Developments

As of December 22, 2023, FunCodec includes training and inference recipes for its LauraTTS models. LauraTTS is a codec-based zero-shot text-to-speech synthesizer whose output rivals leading systems such as VALL-E on semantic consistency and speaker similarity.

Getting Started

To install FunCodec, users can clone the repository from GitHub and proceed with the installation using pip:

git clone https://github.com/alibaba/FunCodec.git && cd FunCodec
pip install --editable ./

Model Options

FunCodec provides pre-trained models on Hugging Face and ModelScope. The models differ in bitrate and complexity, and cover both general-purpose use and specific datasets such as LibriTTS. Each model is listed with its bitrate, parameter count, and computational requirements, which makes it easier to select one that fits a given use case.
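
The bitrate listed for a codec can be related to its token stream with simple arithmetic: frames per second, times the number of codebooks, times the bits carried by each code. The following is a minimal sketch of that calculation, assuming a residual-vector-quantization codec in the style of Encodec. The function name and the example values (16 kHz input, 640x downsampling, 32 codebooks of 1024 entries) are illustrative assumptions, not specifications taken from a FunCodec model card.

```python
import math


def codec_bitrate(sample_rate: int, hop: int, n_codebooks: int, codebook_size: int) -> float:
    """Return bits per second for a stream of quantizer codes.

    sample_rate   -- input audio sample rate in Hz
    hop           -- total downsampling factor (samples per code frame)
    n_codebooks   -- number of quantizer codebooks used per frame
    codebook_size -- number of entries in each codebook
    """
    frames_per_second = sample_rate / hop
    bits_per_code = math.log2(codebook_size)
    return frames_per_second * n_codebooks * bits_per_code


# Illustrative values only (assumptions, not a published model spec).
print(codec_bitrate(16000, 640, 32, 1024))  # 8000.0
```

With these values the stream has 25 frames per second, each holding 32 codes of 10 bits, so the result is 8000 bits per second. Using fewer codebooks at inference time lowers the bitrate proportionally.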

Downloading Models

Users can download models from two sources: ModelScope and Hugging Face. The repository provides instructions and scripts for both, so that models are ready to use in different environments.
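
Apart from the provided scripts, a model hosted on Hugging Face can in principle be fetched from Python with the `huggingface_hub` package. The sketch below is one way to do that, not the toolkit's own download procedure. The function name `download_from_huggingface`, the default `exp` target directory, and any repository id passed to it are hypothetical and must be replaced with values from the actual model list.

```python
from pathlib import Path


def download_from_huggingface(repo_id: str, target_root: str = "exp") -> Path:
    """Download all files of a Hub repository into target_root/<model name>."""
    # Imported lazily so this module still loads when huggingface_hub is absent.
    from huggingface_hub import snapshot_download

    # Use the last component of the repository id as the local folder name.
    local_dir = Path(target_root) / repo_id.split("/")[-1]
    local_dir.mkdir(parents=True, exist_ok=True)

    # snapshot_download fetches every file in the repository.
    snapshot_download(repo_id=repo_id, local_dir=str(local_dir))
    return local_dir
```

Calling `download_from_huggingface("<org>/<model-name>")` with a real repository id would place the files under `exp/<model-name>`. This requires network access and `pip install huggingface_hub`.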

Performing Inference

FunCodec supports batch inference for encoding and decoding. Command-line scripts extract codes from audio files and reconstruct waveforms from those codes. Batch processing can be spread across multiple GPUs to handle large amounts of data.
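
A common way to spread batch inference over several GPUs is to split the input list into one shard per worker and run one job per shard. The sketch below shows only that splitting step. It is a generic illustration with hypothetical names, and does not describe how FunCodec's scripts distribute work internally.

```python
from typing import List, Sequence


def split_into_shards(entries: Sequence[str], n_shards: int) -> List[List[str]]:
    """Distribute entries round-robin so shard sizes differ by at most one."""
    if n_shards < 1:
        raise ValueError("n_shards must be at least 1")
    return [list(entries[i::n_shards]) for i in range(n_shards)]


# Hypothetical input list: one "<utt_id> <path>" entry per audio file.
entries = [f"utt{i} /data/utt{i}.wav" for i in range(5)]

for gpu_id, shard in enumerate(split_into_shards(entries, 2)):
    # Each shard would be written to its own list file and handed to one GPU job.
    print(gpu_id, len(shard))
```

Round-robin assignment keeps the shards balanced by count. If file durations vary widely, sorting entries by duration before splitting tends to balance the actual workload better.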

Training with FunCodec

Using Open-source Datasets: FunCodec simplifies training with recipes for popular datasets such as LibriTTS. By following the structured process in its egs directory, users can prepare data and train models with minimal manual intervention.
Integrating Customized Data: For users with their own datasets, FunCodec offers flexibility in data preparation. Once the audio files are listed in a wav.scp file, the provided scripts handle data processing and model training.
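
In the Kaldi convention, a wav.scp file is plain text with one line per utterance: an utterance ID, a space, and the path to the audio file. The sketch below builds such a file from a directory of .wav files. Using the file stem as the utterance ID is an assumption made for illustration, and the function name is hypothetical. Check the recipe for any further requirements on IDs or paths.

```python
from pathlib import Path


def write_wav_scp(audio_dir: str, scp_path: str) -> int:
    """Write '<utt_id> <absolute path>' lines for every .wav under audio_dir.

    Returns the number of entries written.
    """
    # Search recursively and sort so the output order is deterministic.
    wav_files = sorted(Path(audio_dir).rglob("*.wav"))

    # Assumption: the file name without its extension serves as the utterance ID.
    lines = [f"{path.stem} {path.resolve()}" for path in wav_files]

    Path(scp_path).write_text("".join(line + "\n" for line in lines), encoding="utf-8")
    return len(lines)
```

For example, `write_wav_scp("my_corpus", "wav.scp")` would list every .wav file found under `my_corpus`. Files in different subdirectories that share a name would produce duplicate IDs, so a real dataset may need a prefix such as the speaker or folder name.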

Acknowledgments and Licensing

FunCodec draws inspiration and components from several existing projects, including FunASR, Kaldi, ESPnet, and the Encodec model architecture. It is released under the MIT License, which allows users to freely use and modify the toolkit under the terms of that license.

Citations

Researchers and developers using FunCodec are encouraged to cite the tool in their work using the following reference:

@misc{du2023funcodec,
      title={FunCodec: A Fundamental, Reproducible and Integrable Open-source Toolkit for Neural Speech Codec},
      author={Zhihao Du and Shiliang Zhang and Kai Hu and Siqi Zheng},
      year={2023},
      eprint={2309.07405},
      archivePrefix={arXiv},
      primaryClass={cs.SD}
}

FunCodec offers tools and support for both novice and experienced developers working on neural speech codecs, and further updates are expected as the project continues to develop.