Python Audio-Loading Benchmark
The Python Audio-Loading Benchmark project evaluates the performance of various audio input/output (I/O) libraries in Python. The comparison matters for machine learning models that process raw audio, since they can benefit directly from faster audio loading. As raw audio becomes increasingly important in machine learning tasks, it is essential to load audio files quickly, whether uncompressed or compressed, and to support features such as seeking to specific audio segments. This is especially true for convolutional neural networks that handle variable-length samples.
Tested Libraries
The project assesses a wide range of libraries, each with different capabilities, supported audio codecs, and features. Here's a brief overview:
- scipy.io.wavfile: A basic library supporting PCM (16 bit) format; it does not support seeking.
- scipy.io.wavfile memmap: Similar to the above but includes support for seeking.
- soundfile: Offers extensive codec support, including PCM, Ogg, Flac, and MP3, and supports seeking.
- pydub: Works with various formats supported by FFMPEG but lacks seeking capability.
- aubio: Supports PCM, MP3, and OGG formats with seeking features.
- audioread (FFMPEG): A versatile option for all FFMPEG-supported codecs but does not support seeking.
- librosa: Compatible with formats supported by soundfile, with seeking capabilities.
- tensorflow (tf.io.audio.decode_wav): Limited to PCM (16 bit) and does not allow seeking.
- tensorflow-io (from_audio): Supports PCM, Ogg, Flac codecs and includes seeking.
- torchaudio (sox_io and soundfile): Supports all codecs by Sox and Soundfile with seeking.
- soxbindings: Versatile with seeking support.
- stempeg: Works with all FFMPEG-supported codecs and offers seeking.
Some libraries are not included due to limited platform support, installation difficulties, or redundant features.
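To make the "seeking" column concrete, here is a minimal sketch of what seeking means in practice: jumping to an offset and decoding only the requested segment instead of the whole file. It uses Python's standard-library wave module as a stand-in, which is not one of the benchmarked libraries; the helper names write_test_wav and read_segment and the generated test file are hypothetical and exist only for illustration.

```python
import os
import struct
import tempfile
import wave


def write_test_wav(path, sample_rate=8000, seconds=2):
    """Write a mono 16-bit PCM file with a repeating 0..999 ramp (illustrative data)."""
    n_frames = sample_rate * seconds
    samples = [i % 1000 for i in range(n_frames)]
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        wf.setframerate(sample_rate)
        wf.writeframes(struct.pack("<%dh" % n_frames, *samples))


def read_segment(path, start_s, duration_s):
    """Seek to start_s and decode only duration_s seconds of mono 16-bit audio."""
    with wave.open(path, "rb") as wf:
        rate = wf.getframerate()
        wf.setpos(int(start_s * rate))  # move the read position; earlier frames are skipped
        raw = wf.readframes(int(duration_s * rate))
    n_samples = len(raw) // 2
    return list(struct.unpack("<%dh" % n_samples, raw))


tmp_dir = tempfile.mkdtemp()
wav_path = os.path.join(tmp_dir, "example.wav")
write_test_wav(wav_path)

# Read half a second starting at the one-second mark.
segment = read_segment(wav_path, start_s=1.0, duration_s=0.5)
print(len(segment))
```

A library without seeking support would have to decode the file from the beginning and discard the unwanted samples, which is why the feature matters when training on short excerpts of long recordings.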
Results
The benchmarking process involves loading single-channel audio files of varying lengths (1 to 151 seconds) and measuring the conversion time to tensors. This process is evaluated across different tensor types: numpy, PyTorch, and TensorFlow. The results indicate how quickly different libraries can convert audio to the respective tensor type, although the results may not fully reflect batch loading speeds in deep learning applications.
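The measurement described above can be sketched as a small timing harness. This is an illustration under stated assumptions, not the project's actual benchmark script: the loader load_with_wave uses the standard-library wave module as a stand-in for a benchmarked library, the file lengths and repeat count are arbitrary, and taking the best of several runs is an assumed choice.

```python
import array
import os
import tempfile
import time
import wave


def make_wav(path, seconds, sample_rate=8000):
    """Write a silent mono 16-bit PCM file of the given length (illustrative data)."""
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)
        wf.setframerate(sample_rate)
        wf.writeframes(b"\x00\x00" * (seconds * sample_rate))


def load_with_wave(path):
    """Stand-in loader: decode the whole file into an array of 16-bit integers."""
    with wave.open(path, "rb") as wf:
        raw = wf.readframes(wf.getnframes())
    return array.array("h", raw)


def time_loader(loader, path, repeats=3):
    """Return the best wall-clock time in seconds over `repeats` calls of loader(path)."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        loader(path)
        best = min(best, time.perf_counter() - start)
    return best


results = {}
with tempfile.TemporaryDirectory() as tmp:
    for seconds in (1, 5, 10):  # the benchmark itself covers 1 to 151 seconds
        path = os.path.join(tmp, f"{seconds}s.wav")
        make_wav(path, seconds)
        results[seconds] = time_loader(load_with_wave, path)

for seconds, elapsed in results.items():
    print(f"{seconds:>3d} s file: {elapsed * 1e3:.3f} ms")
```

Swapping load_with_wave for a function that wraps another library, and converting its output to the target tensor type inside the loader, would give a like-for-like comparison in the same spirit.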
Load to Numpy Tensor
The benchmarking visualizations show results for loading audio data into a Numpy tensor format.
Load to PyTorch Tensor
For PyTorch Tensors, the benchmark assesses tools compatible with this popular deep learning framework.
Load to TensorFlow Tensor
Similarly, libraries are evaluated based on their TensorFlow tensor conversion efficiency.
Getting Metadata Information
Apart from loading times, the benchmark also measures how quickly each library extracts metadata such as sampling rate, number of channels, number of samples, and duration. The pydub library is excluded from this evaluation because its metadata extraction is slower.
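As an illustration of the four metadata fields mentioned above, the following sketch reads them from a WAV header without decoding any audio. It relies on the standard-library wave module, which is not one of the benchmarked libraries; the function name wav_info and the generated test file are hypothetical.

```python
import os
import tempfile
import wave


def wav_info(path):
    """Read sampling rate, channels, sample count, and duration from the WAV header."""
    with wave.open(path, "rb") as wf:
        rate = wf.getframerate()
        frames = wf.getnframes()
        return {
            "sampling_rate": rate,
            "channels": wf.getnchannels(),
            "samples": frames,
            "duration": frames / rate,  # in seconds
        }


# Create a 1.5-second silent mono file at 16 kHz to inspect.
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "meta.wav")
with wave.open(path, "wb") as wf:
    wf.setnchannels(1)
    wf.setsampwidth(2)
    wf.setframerate(16000)
    wf.writeframes(b"\x00\x00" * 24000)

info = wav_info(path)
print(info)
```

Because only the header is read, this kind of query should be much cheaper than loading the samples, which is why it is timed separately from full decoding.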
Running the Benchmark
The project provides a comprehensive setup guide for conducting the benchmarks, offering two primary methods: using Docker or setting up a virtual environment.
- Docker Setup: Involves building a Docker container, installing required packages, and executing scripts to load and process audio data.
- Virtual Environment Setup: Guides users to create an isolated Python environment, install dependencies, and execute the benchmark.
Contribution
The project encourages community contributions through issues and pull requests. Contributors can notify the project team about new tools or updates to existing libraries, enabling ongoing refinement and accuracy of the benchmark results.
By providing these benchmarking insights, the Python Audio-Loading Benchmark project aims to help developers choose the right audio processing tools for their machine learning and audio analysis tasks, balancing speed, format support, and additional features like seeking and metadata extraction.