Introduction to Audiomentations
Audiomentations is a Python library for audio data augmentation, inspired by the well-known albumentations library for image data. It is designed for deep learning workflows, runs on the CPU, and supports both mono and multichannel audio. This makes it useful for improving the robustness and generalization of audio models during training. It can also be integrated smoothly into training pipelines of popular frameworks such as TensorFlow/Keras or PyTorch.
The library has a track record in practice: it has helped participants achieve high rankings in Kaggle competitions and is used by companies building audio products. For those interested in a PyTorch-specific alternative with GPU support, the project recommends torch-audiomentations.
Setup
Setting up Audiomentations is straightforward. It supports Linux, macOS, and Windows and works with a range of Python versions. To get started, install Audiomentations via pip:
pip install audiomentations
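After installation, one quick way to confirm that the package is available is to query its installed version through the Python standard library; this check does not rely on any Audiomentations-specific API:
from importlib.metadata import version

# Print the installed Audiomentations version to verify the installation
print(version("audiomentations"))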
Usage Example
Audiomentations offers a diverse range of transforms to augment audio, making it versatile for various applications. Here's a simple example to illustrate how to use it:
from audiomentations import Compose, AddGaussianNoise, TimeStretch, PitchShift, Shift
import numpy as np
augment = Compose([
    AddGaussianNoise(min_amplitude=0.001, max_amplitude=0.015, p=0.5),
    TimeStretch(min_rate=0.8, max_rate=1.25, p=0.5),
    PitchShift(min_semitones=-4, max_semitones=4, p=0.5),
    Shift(p=0.5),
])
# Generating 2 seconds of dummy audio for demonstration
samples = np.random.uniform(low=-0.2, high=0.2, size=(32000,)).astype(np.float32)
# Applying augmentations to the audio data
augmented_samples = augment(samples=samples, sample_rate=16000)
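As noted in the introduction, the augment callable can also be embedded directly in a training pipeline. The sketch below shows one possible way to do this inside a PyTorch Dataset; the AugmentedAudioDataset class, its in-memory list of waveforms, and the fixed 16 kHz sample rate are illustrative assumptions rather than part of the Audiomentations API:
import numpy as np
import torch
from torch.utils.data import Dataset
from audiomentations import Compose, AddGaussianNoise, PitchShift

class AugmentedAudioDataset(Dataset):
    # Hypothetical dataset that augments waveforms on the fly during training
    def __init__(self, waveforms, sample_rate=16000):
        # waveforms: a list of 1D float32 numpy arrays containing mono audio
        self.waveforms = waveforms
        self.sample_rate = sample_rate
        self.augment = Compose([
            AddGaussianNoise(min_amplitude=0.001, max_amplitude=0.015, p=0.5),
            PitchShift(min_semitones=-4, max_semitones=4, p=0.5),
        ])

    def __len__(self):
        return len(self.waveforms)

    def __getitem__(self, index):
        samples = self.waveforms[index]
        # A new random augmentation is drawn every time the item is fetched
        augmented = self.augment(samples=samples, sample_rate=self.sample_rate)
        return torch.from_numpy(augmented)
Because the augmentation is applied in __getitem__, every epoch sees a freshly randomized version of each clip.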
Transforms
Audiomentations provides an extensive list of transformations, allowing users to mix, modify, and manipulate audio data creatively. Some of the notable transformations include:
- AddBackgroundNoise: Introduces background noise by mixing in another sound.
- PitchShift: Changes the pitch of the audio without affecting its tempo.
- TimeStretch: Alters the speed of the audio without changing its pitch.
- AddGaussianNoise: Applies Gaussian noise to audio samples to simulate varied sound environments.
Each of these transforms can be configured with its own parameters to suit a specific task, which makes the library a comprehensive tool for audio data augmentation; a minimal example of configuring a single transform follows below.
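As a minimal sketch of per-transform configuration, the snippet below applies a single PitchShift with its own semitone range and p=1.0 so that it runs on every call; the parameter values and the dummy audio are arbitrary examples:
import numpy as np
from audiomentations import PitchShift

# Always apply (p=1.0) a pitch shift of up to +/- 2 semitones
transform = PitchShift(min_semitones=-2, max_semitones=2, p=1.0)

# One second of dummy mono audio at 16 kHz
samples = np.random.uniform(low=-0.2, high=0.2, size=(16000,)).astype(np.float32)
shifted = transform(samples=samples, sample_rate=16000)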
Documentation
Detailed documentation, including guides, code examples, illustrations, and audio samples, is available on the project's documentation site. It is a valuable resource for both new and experienced users who want to take full advantage of Audiomentations.
Changelog
Audiomentations is actively maintained and frequently updated to improve performance and expand its feature set. For instance, version 0.37.0 introduced performance improvements via the SIMD-accelerated numpy-minmax package, speeding up several transforms such as Limiter, Mp3Compression, and Normalize.
For a comprehensive list of updates and changes, refer to the project's changelog.
Acknowledgements
The development and improvement of Audiomentations are supported by contributions from individuals and organizations alike. Special thanks are extended to Nomono for backing the project, and to all contributors who have helped in enhancing Audiomentations.
By augmenting training data with Audiomentations, developers and researchers can improve the robustness and effectiveness of their audio models and applications.