Overview
Torchcrepe is a PyTorch implementation of the CREPE pitch tracker, a deep-learning model for estimating the pitch of audio signals. Originally developed in TensorFlow, CREPE provides robust pitch estimation suitable for a wide range of audio processing tasks. Torchcrepe uses model weights converted from CREPE's "tiny" and "full" models via MMdnn, an open-source framework for model conversion and management.
Installation
To get started, users should first install PyTorch following system-specific instructions available on the PyTorch website. Once PyTorch is set up, torchcrepe can be installed simply using pip:
pip install torchcrepe
Usage
Torchcrepe is designed to be user-friendly, enabling developers to compute pitch and periodicity from audio with ease. Here’s a simple step-by-step guide to using it:
Computing Pitch and Periodicity
- Load Audio: Use the API to load your audio file.
- Set Parameters: Select the model type ('tiny' or 'full'), the device for inference (like 'cuda:0'), and other parameters like frequency range and batch size.
- Make Predictions: Compute the pitch using the selected model and parameters, as sketched in the example below.
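A minimal sketch of this workflow, assuming a 5 millisecond hop length, a speech-appropriate frequency range, and a placeholder file name:

import torchcrepe

# Load audio from disk ('audio.wav' is a hypothetical path)
audio, sr = torchcrepe.load.audio('audio.wav')

# Use a 5 millisecond hop length
hop_length = int(sr / 200.)

# A reasonable frequency range for speech
fmin, fmax = 50, 550

# Compute pitch with the full model on the first GPU
pitch = torchcrepe.predict(audio,
                           sr,
                           hop_length,
                           fmin,
                           fmax,
                           'full',
                           batch_size=2048,
                           device='cuda:0')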
Additionally, torchcrepe can return a periodicity metric akin to the CREPE confidence score by setting return_periodicity=True, as shown below.
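For example, continuing the sketch above:

# Return per-frame periodicity alongside the pitch
pitch, periodicity = torchcrepe.predict(audio,
                                        sr,
                                        hop_length,
                                        fmin,
                                        fmax,
                                        'full',
                                        batch_size=2048,
                                        device='cuda:0',
                                        return_periodicity=True)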
Decoding
Torchcrepe employs Viterbi decoding by default to interpret the pitch, which mitigates errors due to frequency doubling or halving. Different decoding methods are available, allowing flexibility:
- Viterbi Decoding: The default method, which smooths the pitch sequence to avoid large, spurious jumps.
- Weighted Argmax: The method used by the original CREPE implementation; useful for direct comparison.
- Argmax: A simpler, direct decoding of the most likely pitch bin.
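As a sketch, a decoder can be selected through the decoder argument of predict; the call below reuses the variables from the earlier example:

# Decode with weighted argmax instead of the default Viterbi decoder
pitch = torchcrepe.predict(audio,
                           sr,
                           hop_length,
                           fmin,
                           fmax,
                           'full',
                           decoder=torchcrepe.decode.weighted_argmax,
                           batch_size=2048,
                           device='cuda:0')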
Filtering and Thresholding
When periodicity values are low, resulting pitch estimations may become unreliable. Torchcrepe offers filters and thresholding techniques to manage this issue and enhance data quality.
- Median Filtering: To smooth out noisy confidence values.
- Noise Masking: To remove areas with low periodic content.
These filters and thresholds should be tuned to the specific nature and quality of the audio being processed; one possible chain is sketched below.
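A sketch of one possible post-processing chain, using the pitch and periodicity values from the earlier example (the window size and the 0.21 threshold are illustrative, not recommendations):

# A ~15 millisecond window is 3 frames at a 5 millisecond hop
win_length = 3

# Median-filter the noisy periodicity values
periodicity = torchcrepe.filter.median(periodicity, win_length)

# Mask pitch values whose periodicity falls below the chosen threshold
pitch = torchcrepe.threshold.At(.21)(pitch, periodicity)

# Optionally mean-filter the pitch to reduce quantization artifacts
pitch = torchcrepe.filter.mean(pitch, win_length)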
Additional Features
Torchcrepe also provides functions to compute embeddings, model output activations, and predictions directly from audio files, accommodating both small-scale and large-scale audio processing workflows.
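For instance, per-frame embeddings can be computed from audio already in memory; this minimal sketch reuses the audio, sr, and hop_length variables from the earlier example:

# Compute CREPE embeddings (activations of an intermediate network layer)
embedding = torchcrepe.embed(audio, sr, hop_length)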
Command-line Interface
Torchcrepe comes with a flexible command-line interface that makes it possible to process audio files without writing any code. The interface supports numerous options for customized processing, such as specifying input and output files, selecting models, and configuring processing parameters.
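The available options and their exact flag names are best checked against the built-in help once the package is installed:

python -m torchcrepe --help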
Tests
To ensure functionality, torchcrepe includes a comprehensive suite of tests. Users can run these tests with pytest to verify that installation and processing work as expected:
pip install pytest
pytest
References
The development of torchcrepe is based on significant research works. CREPE's foundation is detailed in the paper by J. W. Kim et al., which introduces CREPE's convolutional representation for pitch estimation. Additionally, the pitch embedding functionality is informed by the work on Differentiable Digital Signal Processing (DDSP) by J. H. Engel et al.
By offering a blend of accuracy, flexibility, and ease of use, torchcrepe is an invaluable tool for developers and researchers working on audio signal processing and analysis. Its PyTorch implementation makes it accessible and extensible for machine learning enthusiasts and professionals alike.