CREPE: A State-of-the-Art Pitch Tracking Tool
Introduction
CREPE is a monophonic pitch tracker based on a deep convolutional neural network that operates directly on the time-domain waveform. As of 2018, CREPE was a state-of-the-art pitch tracker, outperforming well-known alternatives such as pYIN and SWIPE. The method is described in the paper "CREPE: A Convolutional Representation for Pitch Estimation," presented at the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) in 2018; researchers who use CREPE in academic work are encouraged to cite that paper.
Installing CREPE
CREPE is available on the Python Package Index (PyPI) and can be installed with pip. If TensorFlow is not already installed, first install or upgrade it:
$ pip install --upgrade tensorflow # Requires TensorFlow ≥ 2.0.0
$ pip install crepe
Alternatively, the latest version of CREPE can be installed from the source by cloning the GitHub repository and running:
$ python setup.py install
Using CREPE
Command Line Usage
For ease of use, CREPE offers a command-line tool equipped with a pre-trained model. To determine the pitch of an audio file, users can simply execute:
$ crepe audio_file.wav
or
$ python -m crepe audio_file.wav
The resulting CSV file will contain three columns: timestamps, predicted fundamental frequency in Hertz, and voicing confidence indicating the likelihood of pitch presence.
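For downstream processing, this CSV can be parsed with Python's standard csv module. A minimal sketch, assuming the header names time, frequency, and confidence for the three columns (the sample values below are fabricated, not from a real run):

```python
import csv
import io

# Hypothetical sample of CREPE's CSV output: timestamp (s),
# predicted frequency (Hz), and voicing confidence.
sample = """time,frequency,confidence
0.000,185.616,0.9031
0.010,186.005,0.9146
0.020,186.954,0.9222
"""

rows = list(csv.DictReader(io.StringIO(sample)))
times = [float(r["time"]) for r in rows]
freqs = [float(r["frequency"]) for r in rows]
conf = [float(r["confidence"]) for r in rows]

# Keep only frames where the tracker is confident a pitch is present.
voiced = [(t, f) for t, f, c in zip(times, freqs, conf) if c > 0.5]
```

The confidence threshold of 0.5 is a common but arbitrary choice; pick it to suit how aggressively you want to discard unvoiced frames.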
Customizing the Pitch Analysis
- Timestamps: By default, CREPE reports a pitch estimate every 10 milliseconds. This can be changed with the --step-size option; for instance, --step-size 50 analyzes every 50 milliseconds.
- Model capacity: Users can trade speed for accuracy by choosing a model size with the --model-capacity option: tiny, small, medium, large, or full.
- Temporal smoothing: Optional temporal smoothing of the pitch curve using the Viterbi algorithm can be enabled with the --viterbi flag.
- Saving data: The activation matrix and a corresponding plot image can be saved with the --save-activation and --save-plot options.
- Batch processing: CREPE can process multiple audio files at once when given the path to a folder of WAV files.
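Given a step size, the number of frames CREPE will emit for a file is easy to estimate. A minimal sketch, assuming one prediction every step_size milliseconds starting at time zero (n_frames is a hypothetical helper, not part of CREPE):

```python
# Estimate how many rows the output CSV will contain for a file of a
# given duration, assuming a frame at t=0 and one every step_size ms.
def n_frames(duration_s: float, step_size_ms: int = 10) -> int:
    return int(duration_s * 1000 / step_size_ms) + 1

print(n_frames(3.0))      # default 10 ms step -> 301 frames
print(n_frames(3.0, 50))  # --step-size 50 -> 61 frames
```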
Further usage details can be accessed via the help command:
$ python crepe.py --help
Programmatic Usage
CREPE can be integrated directly into Python scripts. The following shows a simple implementation:
import crepe
from scipy.io import wavfile

# Load the audio (sample rate and samples), then run CREPE with
# Viterbi smoothing of the pitch curve enabled.
sr, audio = wavfile.read('/path/to/audiofile.wav')
time, frequency, confidence, activation = crepe.predict(audio, sr, viterbi=True)
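The frequency array returned by crepe.predict is in Hz; a common follow-on step (not part of CREPE itself) is converting those values to MIDI note numbers for symbolic analysis. A small sketch using the standard A4 = 440 Hz = MIDI 69 convention:

```python
import math

# Convert a frequency in Hz to a (fractional) MIDI note number,
# with A4 = 440 Hz mapped to MIDI note 69.
def hz_to_midi(f_hz: float) -> float:
    return 69.0 + 12.0 * math.log2(f_hz / 440.0)

print(round(hz_to_midi(440.0)))       # 69 (A4)
print(round(hz_to_midi(261.63), 2))   # 60.0 (middle C)
```

Applying this per-frame to the frequency array gives a piano-roll-like contour; fractional values preserve the fine pitch deviations CREPE captures.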
Argmax-local Weighted Averaging
A distinctive feature of CREPE’s current release is its use of argmax-local weighted averaging: instead of reporting only the single activation bin with the highest value, the pitch estimate is a weighted average of the pitch values of the bins in a small neighborhood around that peak. This yields pitch estimates finer than the network's bin resolution.
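The idea can be sketched as follows, assuming 360 pitch bins spaced 20 cents apart; the additive cents offset and the ±4-bin window below are illustrative assumptions, not a definitive reproduction of CREPE's internals:

```python
import numpy as np

# Map each of the 360 activation bins to a pitch value in cents
# (20-cent spacing; the offset places bin 0 near C1). Illustrative only.
CENTS = np.linspace(0, 7180, 360) + 1997.3794084376191

def local_average_cents(salience):
    """Weighted average of cents over a small window around the activation peak."""
    center = int(np.argmax(salience))
    lo, hi = max(0, center - 4), min(len(salience), center + 5)
    w = salience[lo:hi]
    return float(np.sum(w * CENTS[lo:hi]) / np.sum(w))

# A symmetric activation peak centered on bin 100 averages to exactly
# CENTS[100]; convert cents to Hz with the usual 10 * 2**(c/1200) mapping.
salience = np.zeros(360)
salience[98:103] = [0.1, 0.5, 1.0, 0.5, 0.1]
cents = local_average_cents(salience)
freq_hz = 10 * 2 ** (cents / 1200)
```

For an asymmetric peak the weighted average lands between bins, which is exactly where the sub-bin precision comes from.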
Considerations
- CREPE currently supports only WAV file input.
- The model is optimized for 16 kHz audio; input at other sample rates is automatically resampled.
- For best performance, Keras should use the TensorFlow backend.
- Inference is significantly faster when the model runs on a GPU.
- CREPE was trained on datasets rich in vocal and instrumental audio, so it is most effective on similar types of audio signals.
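Although CREPE resamples automatically, resampling to 16 kHz up front avoids repeated work when analyzing the same audio multiple times. A sketch using SciPy's polyphase resampler (one of several reasonable choices):

```python
import numpy as np
from scipy.signal import resample_poly

# Resample a 44.1 kHz signal to CREPE's native 16 kHz rate.
sr_in, sr_out = 44100, 16000
t = np.arange(sr_in) / sr_in                      # one second of audio
audio = np.sin(2 * np.pi * 220.0 * t)             # 220 Hz test tone
audio_16k = resample_poly(audio, sr_out, sr_in)   # up=16000, down=44100

print(len(audio_16k))  # 16000 samples, i.e. one second at 16 kHz
```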
References
CREPE was trained and evaluated on several scholarly datasets, which contribute to its robust pitch-tracking performance on music and audio: MIR-1K, Bach10, RWC-Synth, MedleyDB, MDB-STEM-Synth, and NSynth.