CREPE: A State-of-the-Art Pitch Tracking Tool
Introduction
CREPE is a monophonic pitch tracker based on a deep convolutional neural network that operates directly on the time-domain waveform. As of 2018, CREPE was a state-of-the-art pitch tracker, outperforming well-known alternatives such as pYIN and SWIPE. The method is described in the paper "CREPE: A Convolutional Representation for Pitch Estimation," presented at the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) in 2018; researchers who use CREPE in academic work are encouraged to cite that paper.
Installing CREPE
CREPE is available on the Python Package Index (PyPI) and can be installed with pip. If TensorFlow is not already installed, first install or upgrade it:
$ pip install --upgrade tensorflow # Requires TensorFlow ≥ 2.0.0
$ pip install crepe
Alternatively, the latest version of CREPE can be installed from the source by cloning the GitHub repository and running:
$ python setup.py install
Using CREPE
Command Line Usage
For ease of use, CREPE offers a command-line tool equipped with a pre-trained model. To determine the pitch of an audio file, users can simply execute:
$ crepe audio_file.wav
or
$ python -m crepe audio_file.wav
The resulting CSV file will contain three columns: timestamps, predicted fundamental frequency in Hertz, and voicing confidence indicating the likelihood of pitch presence.
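For downstream processing, this CSV can be parsed with Python's standard csv module. A minimal sketch, assuming the header names time, frequency, and confidence for the three columns (the sample values below are fabricated, not from a real run):

```python
import csv
import io

# Hypothetical sample of CREPE's CSV output: timestamp (s),
# predicted frequency (Hz), and voicing confidence.
sample = """time,frequency,confidence
0.000,185.616,0.9031
0.010,186.005,0.9146
0.020,186.954,0.9222
"""

rows = list(csv.DictReader(io.StringIO(sample)))
times = [float(r["time"]) for r in rows]
freqs = [float(r["frequency"]) for r in rows]
conf = [float(r["confidence"]) for r in rows]

# Keep only frames where the tracker is confident a pitch is present.
voiced = [(t, f) for t, f, c in zip(times, freqs, conf) if c > 0.5]
```

The confidence threshold of 0.5 is a common but arbitrary choice; pick it to suit how aggressively you want to discard unvoiced frames.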
Customizing the Pitch Analysis
- Timestamps: By default, CREPE reports a pitch estimate every 10 milliseconds. This can be changed with the --step-size option; for instance, --step-size 50 analyzes every 50 milliseconds.
- Model capacity: Users can trade speed for accuracy by choosing a model size with the --model-capacity option: tiny, small, medium, large, or full.
- Temporal smoothing: Optional temporal smoothing of the pitch curve using the Viterbi algorithm can be enabled with the --viterbi flag.
- Saving data: The activation matrix and a corresponding plot image can be saved with the --save-activation and --save-plot options.
- Batch processing: CREPE can process multiple audio files at once when given the path to a folder of WAV files.
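Given a step size, the number of frames CREPE will emit for a file is easy to estimate. A minimal sketch, assuming one prediction every step_size milliseconds starting at time zero (n_frames is a hypothetical helper, not part of CREPE):

```python
# Estimate how many rows the output CSV will contain for a file of a
# given duration, assuming a frame at t=0 and one every step_size ms.
def n_frames(duration_s: float, step_size_ms: int = 10) -> int:
    return int(duration_s * 1000 / step_size_ms) + 1

print(n_frames(3.0))      # default 10 ms step -> 301 frames
print(n_frames(3.0, 50))  # --step-size 50 -> 61 frames
```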
Further usage details can be accessed via the help command:
$ python crepe.py --help
Programmatic Usage
CREPE can be integrated directly into Python scripts. The following shows a simple implementation:
import crepe
from scipy.io import wavfile

# Load the audio (sample rate and samples), then run CREPE with
# Viterbi smoothing of the pitch curve enabled.
sr, audio = wavfile.read('/path/to/audiofile.wav')
time, frequency, confidence, activation = crepe.predict(audio, sr, viterbi=True)
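The frequency array returned by crepe.predict is in Hz; a common follow-on step (not part of CREPE itself) is converting those values to MIDI note numbers for symbolic analysis. A small sketch using the standard A4 = 440 Hz = MIDI 69 convention:

```python
import math

# Convert a frequency in Hz to a (fractional) MIDI note number,
# with A4 = 440 Hz mapped to MIDI note 69.
def hz_to_midi(f_hz: float) -> float:
    return 69.0 + 12.0 * math.log2(f_hz / 440.0)

print(round(hz_to_midi(440.0)))       # 69 (A4)
print(round(hz_to_midi(261.63), 2))   # 60.0 (middle C)
```

Applying this per-frame to the frequency array gives a piano-roll-like contour; fractional values preserve the fine pitch deviations CREPE captures.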
Argmax-local Weighted Averaging
A distinctive feature of CREPE’s current release is its use of argmax-local weighted averaging: instead of reporting only the single activation bin with the highest value, the pitch estimate is a weighted average of the pitch values of the bins in a small neighborhood around that peak. This yields pitch estimates finer than the network's bin resolution.
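The idea can be sketched as follows, assuming 360 pitch bins spaced 20 cents apart; the additive cents offset and the ±4-bin window below are illustrative assumptions, not a definitive reproduction of CREPE's internals:

```python
import numpy as np

# Map each of the 360 activation bins to a pitch value in cents
# (20-cent spacing; the offset places bin 0 near C1). Illustrative only.
CENTS = np.linspace(0, 7180, 360) + 1997.3794084376191

def local_average_cents(salience):
    """Weighted average of cents over a small window around the activation peak."""
    center = int(np.argmax(salience))
    lo, hi = max(0, center - 4), min(len(salience), center + 5)
    w = salience[lo:hi]
    return float(np.sum(w * CENTS[lo:hi]) / np.sum(w))

# A symmetric activation peak centered on bin 100 averages to exactly
# CENTS[100]; convert cents to Hz with the usual 10 * 2**(c/1200) mapping.
salience = np.zeros(360)
salience[98:103] = [0.1, 0.5, 1.0, 0.5, 0.1]
cents = local_average_cents(salience)
freq_hz = 10 * 2 ** (cents / 1200)
```

For an asymmetric peak the weighted average lands between bins, which is exactly where the sub-bin precision comes from.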
Considerations
- CREPE currently supports only WAV file input.
- The model is optimized for 16 kHz audio; input at other sample rates is automatically resampled.
- For best performance, Keras should use the TensorFlow backend.
- Inference is significantly faster when the model runs on a GPU.
- CREPE was trained on datasets rich in vocal and instrumental audio, so it is most effective on similar types of audio signals.
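Although CREPE resamples automatically, resampling to 16 kHz up front avoids repeated work when analyzing the same audio multiple times. A sketch using SciPy's polyphase resampler (one of several reasonable choices):

```python
import numpy as np
from scipy.signal import resample_poly

# Resample a 44.1 kHz signal to CREPE's native 16 kHz rate.
sr_in, sr_out = 44100, 16000
t = np.arange(sr_in) / sr_in                      # one second of audio
audio = np.sin(2 * np.pi * 220.0 * t)             # 220 Hz test tone
audio_16k = resample_poly(audio, sr_out, sr_in)   # up=16000, down=44100

print(len(audio_16k))  # 16000 samples, i.e. one second at 16 kHz
```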
References
CREPE was trained and evaluated on several scholarly datasets, which contribute to its robust pitch-tracking performance on music and audio: MIR-1K, Bach10, RWC-Synth, MedleyDB, MDB-STEM-Synth, and NSynth.