PESTO - Pitch Estimation with Self-supervised Transposition-equivariant Objective

Introduction to PESTO

PESTO, short for Pitch Estimation with Self-supervised Transposition-equivariant Objective, is a machine-learning pitch estimation tool designed for fast and accurate pitch detection. The project received the Best Paper Award at the ISMIR 2023 conference, a recognition of its contribution to the field of music information retrieval.

Installation and Setup

PESTO can be installed with a single command:

pip install pesto-pitch

The implementation relies on PyTorch, along with numpy, torchaudio, and, optionally, matplotlib for visual output. It is recommended to install PyTorch first, before installing PESTO.

Using PESTO

Command-line Interface

PESTO offers a command-line interface (CLI). To estimate pitch from an audio file, run:

pesto my_file.wav

The results are saved in a .csv file that follows the format used by CREPE, which makes comparisons between the two tools straightforward. PESTO also supports other output formats: .npz, which stores timesteps, pitch, confidence, and activations, and .png, which renders a visual output using matplotlib.
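
Because the output follows CREPE's layout, it can be read with standard tools. The sketch below assumes the header names time, frequency, and confidence (the column names CREPE uses; check the header of your own file) and parses a made-up in-memory sample rather than a real output file. The function name and the confidence threshold are illustrative choices.

```python
import csv
import io

# Made-up sample in the assumed CREPE-style layout (not real PESTO output).
SAMPLE_CSV = """time,frequency,confidence
0.00,220.0,0.91
0.01,221.5,0.88
0.02,440.0,0.12
"""


def read_pitch_csv(handle, min_confidence=0.0):
    """Return (time, frequency, confidence) tuples, dropping low-confidence frames."""
    rows = []
    for row in csv.DictReader(handle):
        confidence = float(row["confidence"])
        if confidence >= min_confidence:
            rows.append((float(row["time"]), float(row["frequency"]), confidence))
    return rows


# Keep only frames whose confidence is at least 0.5.
frames = read_pitch_csv(io.StringIO(SAMPLE_CSV), min_confidence=0.5)
```

To read a file on disk instead, pass an open file object, for example open("my_file.csv", newline=""), where the file name is a placeholder.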

The tool supports batch processing: several files can be passed in a single command, which makes it convenient for analyzing entire folders of audio files.

Audio Formats

PESTO loads audio through torchaudio, so it supports a wide range of audio formats and handles files with different sampling rates without requiring them to be resampled first.

Customizing Pitch Prediction

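The model outputs a probability distribution over pitch bins. By default, PESTO converts this distribution into a single pitch value using Argmax-Local Weighted Averaging, which averages the bins around the most likely one and so yields a finer estimate than the peak bin alone. Users can select alternative methods through command-line options.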
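
To make the idea concrete, here is a minimal sketch of argmax-local weighted averaging in plain Python. It is not PESTO's implementation: the window radius, the bin layout, and the function name are assumptions chosen for illustration.

```python
def argmax_local_weighted_average(probabilities, bin_pitches, radius=1):
    """Average the bin pitches in a window centred on the most likely bin.

    probabilities: one non-negative weight per pitch bin.
    bin_pitches:   the pitch value each bin represents (same length).
    radius:        number of neighbouring bins kept on each side (assumed value).
    """
    if len(probabilities) != len(bin_pitches):
        raise ValueError("probabilities and bin_pitches must have the same length")

    # Index of the most likely bin (the plain argmax estimate).
    peak = max(range(len(probabilities)), key=probabilities.__getitem__)

    # Window around the peak, clipped to the valid bin range.
    start = max(0, peak - radius)
    stop = min(len(probabilities), peak + radius + 1)

    weights = probabilities[start:stop]
    total = sum(weights)
    if total == 0:
        return bin_pitches[peak]  # degenerate input: fall back to the peak bin

    weighted = sum(w * p for w, p in zip(weights, bin_pitches[start:stop]))
    return weighted / total


# Toy distribution over five bins, one semitone apart.
pitch = argmax_local_weighted_average(
    [0.0, 0.1, 0.6, 0.3, 0.0], [60.0, 61.0, 62.0, 63.0, 64.0]
)
```

In this toy case the peak bin sits at 62, but the heavier right-hand neighbour pulls the weighted estimate slightly above it, which is the refinement the method is meant to provide.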

Additional options include customizing the prediction step size (the time between successive pitch estimates), returning pitch predictions in semitones instead of frequencies, and running on a GPU for faster processing.
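
For readers who need to move between the two pitch units, the conversion can be sketched as follows. This assumes semitone values follow MIDI note numbering with A4 = 69 = 440 Hz; that convention is an assumption here and should be checked against PESTO's actual output.

```python
import math

A4_HZ = 440.0   # reference frequency (assumed tuning)
A4_MIDI = 69.0  # MIDI note number of A4 (assumed numbering)


def semitones_to_hz(semitones):
    """Convert a pitch in semitones (MIDI numbering) to a frequency in Hz."""
    return A4_HZ * 2.0 ** ((semitones - A4_MIDI) / 12.0)


def hz_to_semitones(hz):
    """Convert a frequency in Hz to a pitch in semitones (MIDI numbering)."""
    return A4_MIDI + 12.0 * math.log2(hz / A4_HZ)
```

Twelve semitones correspond to one octave, so adding 12 to a semitone value doubles the frequency.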

Python API

PESTO's core functions can be integrated directly into Python code. This allows developers to leverage PESTO's capabilities within their own applications by calling functions like predict_from_files.

The API also supports more advanced use, such as loading a model once and reusing it across calls, which avoids the time and memory cost of reloading it for every prediction. This makes PESTO well suited to large-scale projects that process many files.
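
The load-once, reuse-many pattern can be sketched independently of PESTO. Everything below is a hypothetical stand-in: DummyModel, get_model, and predict_many are illustrative names, not PESTO's API, and the "prediction" is a placeholder computation rather than pitch estimation.

```python
from functools import lru_cache

LOAD_COUNT = 0  # counts how often the (pretend) expensive load runs


class DummyModel:
    """Hypothetical stand-in for a model that is costly to construct."""

    def __init__(self, name):
        self.name = name

    def predict(self, samples):
        # Placeholder computation (mean of the samples), not pitch estimation.
        return sum(samples) / len(samples)


@lru_cache(maxsize=None)
def get_model(name):
    """Build the model on first request and return the cached instance afterwards."""
    global LOAD_COUNT
    LOAD_COUNT += 1
    return DummyModel(name)


def predict_many(signals, model_name="demo"):
    """Run every signal through the same cached model instance."""
    model = get_model(model_name)
    return [model.predict(signal) for signal in signals]


results = predict_many([[0.0, 1.0], [2.0, 4.0]])
results_again = predict_many([[1.0, 1.0]])  # reuses the cached model
```

After both calls the model has been constructed only once, which is the saving that pre-loading is meant to deliver.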

Performance and Speed

PESTO performs well in both accuracy and speed. It outperforms other self-supervised models on datasets such as MIR-1K and MDB-stem-synth, and its accuracy is comparable to that of CREPE while using significantly fewer parameters.

Its lightweight design allows rapid inference: audio files are processed much faster than real time, especially with larger step sizes, which reduce the number of frames to analyze. GPU support accelerates processing further for resource-intensive applications.
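
"Faster than real time" can be quantified as a real-time factor, and the effect of the step size as an approximate frame count. The sketch below is a back-of-the-envelope helper with illustrative names; the frame count ignores padding and edge handling, so it only approximates what a real tool would produce.

```python
import math


def real_time_factor(audio_seconds, processing_seconds):
    """Seconds of audio processed per second of compute (above 1 means faster than real time)."""
    return audio_seconds / processing_seconds


def approximate_frame_count(audio_seconds, step_ms):
    """Rough number of pitch estimates for a given step size in milliseconds."""
    return math.ceil(audio_seconds * 1000.0 / step_ms)
```

Doubling the step size roughly halves the number of frames, which is why coarser steps shorten processing time.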

Contribution Opportunities

While PESTO already stands out in speed and performance, the development team encourages community contributions. From experimenting with model architectures to optimizing speed, all suggestions and improvements are welcome.

Summary

PESTO is a capable tool for pitch estimation that combines accuracy, speed, and ease of use. Its support for a broad range of audio formats and its Python API make it a good choice for both hobbyists and professionals in the music and audio processing domain.