# Audio processing

riffusion-hobby
Riffusion is a library for real-time music and audio generation with Stable Diffusion. It is no longer actively maintained, but it remains a useful resource for developers working on spectrogram-image-to-audio conversion. The project includes a command-line interface, an interactive Streamlit app, and a Flask server for API-based model inference. It is compatible with Python 3.9 or 3.10 and supports CUDA for GPU acceleration, which makes it suitable for audio processing experiments and for integration into third-party applications.
make-a-smart-speaker
A curated collection of open-source resources for building a smart speaker from scratch. It covers the core technologies, including audio processing and natural language processing, and surveys leading projects such as Mycroft and SEPIA. It also points to SDKs such as Amazon Alexa and Google Assistant for extending functionality. Finally, it lists libraries for building a personalized, privacy-oriented smart speaker on a Raspberry Pi.
dasp-pytorch
dasp-pytorch provides differentiable audio signal processing modules, such as reverberation and distortion, that can be embedded directly in neural network architectures. It runs on both CPU and GPU, is licensed under Apache 2.0, and targets automated DSP and virtual analog modeling in both research and industry.
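"Differentiable" means that gradients can flow through an effect's parameters, so those parameters can be learned. The following pure-Python sketch illustrates the idea with a tanh soft-clipping distortion whose gain is fitted by gradient descent using the analytic derivative. It is not dasp-pytorch's API or implementation: dasp-pytorch works on PyTorch tensors and relies on automatic differentiation, and the function names below are hypothetical.

```python
import math


def tanh_distortion(x, gain):
    """Soft-clipping distortion y = tanh(gain * x), applied sample by sample."""
    return [math.tanh(gain * s) for s in x]


def grad_wrt_gain(x, gain, upstream):
    """Gradient of sum(upstream[i] * y[i]) with respect to gain.

    Uses d/dg tanh(g * x) = x * (1 - tanh(g * x) ** 2).
    """
    total = 0.0
    for s, u in zip(x, upstream):
        t = math.tanh(gain * s)
        total += u * s * (1.0 - t * t)
    return total


def fit_gain(x, target, gain=1.0, lr=2.0, steps=500):
    """Fit the distortion gain to a target signal by minimizing mean squared error."""
    n = len(x)
    for _ in range(steps):
        y = tanh_distortion(x, gain)
        # dL/dy for L = mean((y - target) ** 2)
        upstream = [2.0 * (yi - ti) / n for yi, ti in zip(y, target)]
        gain -= lr * grad_wrt_gain(x, gain, upstream)
    return gain
```

For example, `fit_gain(x, tanh_distortion(x, 3.0))` recovers a gain close to 3.0 for a ramp `x` spanning -1 to 1. This is the same kind of parameter estimation that differentiable DSP modules enable inside larger networks.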
torchcrepe
torchcrepe is a PyTorch implementation of the CREPE pitch tracker that uses model weights converted with MMdnn. It computes pitch and periodicity for audio analysis and supports GPU acceleration and batch processing. Decoding options such as Viterbi decoding, together with filtering options, help it cope with noisy input. It also provides a command-line interface and functions for processing audio files directly, which makes it adaptable to a range of speech and music analysis tasks.
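Viterbi decoding treats the per-frame pitch-bin probabilities as observations of a hidden Markov model and selects the single most likely path, which suppresses isolated octave or outlier jumps. The sketch below illustrates that idea in plain Python. It is not torchcrepe's implementation: the triangular transition weights and the `max_jump` parameter are assumptions chosen for illustration.

```python
import math


def viterbi_decode(probs, max_jump=2):
    """Return the most likely pitch-bin path through frame-wise probabilities.

    probs: list of frames, each a list of probabilities over pitch bins.
    Transitions of more than max_jump bins are disallowed; smaller jumps are
    weighted triangularly (a hypothetical choice), normalized per row.
    """
    n_bins = len(probs[0])
    log_trans = []
    for i in range(n_bins):
        w = [max(0, max_jump + 1 - abs(i - j)) for j in range(n_bins)]
        total = sum(w)
        log_trans.append([math.log(v / total) if v > 0 else -math.inf for v in w])

    def log_obs(p):
        return math.log(max(p, 1e-12))  # guard against log(0)

    # Start with a uniform prior over bins.
    score = [log_obs(p) - math.log(n_bins) for p in probs[0]]
    back = []
    for frame in probs[1:]:
        new_score, pointers = [], []
        for j in range(n_bins):
            best_i = max(range(n_bins), key=lambda i: score[i] + log_trans[i][j])
            new_score.append(score[best_i] + log_trans[best_i][j] + log_obs(frame[j]))
            pointers.append(best_i)
        score = new_score
        back.append(pointers)

    # Backtrack from the best final state.
    state = max(range(n_bins), key=lambda j: score[j])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1]
```

A per-frame argmax would jump to an outlier bin in a single noisy frame. The Viterbi path instead stays on the consistent bin, because reaching the outlier would require several improbable transitions.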
spleeter
Spleeter is an audio source separation tool that uses pretrained models to isolate vocals and instruments quickly. Built on TensorFlow, it can run up to 100 times faster than real time on a GPU. It can be used from the command line or as a Python library, and it supports 2-stem, 4-stem, and 5-stem separation. Spleeter is popular among developers of professional audio software, installs with pip or Docker, and fits easily into existing pipelines for efficient music demixing.
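Separation models of this kind typically estimate a spectrogram for each stem and then derive soft masks that are applied to the mixture spectrogram. The plain-Python sketch below shows that masking step on flat lists of time-frequency bins, using power ratio masks. It is an illustration under that assumption, not Spleeter's code or API, and the function names are hypothetical.

```python
def ratio_masks(estimates, eps=1e-10):
    """Turn per-stem magnitude estimates into soft masks.

    estimates: dict mapping stem name -> list of magnitudes, one per
    time-frequency bin. Masks use squared magnitudes (power ratios) and sum
    to approximately 1 in every bin where at least one stem is non-zero.
    """
    names = list(estimates)
    n = len(estimates[names[0]])
    masks = {name: [0.0] * n for name in names}
    for k in range(n):
        powers = {name: estimates[name][k] ** 2 for name in names}
        total = sum(powers.values()) + eps  # eps avoids division by zero
        for name in names:
            masks[name][k] = powers[name] / total
    return masks


def apply_masks(mixture, masks):
    """Multiply each mixture bin by every stem's mask to get per-stem bins."""
    return {name: [m * x for m, x in zip(mask, mixture)] for name, mask in masks.items()}
```

Because the masks sum to about 1 in each bin, the separated stems add back up to the mixture. This is a convenient property when the stems are later remixed.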
audio
TorchAudio is an audio processing library for PyTorch that takes advantage of PyTorch's GPU support. It supports various audio formats and includes dataloaders for common datasets, which makes it well suited to machine learning work. Its features include audio I/O, speech processing, and transforms such as Spectrogram and MFCC, all integrated with PyTorch. Compliance interfaces, such as Kaldi-compatible feature extraction, improve interoperability with other tools for users working on audio and speech. See the TorchAudio documentation for the full feature set.
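A spectrogram transform is essentially a windowed short-time Fourier transform. The dependency-free sketch below computes a magnitude spectrogram with a periodic Hann window for intuition. It is not TorchAudio code, which provides `torchaudio.transforms.Spectrogram` on tensors (with power 2 by default). The `n_fft` and `hop` values are arbitrary choices, and a naive DFT replaces the FFT for readability.

```python
import cmath
import math


def hann(n):
    """Periodic Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]


def spectrogram(signal, n_fft=64, hop=32):
    """Magnitude spectrogram: a list of frames, each with n_fft // 2 + 1 bins.

    Uses an O(n_fft^2) DFT per frame for clarity; real libraries use an FFT.
    Trailing samples that do not fill a whole frame are ignored.
    """
    window = hann(n_fft)
    frames = []
    for start in range(0, len(signal) - n_fft + 1, hop):
        chunk = [s * w for s, w in zip(signal[start:start + n_fft], window)]
        bins = []
        for k in range(n_fft // 2 + 1):
            acc = sum(x * cmath.exp(-2j * math.pi * k * n / n_fft) for n, x in enumerate(chunk))
            bins.append(abs(acc))
        frames.append(bins)
    return frames
```

With a sample rate of 8000 Hz and `n_fft=64`, each bin is 125 Hz wide, so a 1000 Hz sine peaks in bin 8 of every frame.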
sound_dataset_tools2
sound_dataset_tools2 enables fast creation of voice datasets and exports training data for VITS and related projects. It has a user-friendly GUI, supports importing from both audio files and subtitles, and offers automatic audio segmentation with clipping prevention. Users can configure audio settings and run evaluations to select high-quality data. The tool can be run from compiled executables or from source code, and it is built on SQLite and PySide6.
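The tool's internals are not described here, so the following is only a generic sketch of energy-based automatic segmentation. It splits audio wherever the frame RMS stays below a threshold, then peak-normalizes each segment below full scale, which is one possible reading of "clipping prevention". All function names, thresholds, and defaults are hypothetical and are not taken from sound_dataset_tools2.

```python
import math


def frame_rms(samples, frame_len):
    """RMS level of consecutive non-overlapping frames (partial tail ignored)."""
    return [
        math.sqrt(sum(s * s for s in samples[i:i + frame_len]) / frame_len)
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]


def split_on_silence(samples, frame_len=4, threshold=0.01, min_silent_frames=2):
    """Return (start, end) sample ranges of non-silent regions.

    A region closes once min_silent_frames consecutive frames fall below
    the RMS threshold; its end is the end of the last loud frame.
    """
    segments, start, last_loud_end, silent = [], None, 0, 0
    for idx, level in enumerate(frame_rms(samples, frame_len)):
        if level >= threshold:
            if start is None:
                start = idx * frame_len
            last_loud_end = (idx + 1) * frame_len
            silent = 0
        elif start is not None:
            silent += 1
            if silent >= min_silent_frames:
                segments.append((start, last_loud_end))
                start, silent = None, 0
    if start is not None:
        segments.append((start, last_loud_end))
    return segments


def normalize_peak(segment, target_peak=0.99):
    """Scale a segment so its absolute peak equals target_peak (below 1.0 to avoid clipping)."""
    peak = max((abs(s) for s in segment), default=0.0)
    if peak == 0.0:
        return list(segment)
    return [s * target_peak / peak for s in segment]
```

Real tools usually add padding around each segment and enforce minimum and maximum clip lengths. Those details are omitted here.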
SwiftWhisper
SwiftWhisper makes it easy to add speech transcription to Swift software using whisper.cpp. It can be installed with Swift Package Manager or through Xcode. CoreML support is available for model deployment, and developers can use the provided API for audio-to-text conversion. It also includes delegate methods for progress tracking and error handling, performance optimization for release builds, pre-trained models, and guidance on converting audio to 16 kHz PCM with AudioKit.
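whisper.cpp, which SwiftWhisper wraps, expects 16 kHz mono audio as 32-bit float samples in the range [-1, 1]. The sketch below shows the conversion steps in Python for illustration. In a Swift app, AudioKit or AVFoundation would normally perform them. Linear interpolation is a simplifying assumption: it skips the low-pass filtering that a production resampler applies before downsampling. The function names are hypothetical.

```python
def int16_to_float(samples):
    """Map signed 16-bit PCM integers to floats in [-1.0, 1.0)."""
    return [s / 32768.0 for s in samples]


def downmix(left, right):
    """Average two channels into one mono channel."""
    return [(l + r) / 2 for l, r in zip(left, right)]


def resample_linear(samples, src_rate, dst_rate=16000):
    """Resample by linear interpolation (no anti-aliasing filter)."""
    if src_rate == dst_rate:
        return list(samples)
    n_out = int(len(samples) * dst_rate / src_rate)
    out = []
    for i in range(n_out):
        pos = i * src_rate / dst_rate
        j = int(pos)
        frac = pos - j
        # Hold the last sample when interpolating past the end.
        nxt = samples[j + 1] if j + 1 < len(samples) else samples[j]
        out.append(samples[j] * (1 - frac) + nxt * frac)
    return out
```

A typical pipeline converts integer PCM to float, downmixes stereo to mono, and then resamples to 16 kHz before passing the frames to the transcriber.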
audiomentations
Audiomentations is a Python library of audio data augmentation tools for deep learning. It runs on the CPU, supports mono and multichannel audio, and works with frameworks such as TensorFlow/Keras and PyTorch. Its transforms include Gaussian noise addition, pitch shifting, and time stretching, which help make audio models more robust. It has been used in winning Kaggle competition solutions and by leading audio technology companies. Comprehensive documentation and examples make it straightforward to integrate into a wide range of projects.
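As an illustration of the simplest of these transforms, here is a dependency-free sketch of Gaussian-noise augmentation applied with probability `p`. The parameter names mirror those of audiomentations' `AddGaussianNoise` (`min_amplitude`, `max_amplitude`, `p`), but this is not the library's implementation. In practice you would call the library itself, typically inside a `Compose` pipeline.

```python
import random


def add_gaussian_noise(samples, min_amplitude=0.001, max_amplitude=0.015, p=0.5, rng=None):
    """With probability p, add white Gaussian noise to a list of samples.

    The noise standard deviation is drawn uniformly from
    [min_amplitude, max_amplitude]; otherwise the input is returned unchanged.
    """
    if rng is None:
        rng = random.Random()
    if rng.random() >= p:
        return list(samples)
    amplitude = rng.uniform(min_amplitude, max_amplitude)
    return [s + rng.gauss(0.0, amplitude) for s in samples]
```

Passing a seeded `random.Random` makes the augmentation reproducible, which is useful when debugging a training pipeline.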
audio-transformers-course
This open-source course from Hugging Face offers a deep dive into using Transformers for audio and speech processing. It is available in multiple languages, including English, Spanish, and French. Participants can contribute translations and engage with a global community on GitHub and Discord. Interactive Jupyter notebooks support hands-on learning. With its well-structured chapters, the course aims to make machine learning education accessible worldwide.
crepe
CREPE is a monophonic pitch tracker based on a deep convolutional neural network that operates directly on the time-domain waveform. It outperforms pitch trackers such as pYIN and SWIPE and offers several model capacities, so accuracy can be traded for computational cost. CREPE supports batch processing, can be used from the command line or from Python, and installs from PyPI. It supports adjustable time steps and optional Viterbi smoothing. It processes vocal and instrumental recordings in WAV format, resamples input as needed, and benefits from GPU acceleration.
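CREPE's network outputs, for each frame, activations over 360 pitch bins spaced 20 cents apart. A pitch estimate finer than one bin can be obtained by averaging bin positions around the peak. The sketch below illustrates that decoding step in plain Python. The lowest-bin frequency (C1, about 32.70 Hz) and the ±4-bin averaging radius are assumptions for illustration rather than values read from CREPE's code. The crepe package performs this decoding internally (for example, in `crepe.predict`).

```python
def decode_frequency(activation, f_min=32.70, cents_per_bin=20.0, radius=4):
    """Estimate pitch in Hz from one frame of pitch-bin activations.

    Takes an activation-weighted average of bin positions (in cents above
    f_min) within `radius` bins of the peak, then converts cents to Hz with
    f = f_min * 2 ** (cents / 1200).
    """
    center = max(range(len(activation)), key=activation.__getitem__)
    lo = max(0, center - radius)
    hi = min(len(activation), center + radius + 1)
    weights = activation[lo:hi]
    total = sum(weights)
    if total <= 0:
        raise ValueError("activation has no positive weight near its peak")
    cents = sum(w * cents_per_bin * i for i, w in zip(range(lo, hi), weights)) / total
    return f_min * 2 ** (cents / 1200.0)
```

Since 1200 cents make one octave, a peak exactly at bin 60 decodes to twice `f_min`. Equal activations on two neighbouring bins decode to a pitch halfway between them, which is the sub-bin resolution the weighted average provides.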
pesto
PESTO is a pitch estimation method trained with self-supervised, transposition-equivariant objectives and is noted for its high accuracy and fast processing. Built with PyTorch, it can be used through a command-line interface or a Python API and supports batch processing of various audio formats. It offers multiple export options and runs effectively without specialized hardware, which makes it suitable for both research and practical music processing.
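Transposition equivariance means that shifting a log-frequency representation of the input (such as a constant-Q frame) by k bins should shift the predicted pitch by k bins. The toy sketch below illustrates only that property and a squared-error penalty for violating it. A hand-written centroid estimator stands in for a neural network. This is not PESTO's model, loss, or training procedure, and all function names are hypothetical.

```python
def transpose(frame, k):
    """Shift a log-frequency frame by k bins (positive = up), zero-filling."""
    n = len(frame)
    k = max(-n, min(n, k))  # clamp so large shifts produce an all-zero frame
    if k >= 0:
        return [0.0] * k + list(frame[:n - k])
    return list(frame[-k:]) + [0.0] * (-k)


def centroid_pitch(frame):
    """Toy pitch estimator: energy-weighted mean bin index."""
    total = sum(frame)
    return sum(i * v for i, v in enumerate(frame)) / total


def equivariance_loss(estimator, frame, k):
    """Squared violation of the constraint f(shift(x, k)) = f(x) + k."""
    return (estimator(transpose(frame, k)) - estimator(frame) - k) ** 2
```

The centroid estimator satisfies the constraint exactly as long as no energy is shifted out of range, so its loss is zero. An estimator that scales its output (for example, twice the centroid) is penalized. A self-supervised objective of this kind can therefore train a model without pitch labels.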
aubiojs
aubiojs is a real-time audio processing library derived from aubio, with pitch and tempo detection. It runs in both web browsers and Node.js and can be added through a script tag or installed with npm. It suits music and audio analysis, and its tempo detection efficiently estimates beats per minute (BPM). aubiojs is compiled from aubio with Emscripten, which lets it bring aubio's analysis features to JavaScript environments.
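Once beat times have been detected, a tempo in BPM follows from the spacing between them. The sketch below, written in Python for illustration, derives BPM from beat timestamps using the median inter-beat interval, which tolerates a few spurious or missed beats. aubio's own tempo tracking is more involved than this shortcut, and the function name is hypothetical.

```python
from statistics import median


def bpm_from_beats(beat_times):
    """Estimate tempo in BPM from beat timestamps given in seconds.

    Uses the median inter-beat interval for robustness to a few misplaced beats.
    """
    if len(beat_times) < 2:
        raise ValueError("need at least two beats")
    intervals = [b - a for a, b in zip(beat_times, beat_times[1:])]
    return 60.0 / median(intervals)
```

For example, beats every 0.5 seconds give 120 BPM. A single spurious beat inserted between two real ones barely changes the median interval.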