audio-preprocess - Comprehensive Audio Tools for WAV Conversion, Separation, and Transcription

Introduction to Fish Audio Preprocessor

The Fish Audio Preprocessor is a toolkit of scripts for preparing audio and video files for further processing. It covers format conversion, vocal separation, slicing, loudness matching, resampling, and transcription, all available through the fap command.

Core Features

Video/Audio to WAV Conversion:
The tool converts both video and audio files to WAV, a format widely supported by professional audio applications.
Audio Vocal Separation:
The tool separates vocals from the instrumental accompaniment. This is useful for creators who want to remix or repurpose a recording without the original vocal track.
Automatic Audio Slicing:
The tool automatically divides long audio tracks into shorter segments, which simplifies creating samples or highlights.
Audio Loudness Matching:
This feature adjusts loudness levels across different files so that they play back at a consistent volume.
Audio Data Statistics:
The toolkit computes statistics about the audio data, such as the length of each audio file, which is useful when analyzing or planning further processing.
Audio Resampling:
This feature allows adjustment of the sample rate of audio files, making them compatible with different audio systems or specifications.
Audio Transcription (.lab):
Fish Audio Preprocessor transcribes audio into text files with the .lab extension. This is useful for documentation and subtitling purposes.
Transcription via FunASR:
Setting --model-type to funasr makes the transcription step use the FunASR model. Details on how this model is applied can be found in the codebase.
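
To make the statistics and loudness features above more concrete, here is a minimal Python sketch using only the standard library. It is not the toolkit's implementation: the function names wav_stats and gain_to_target, the 16-bit PCM restriction, and the -20 dBFS target are assumptions chosen for illustration. It also uses plain RMS level as a simplification, whereas production loudness tools commonly measure perceptual loudness instead.

```python
import math
import struct
import wave


def wav_stats(path):
    """Return (duration_seconds, rms_dbfs) for a 16-bit PCM WAV file."""
    with wave.open(path, "rb") as wf:
        if wf.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        n_frames = wf.getnframes()
        rate = wf.getframerate()
        raw = wf.readframes(n_frames)

    # Unpack little-endian signed 16-bit samples (channels are interleaved).
    samples = struct.unpack("<%dh" % (len(raw) // 2), raw)
    duration = n_frames / rate
    if not samples:
        return duration, float("-inf")

    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    # dBFS: level relative to full scale (32768 for 16-bit audio).
    rms_dbfs = 20 * math.log10(rms / 32768) if rms > 0 else float("-inf")
    return duration, rms_dbfs


def gain_to_target(rms_dbfs, target_dbfs=-20.0):
    """Linear gain factor that would move a file to the target RMS level."""
    return 10 ** ((target_dbfs - rms_dbfs) / 20)
```

Multiplying every sample of a file by the factor returned from gain_to_target would bring its RMS level to the target, so applying this to each file in a set with the same target gives them matching levels under this simplified measure.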

In Progress and Future Enhancements

WhisperX Audio Transcription: Not yet complete. This feature is intended to extend the transcription options by integrating WhisperX.
Merging of .lab Files: Also planned is the ability to merge multiple .lab files, which would simplify managing transcription data across projects or datasets.
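
Since the merge feature is not yet available and its behavior is not described here, the following is only a hypothetical sketch of one possible interpretation: concatenating several .lab transcripts into a single text file, one source file per line. The function name merge_lab_files and the output layout are assumptions made for illustration and are not part of the toolkit.

```python
from pathlib import Path


def merge_lab_files(lab_paths, out_path):
    """Concatenate .lab transcripts into one file, one transcript per line.

    Returns the number of non-empty transcripts written.
    """
    lines = []
    # Sort by path so the output order is deterministic.
    for path in sorted(Path(p) for p in lab_paths):
        text = path.read_text(encoding="utf-8").strip()
        if text:
            # Collapse internal whitespace so each file yields a single line.
            lines.append(" ".join(text.split()))
    Path(out_path).write_text("\n".join(lines) + "\n", encoding="utf-8")
    return len(lines)
```

Empty transcripts are skipped in this sketch; a real implementation might instead need to keep a mapping from each line back to its source audio file.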

Compatibility and Installation

The software has been tested on Ubuntu 22.04 and 20.04 with Python 3.10. Feedback from other environments is welcome and helps improve compatibility. To install, run the following command from the root of the source checkout; it installs the package in editable mode:

pip install -e .

To list the available commands and options, run the help command:

fap --help

Reference for Further Exploration

For those interested in similar projects or additional tools, the Fish Audio Preprocessor acknowledges the Batch Whisper project as its inspiration.

In summary, Fish Audio Preprocessor bundles tools for WAV conversion, vocal separation, slicing, loudness matching, statistics, resampling, and transcription, with WhisperX transcription and .lab merging still in progress.