Introduction to Insanely Fast Whisper
Insanely Fast Whisper is a command-line interface (CLI) for transcribing audio files with OpenAI's Whisper model directly on your device. It builds on Hugging Face's Transformers, Optimum, and Flash Attention to deliver very fast transcription speeds.
Key Features
- Transcription Efficiency: Insanely Fast Whisper can transcribe up to 150 minutes (2.5 hours) of audio in less than 98 seconds.
- Based on Cutting-Edge Models: The project employs the Whisper Large v3 model from OpenAI, renowned for its superior transcription capability.
Installation and Setup
To get started with Insanely Fast Whisper, you can install it using pipx:
pipx install insanely-fast-whisper==0.0.15 --force
Note that for Python 3.11.XX users, pipx might not interpret the version correctly and may install an outdated version. This can be fixed by:
pipx install insanely-fast-whisper --force --pip-args="--ignore-requires-python"
For direct pip installation, use:
pip install insanely-fast-whisper --ignore-requires-python
How to Use
To use the CLI for transcription, simply execute:
insanely-fast-whisper --file-name <filename or URL>
For macOS users, include the --device-id mps flag:
insanely-fast-whisper --file-name <filename or URL> --device-id mps
Various options let you customize transcription, including selecting different model checkpoints and enabling Flash Attention for faster inference.
CLI Options
The CLI comes with a variety of options to customize your transcription process. These include specifying the audio file, device ID, model name, task type (transcription or translation), language detection, and many more tailored settings. Here's an overview of some key options:
- --file-name: Input file path or URL.
- --device-id: GPU device ("0" for CUDA or "mps" for Mac).
- --batch-size: Number of chunks processed in parallel.
- --flash: Enable Flash Attention 2 for faster inference.
For a full list of options and their defaults, consult the help command:
insanely-fast-whisper --help
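As an illustration, several of the options above can be combined in a single invocation. The file name below is a placeholder, and the exact flag value syntax (e.g. whether --flash takes an explicit True) should be confirmed against the --help output for your installed version:

```shell
# Transcribe a local file on the first CUDA GPU, with a larger batch
# size and Flash Attention 2 enabled (requires flash-attn to be
# installed; "interview.mp3" is a placeholder file name).
insanely-fast-whisper \
  --file-name interview.mp3 \
  --device-id 0 \
  --batch-size 24 \
  --flash True
```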
Troubleshooting and FAQ
- Flash Attention Installation: Ensure correct installation with pipx runpip insanely-fast-whisper install flash-attn --no-build-isolation.
- CUDA Errors on Windows: Solve by installing torch within the virtual environment.
- Memory Management on Mac: Optimize memory usage with a reduced batch size and the --device-id mps setting.
Without CLI
For those who prefer not to use the CLI, Whisper can be run directly in Python via the Transformers library.
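As a minimal sketch of that approach (assuming the transformers and torch packages are installed; the audio path, device, and batch settings here are illustrative, not the project's exact snippet):

```python
import torch
from transformers import pipeline

# Build an automatic-speech-recognition pipeline with the same
# Whisper Large v3 checkpoint the CLI uses by default.
# device="cuda:0" assumes an NVIDIA GPU; use "mps" on Apple Silicon
# or "cpu" as a slower fallback.
pipe = pipeline(
    "automatic-speech-recognition",
    model="openai/whisper-large-v3",
    torch_dtype=torch.float16,
    device="cuda:0",
)

# "audio.mp3" is a placeholder path. Chunking splits long audio into
# 30-second segments, and batch_size controls how many segments are
# decoded in parallel.
outputs = pipe(
    "audio.mp3",
    chunk_length_s=30,
    batch_size=24,
    return_timestamps=True,
)

print(outputs["text"])
```

Lowering batch_size reduces memory pressure at the cost of throughput, mirroring the CLI's --batch-size option.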
Acknowledgements and Community
The project credits the work of the OpenAI Whisper, Hugging Face Transformers, and Optimum teams. Community contributions include projects and packages that extend the utility of Insanely Fast Whisper.
By making rapid, on-device audio transcription accessible, Insanely Fast Whisper gives developers and creators an efficient tool for audio processing.