Introduction to Insanely Fast Whisper
Insanely Fast Whisper is a command-line interface (CLI) for transcribing audio files with OpenAI's Whisper model directly on your device. It builds on Hugging Face's Transformers, Optimum, and Flash Attention to deliver very fast transcription speeds.
Key Features
- Transcription Efficiency: Insanely Fast Whisper can transcribe up to 150 minutes (2.5 hours) of audio in less than 98 seconds.
- Based on Cutting-Edge Models: The project employs the Whisper Large v3 model from OpenAI, renowned for its superior transcription capability.
Installation and Setup
To get started with Insanely Fast Whisper, you can install it using pipx:
pipx install insanely-fast-whisper==0.0.15 --force
Note that for Python 3.11.XX users, pipx might not interpret the version correctly and may install an outdated version. This can be fixed by:
pipx install insanely-fast-whisper --force --pip-args="--ignore-requires-python"
For direct pip installation, use:
pip install insanely-fast-whisper --ignore-requires-python
How to Use
To use the CLI for transcription, simply execute:
insanely-fast-whisper --file-name <filename or URL>
For macOS users, include the --device-id mps flag:
insanely-fast-whisper --file-name <filename or URL> --device-id mps
Various options let you customize transcription, including selecting different model checkpoints and enabling Flash Attention for faster inference.
CLI Options
The CLI comes with a variety of options to customize your transcription process. These include specifying the audio file, device ID, model name, task type (transcription or translation), language detection, and many more tailored settings. Here's an overview of some key options:
- --file-name: Input file path or URL.
- --device-id: GPU device ("0" for CUDA or "mps" for Mac).
- --batch-size: Number of chunks processed in parallel.
- --flash: Enable Flash Attention 2 for faster inference.
For a full list of options and their defaults, consult the help command:
insanely-fast-whisper --help
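As an illustration, several of the options above can be combined in a single invocation. The file name below is a placeholder, and the exact flag value syntax (e.g. whether --flash takes an explicit True) should be confirmed against the --help output for your installed version:

```shell
# Transcribe a local file on the first CUDA GPU, with a larger batch
# size and Flash Attention 2 enabled (requires flash-attn to be
# installed; "interview.mp3" is a placeholder file name).
insanely-fast-whisper \
  --file-name interview.mp3 \
  --device-id 0 \
  --batch-size 24 \
  --flash True
```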
Troubleshooting and FAQ
- Flash Attention Installation: Ensure correct installation with pipx runpip insanely-fast-whisper install flash-attn --no-build-isolation.
- CUDA Errors on Windows: Solve by installing torch within the virtual environment.
- Memory Management on Mac: Optimize memory usage with a reduced batch size and the --device-id mps setting.
Without CLI
For those who prefer not to use the CLI, Whisper can be run directly in Python via the Transformers library.
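As a minimal sketch of that approach (assuming the transformers and torch packages are installed; the audio path, device, and batch settings here are illustrative, not the project's exact snippet):

```python
import torch
from transformers import pipeline

# Build an automatic-speech-recognition pipeline with the same
# Whisper Large v3 checkpoint the CLI uses by default.
# device="cuda:0" assumes an NVIDIA GPU; use "mps" on Apple Silicon
# or "cpu" as a slower fallback.
pipe = pipeline(
    "automatic-speech-recognition",
    model="openai/whisper-large-v3",
    torch_dtype=torch.float16,
    device="cuda:0",
)

# "audio.mp3" is a placeholder path. Chunking splits long audio into
# 30-second segments, and batch_size controls how many segments are
# decoded in parallel.
outputs = pipe(
    "audio.mp3",
    chunk_length_s=30,
    batch_size=24,
    return_timestamps=True,
)

print(outputs["text"])
```

Lowering batch_size reduces memory pressure at the cost of throughput, mirroring the CLI's --batch-size option.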
Acknowledgements and Community
The project credits the work of the OpenAI Whisper, Hugging Face Transformers, and Optimum teams. Community contributions include projects and packages that extend the utility of Insanely Fast Whisper.
By making rapid, on-device audio transcription accessible, Insanely Fast Whisper gives developers and creators an efficient tool for audio processing.