Introducing the Faster-Whisper Project
Overview
Faster-Whisper is an advanced reimplementation of OpenAI's Whisper model, leveraging the power of CTranslate2, a highly efficient inference engine designed for Transformer models. This project significantly boosts performance, delivering up to a fourfold increase in speed compared to the original Whisper implementation, without compromising accuracy. It also reduces memory usage, making it an optimal choice for various applications. The performance can be further enhanced with 8-bit quantization, applicable on both CPU and GPU platforms.
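Since quantization is exposed as a constructor argument, switching precision is a one-line change. Below is a minimal sketch of the common device and precision combinations; the model names are illustrative, and the compute_type values follow the options documented by CTranslate2:

from faster_whisper import WhisperModel

# fp16 on GPU: the fastest option when VRAM allows.
model = WhisperModel("large-v3", device="cuda", compute_type="float16")
# int8 weights with fp16 activations on GPU: lower memory usage.
# model = WhisperModel("large-v3", device="cuda", compute_type="int8_float16")
# int8 on CPU: the smallest memory footprint.
# model = WhisperModel("small", device="cpu", compute_type="int8")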
Benchmarking Performance
Performance Comparison
Faster-Whisper's published benchmarks compare it against other Whisper implementations on the same transcription task, showing both higher speed and lower memory consumption.
- Large-v2 model on GPU, transcribing 13 minutes of audio:
- OpenAI's Whisper uses considerable GPU and CPU memory and takes 4 minutes and 30 seconds.
- Faster-Whisper (fp16 precision) accomplishes the task in just 54 seconds, using substantially less memory.
- Faster-Whisper (int8 precision) is only slightly slower at 59 seconds, but further reduces memory usage.
- Small model on CPU, transcribing the same audio:
- Original Whisper requires over 10 minutes, while Faster-Whisper (fp32) slashes the time to 2 minutes and 44 seconds.
- Faster-Whisper (int8) reduces the duration further to 2 minutes and 4 seconds, with lower memory usage.
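When reproducing numbers like these, note that transcribe() returns results lazily: decoding only happens as segments are consumed, so a fair timing must iterate the full generator. A minimal timing sketch, assuming a local file named audio.mp3:

import time

from faster_whisper import WhisperModel

model = WhisperModel("small", device="cpu", compute_type="int8")

start = time.perf_counter()
segments, info = model.transcribe("audio.mp3", beam_size=5)
# Decoding runs while this generator is consumed, so join all
# segments before stopping the clock.
text = " ".join(segment.text for segment in segments)
elapsed = time.perf_counter() - start
print("Transcribed %.1fs of audio in %.1fs" % (info.duration, elapsed))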
Distil-Whisper Evaluation
Faster-Whisper also holds up in Word Error Rate (WER) evaluations, scoring slightly better than the reference distil-whisper models across various configurations on the GigaSpeech benchmark.
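The distilled checkpoints load through the same API. A short sketch, following the pattern the project recommends for distil models (disabling conditioning on previous text to reduce repetition on long-form audio; the file name is a placeholder):

from faster_whisper import WhisperModel

model = WhisperModel("distil-large-v3", device="cuda", compute_type="float16")
segments, info = model.transcribe(
    "audio.mp3",
    beam_size=5,
    language="en",
    condition_on_previous_text=False,
)
for segment in segments:
    print("[%.2fs -> %.2fs] %s" % (segment.start, segment.end, segment.text))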
Requirements and Installation
System Requirements
To run Faster-Whisper efficiently, you need:
- Python version 3.8 or higher
- For GPU execution, NVIDIA's cuBLAS and cuDNN libraries built for CUDA 12. Users still on CUDA 11 can work around this by pinning an older version of CTranslate2, as shown below.
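As an illustration of that workaround: at the time of writing, CTranslate2 3.24.0 was the last release published with CUDA 11 support, so downgrading it after installation restores CUDA 11 compatibility:

pip install --force-reinstall ctranslate2==3.24.0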
Installation Steps
- Install directly from PyPI with the command:
pip install faster-whisper
- To track the latest changes, the package can also be installed from the GitHub repository's master branch or pinned to a specific commit.
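For example, installing from the master branch (the archive URL follows the pattern documented in the SYSTRAN/faster-whisper repository; substitute a commit hash for refs/heads/master to pin a specific commit):

pip install --force-reinstall "faster-whisper @ https://github.com/SYSTRAN/faster-whisper/archive/refs/heads/master.tar.gz"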
Using Faster-Whisper
To use Faster-Whisper, import WhisperModel, load a model, and call transcribe; the same code runs on CPU or GPU by changing the device and compute_type arguments:
from faster_whisper import WhisperModel

model_size = "large-v3"

# Run on GPU with fp16 precision.
model = WhisperModel(model_size, device="cuda", compute_type="float16")

segments, info = model.transcribe("audio.mp3", beam_size=5)
print("Detected language '%s' with probability %f" % (info.language, info.language_probability))

# segments is a generator; transcription runs as it is consumed.
for segment in segments:
    print("[%.2fs -> %.2fs] %s" % (segment.start, segment.end, segment.text))
Faster-Whisper offers a fast, memory-efficient option for anyone needing rapid transcription, making it a valuable tool for tech enthusiasts and professionals alike.