Whisper-WebUI - Simplified Interface for Subtitle Generation and Translation

Whisper-WebUI: A Comprehensive Project Overview

Introduction

Whisper-WebUI is a browser-based interface, built with Gradio, for generating subtitles with the Whisper speech recognition library developed by OpenAI. It creates subtitles from files, YouTube videos, and microphone input, and can translate them, which makes it useful for creators, translators, and anyone else who works with multimedia content.

Key Features

Whisper Implementations

Whisper-WebUI provides flexibility by allowing users to choose from multiple Whisper implementations, including:

openai/whisper: The original library developed by OpenAI.
SYSTRAN/faster-whisper: A faster reimplementation built on CTranslate2; this is the default choice.
Vaibhavs10/insanely-fast-whisper: An implementation built on Hugging Face Transformers that aims for the highest possible throughput.

Subtitle Generation

Whisper-WebUI supports subtitle creation from diverse sources:

Files: Generate subtitles from existing video or audio files.
YouTube: Directly process YouTube content.
Microphone: Record audio from a microphone and generate subtitles from the recording.

Supported subtitle formats include SRT, WebVTT, and plain text files without a timeline.
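To make the difference between these three formats concrete, here is a minimal Python sketch that renders the same timed segments as SRT, WebVTT, and plain text. The (start, end, text) tuples and the function names are assumptions made for illustration; this is not Whisper-WebUI's own writer code.

```python
def format_timestamp(seconds: float, decimal_marker: str) -> str:
    """Render a time in seconds as HH:MM:SS<marker>mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}{decimal_marker}{ms:03d}"


def to_srt(segments):
    """SRT: numbered cues, with a comma before the milliseconds."""
    blocks = []
    for index, (start, end, text) in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n"
            f"{format_timestamp(start, ',')} --> {format_timestamp(end, ',')}\n"
            f"{text}\n"
        )
    return "\n".join(blocks)


def to_vtt(segments):
    """WebVTT: a 'WEBVTT' header line, with a dot before the milliseconds."""
    blocks = ["WEBVTT\n"]
    for start, end, text in segments:
        blocks.append(
            f"{format_timestamp(start, '.')} --> {format_timestamp(end, '.')}\n"
            f"{text}\n"
        )
    return "\n".join(blocks)


def to_txt(segments):
    """Plain text: one line per segment, no timeline."""
    return "\n".join(text for _, _, text in segments) + "\n"
```

Calling to_srt, to_vtt, and to_txt on the same list of segments gives three files that differ only in their timing syntax, which is why one transcription pass can feed all three outputs.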

Translation Capabilities

Speech to Text Translation: Convert speech in any language Whisper supports directly into English text using Whisper’s end-to-end translation.
Text to Text Translation: Translate subtitle files into other languages using Facebook’s NLLB models or the DeepL API.
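The speech-to-English path corresponds to Whisper's "translate" task. The sketch below shows how that task can be selected when calling the faster-whisper library directly; it is an illustration only, not Whisper-WebUI's internal code. The function name, the "base" model size, and the CPU/int8 settings are assumptions, and the optional model argument exists only so the function can be exercised without loading a real model.

```python
def transcribe(audio_path, task="transcribe", model=None):
    """Return (segments, detected_language) for one audio file.

    task="transcribe" keeps the spoken language;
    task="translate" asks Whisper to produce English text instead.
    """
    if model is None:
        # Imported lazily so this file loads even without faster-whisper installed.
        from faster_whisper import WhisperModel

        # Model size, device, and compute type are illustrative choices.
        model = WhisperModel("base", device="cpu", compute_type="int8")

    segments, info = model.transcribe(audio_path, task=task)
    # faster-whisper yields segments lazily; building the list runs the decoding.
    timed = [(s.start, s.end, s.text.strip()) for s in segments]
    return timed, info.language
```

For example, transcribe("talk.mp3", task="translate") would return English segments together with the detected source language, assuming a file named talk.mp3 exists and faster-whisper is installed.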

Audio Preprocessing and Postprocessing

Preprocessing: Use Silero VAD for voice activity detection and UVR to separate background music from speech.
Postprocessing: Label who is speaking in each segment through speaker diarization with the pyannote model.
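Diarization yields speaker turns (who spoke when), which then have to be matched against the transcript segments. One common way to do this is to label each segment with the speaker whose turn overlaps it the most. The following is a minimal sketch of that idea with made-up data shapes; it is not taken from Whisper-WebUI or pyannote, and the tuple layouts and function name are assumptions.

```python
def assign_speakers(segments, turns):
    """Label each (start, end, text) segment with the best-overlapping speaker.

    turns is a list of (start, end, speaker) tuples, such as a diarization
    model might produce. Segments with no overlapping turn get speaker None.
    """
    labeled = []
    for seg_start, seg_end, text in segments:
        best_speaker, best_overlap = None, 0.0
        for turn_start, turn_end, speaker in turns:
            # Length of the intersection of the two intervals (negative if disjoint).
            overlap = min(seg_end, turn_end) - max(seg_start, turn_start)
            if overlap > best_overlap:
                best_speaker, best_overlap = speaker, overlap
        labeled.append((seg_start, seg_end, best_speaker, text))
    return labeled
```

The labeled segments can then be written out with the speaker name prefixed to each subtitle line.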

Installation and Running Options

Running with Pinokio

Install Pinokio.
Open Pinokio, search for Whisper-WebUI, and install it.
Start Whisper-WebUI and open http://localhost:7860 in a browser.

Running with Docker

Install Docker Desktop.
Clone the repository using git clone https://github.com/jhj0517/Whisper-WebUI.git.
Build the Docker image and run the container.
Access Whisper-WebUI at http://localhost:7860.
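After starting the container (or the Pinokio app), it can take a moment before the interface responds. A small check like the one below can confirm that something is answering at the address above. This is a sketch using only the Python standard library; the function name and the default timeout are assumptions, and a True result only shows that an HTTP server replied, not that it is Whisper-WebUI.

```python
import http.client
import urllib.request


def webui_is_up(url="http://localhost:7860", timeout=3.0):
    """Return True if an HTTP server answers successfully at url."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 400
    except (OSError, http.client.HTTPException):
        # Covers refused connections, timeouts, HTTP error statuses,
        # and malformed replies from non-HTTP services.
        return False
```

If the function keeps returning False, the container logs are the next place to look, since the first start may still be building or downloading models.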

Running Locally

Prerequisites: Ensure Git, Python (3.10 to 3.12), and FFmpeg are installed.

Clone the repository and run the installation script to set up dependencies.
Launch the WebUI using the provided scripts.
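Before running the installation script, the prerequisites listed above can be verified with a few lines of Python. This is a hedged sketch: the function names are invented for illustration, and it assumes the tools are exposed on PATH under the executable names git and ffmpeg.

```python
import shutil
import sys


def python_version_supported(version=None):
    """True for Python 3.10 through 3.12, the range stated above."""
    major, minor = (version or sys.version_info)[:2]
    return major == 3 and 10 <= minor <= 12


def missing_tools(tools=("git", "ffmpeg")):
    """Return the names of required executables that are not on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    if not python_version_supported():
        print("Unsupported Python version:", sys.version.split()[0])
    for tool in missing_tools():
        print("Not found on PATH:", tool)
```

Running the file with the interpreter you plan to use for Whisper-WebUI prints nothing when all prerequisites are met.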

VRAM Efficiency and Model Options

By default, Whisper-WebUI uses the faster-whisper library, which offers better VRAM usage and transcription speed than the original openai/whisper implementation. Users can choose a different model to match their accuracy requirements and available resources.

Future Plans and Contributions

Whisper-WebUI is still evolving, with plans to add a FastAPI script and to support real-time transcription. The project welcomes contributions, especially translations of the interface into other languages, which make it usable by more people.

Conclusion

Whisper-WebUI is a versatile, user-friendly tool for subtitle generation and translation. Its choice of Whisper implementations, input sources, and installation options makes it suitable for a range of uses, from individual creators to larger media projects.