faster-whisper-GUI - GUI for Audio and Video Transcription with faster-whisper and whisperX Support

Introduction to faster-whisper-GUI

The faster-whisper-GUI project is a graphical user interface (GUI) built with PySide6. It makes it easy to transcribe audio and video files into several text and subtitle formats. Transcription runs on faster-whisper, a CTranslate2-based reimplementation of OpenAI's Whisper, and on whisperX.

Model Download

The project can download models from Hugging Face, focusing on faster-whisper models (Whisper checkpoints converted to the CTranslate2 format that faster-whisper uses). Models can also be downloaded and converted to this format within the software itself. A large-v3 model converted at float32 precision is additionally available from Hugging Face and Baidu Cloud.

Key Features

Audio and Video Transcription
- Users can transcribe audio or video files into plain text, subtitle files (srt, vtt, smi), or lyrics files (lrc) through the GUI.
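Subtitle export mostly comes down to formatting timed segments. The sketch below renders segments as SRT cues. It is not the project's own code: the `Segment` class and the function names are hypothetical. It assumes only that each segment carries `start` and `end` times in seconds plus its `text`, which matches the fields on faster-whisper's segments.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One transcribed segment; times are in seconds (hypothetical type)."""
    start: float
    end: float
    text: str


def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp: HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)          # work in whole milliseconds
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments: list[Segment]) -> str:
    """Render segments as SRT cues: index, time range, text, blank line."""
    cues = []
    for index, seg in enumerate(segments, start=1):
        cues.append(
            f"{index}\n"
            f"{srt_timestamp(seg.start)} --> {srt_timestamp(seg.end)}\n"
            f"{seg.text.strip()}\n"
        )
    # Joining with "\n" leaves the blank line SRT requires between cues.
    return "\n".join(cues)
```

VTT cues differ mainly in the `WEBVTT` header and in using a period instead of a comma before the milliseconds, so a similar helper would cover that format too.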
Wide Model Support
- The software exposes many parameters for both the Voice Activity Detection (VAD) stage and the Whisper model, so users can tune transcription to their material.
- It also supports whisperX (for timestamp alignment) and Demucs (for separating vocals from background music), covering different transcription needs.
Large-v3 Model
- The software specifically supports OpenAI's Whisper large-v3 model. OpenAI reports that large-v3 is more accurate than large-v2 across a wide range of languages.

Utilizing Third-Party Resources

faster-whisper-GUI integrates with various third-party projects and resources:

PySide6-Fluent-Widgets provides the Fluent Design-style widgets used throughout the interface.
Demucs and Ultimate Vocal Remover (UVR) provide audio source separation. They can isolate vocals from background music before transcription.

Compliance and Usage Agreement

By using this software, users agree to a set of terms ensuring responsible usage. The terms emphasize compliance with local laws and a commitment to ethical usage, steering clear of unlawful activities or content.

Visual User Interface

The GUI presents several user-friendly features, such as:

Theme Customization
- Users have the option to adjust the theme color for a personalized experience.
Batch Processing
- Multiple files can be queued and processed in a single run, saving time and effort.
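A batch job can be pictured as a simple loop over the selected files. This sketch processes them one after another, which is the simplest scheme; the GUI's actual scheduling may differ. Several names here are hypothetical: `collect_media`, `run_batch`, the `transcribe` callable (for example, a wrapper around faster-whisper that returns subtitle text), and the `MEDIA_EXTENSIONS` set.

```python
from pathlib import Path
from typing import Callable

# Hypothetical set of extensions treated as transcribable media.
MEDIA_EXTENSIONS = {".mp3", ".wav", ".flac", ".m4a", ".mp4", ".mkv"}


def collect_media(folder: Path) -> list[Path]:
    """Return media files directly inside `folder`, sorted by path."""
    return sorted(
        p for p in folder.iterdir()
        if p.is_file() and p.suffix.lower() in MEDIA_EXTENSIONS
    )


def run_batch(
    files: list[Path],
    transcribe: Callable[[Path], str],
    out_suffix: str = ".srt",
) -> dict[Path, str]:
    """Transcribe files sequentially, writing each result next to its source.

    A failure on one file is recorded and the loop moves on to the next one.
    Returns a mapping of input path -> "ok" or "error: <message>".
    """
    status: dict[Path, str] = {}
    for path in files:
        try:
            text = transcribe(path)
            path.with_suffix(out_suffix).write_text(text, encoding="utf-8")
            status[path] = "ok"
        except Exception as exc:  # one bad file should not stop the batch
            status[path] = f"error: {exc}"
    return status
```

Passing `transcribe` in as a parameter keeps the loop independent of any particular model backend, and it also makes the loop easy to exercise with a stand-in function.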
Model Management
- The interface supports loading, downloading, and converting models, streamlining model management for users.

Additional Functionalities

Silero VAD Integration
- faster-whisper uses the Silero VAD model to skip parts of the audio without speech. This saves computation and can reduce spurious text on silent or noisy passages.
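In faster-whisper, Silero VAD is enabled by passing `vad_filter=True` to `WhisperModel.transcribe`. It is tuned through `vad_parameters`, which include `min_silence_duration_ms` and `speech_pad_ms`. The sketch below does not run Silero itself. It is a simplified, hypothetical illustration of the idea behind those two parameters: speech regions separated by short gaps are merged, then each region is padded, so that only those stretches reach Whisper. `merge_speech_regions` and its arguments are assumptions, and times are in seconds.

```python
from typing import Optional

Region = tuple[float, float]  # (start, end) in seconds


def merge_speech_regions(
    regions: list[Region],
    min_silence: float = 0.5,
    pad: float = 0.25,
    duration: Optional[float] = None,
) -> list[Region]:
    """Merge speech regions separated by short gaps, then pad each region.

    min_silence: gaps shorter than this are treated as part of the speech.
    pad: seconds added before and after each region.
    duration: optional audio length used to clip padded end times.
    """
    # Pass 1: fuse regions whose silent gap is shorter than min_silence.
    merged: list[list[float]] = []
    for start, end in sorted(regions):
        if merged and start - merged[-1][1] < min_silence:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])

    # Pass 2: pad, clip to [0, duration], and fuse regions that padding made overlap.
    result: list[Region] = []
    for start, end in merged:
        start = max(0.0, start - pad)
        end = end + pad if duration is None else min(duration, end + pad)
        if result and start <= result[-1][1]:
            result[-1] = (result[-1][0], max(result[-1][1], end))
        else:
            result.append((start, end))
    return result
```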
File Management and Filtering
- A file list with filters helps users select, organize, and manage the files they transcribe.
Results Display and Timestamp Editing
- Users can review the results and edit timestamps, keeping the transcript precisely synchronized with the media.
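A common related fix is a global offset, used when a whole subtitle track is early or late relative to the media. This is a hypothetical sketch, not the GUI's editing code. The `Cue` type and `shift_cues` are assumptions, and times are in seconds.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Cue:
    """A subtitle cue with times in seconds (hypothetical type)."""
    start: float
    end: float
    text: str


def shift_cues(cues: list[Cue], offset: float) -> list[Cue]:
    """Shift every cue by `offset` seconds (negative moves cues earlier).

    Times are clamped at 0 so no cue starts before the media does;
    cues that collapse to zero length after clamping are dropped.
    """
    shifted = []
    for cue in cues:
        start = max(0.0, cue.start + offset)
        end = max(0.0, cue.end + offset)
        if end > start:
            shifted.append(replace(cue, start=start, end=end))
    return shifted
```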
Word-Level Timestamps for Karaoke Lyrics
- Word-level timestamps can be exported in formats such as VTT and LRC. This enables karaoke-style lyric display in players such as foobar2000.
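Word-level karaoke lyrics are usually written as "enhanced LRC". Each line starts with a `[mm:ss.xx]` tag, and an inline `<mm:ss.xx>` tag marks each word. Player support for the inline tags varies, so the exact layout below is an assumption, not necessarily what faster-whisper-GUI emits. The `Word` type is hypothetical; it mirrors the `start`, `end` and `word` fields that faster-whisper returns when `word_timestamps=True`. The `sep` parameter allows unspaced output for languages such as Chinese.

```python
from dataclasses import dataclass


@dataclass
class Word:
    """A word with start/end times in seconds (hypothetical type)."""
    start: float
    end: float
    word: str


def lrc_time(seconds: float) -> str:
    """Format seconds as mm:ss.xx (minutes, seconds, centiseconds)."""
    total_cs = round(seconds * 100)           # work in whole centiseconds
    minutes, rem = divmod(total_cs, 6000)
    secs, cs = divmod(rem, 100)
    return f"{minutes:02d}:{secs:02d}.{cs:02d}"


def enhanced_lrc_line(words: list[Word], sep: str = " ") -> str:
    """Render one lyric line: a [mm:ss.xx] line tag, an inline <mm:ss.xx>
    tag before each word, and a final tag marking when the last word ends."""
    if not words:
        return ""
    body = sep.join(f"<{lrc_time(w.start)}>{w.word.strip()}" for w in words)
    return f"[{lrc_time(words[0].start)}]{body}<{lrc_time(words[-1].end)}>"
```

For example, the words " Hello" (12.0 to 12.4 s) and " world" (12.5 to 13.1 s) render as `[00:12.00]<00:12.00>Hello <00:12.50>world<00:13.10>`.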

Conclusion

Overall, faster-whisper-GUI brings several capabilities together in a single interface. It combines fast Whisper-based transcription with voice activity detection, vocal separation, and timestamp editing. This makes it a practical tool for anyone who needs accurate transcripts or subtitles for audio and video content.