#audio transcription

buzz
Buzz transcribes and translates audio offline on a personal computer using OpenAI's Whisper. It offers audio playback, drag-and-drop import, and transcript editing, and it installs on macOS, Windows, and Linux.
Stage-Whisper
Stage Whisper uses OpenAI's Whisper model for accurate audio transcription, with a focus on user-friendliness and cross-platform compatibility. It pairs a Node/Electron interface with a Python backend to simplify transcription for non-technical users on macOS, Windows, and Linux. Led by Peter Sterne and Christina Warren, the project aims to make transcription accessible to journalists and others without technical expertise. It welcomes community feedback and contributions.
faster-whisper
faster-whisper reimplements OpenAI's Whisper model on CTranslate2, transcribing up to four times faster than the reference implementation at comparable accuracy and with lower memory usage. 8-bit quantization on CPUs and GPUs improves efficiency further, and GPU execution is compatible with the CUDA 12 libraries. It installs from PyPI and suits applications that need fast transcription with low resource consumption.
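As a rough illustration of the 8-bit CPU path described above, the sketch below loads a model with `compute_type="int8"` and prints timestamped segments. The model size `"small"` and the helper names `format_segment` and `transcribe_file` are assumptions chosen for this example, not part of the listing.

```python
def format_segment(start: float, end: float, text: str) -> str:
    """Render one transcribed segment as '[start -> end] text'."""
    return f"[{start:.2f}s -> {end:.2f}s] {text.strip()}"


def transcribe_file(path: str, model_size: str = "small") -> list[str]:
    """Transcribe `path` on CPU with int8 quantization; return formatted lines."""
    # Imported lazily so format_segment works without the package installed.
    from faster_whisper import WhisperModel

    # "small" is an illustrative choice; other model sizes are available.
    model = WhisperModel(model_size, device="cpu", compute_type="int8")

    # transcribe() returns a lazy generator of segments plus detection info;
    # the audio is only processed as the generator is consumed.
    segments, info = model.transcribe(path, beam_size=5)
    print(f"Detected language: {info.language} (p={info.language_probability:.2f})")
    return [format_segment(s.start, s.end, s.text) for s in segments]
```

Calling `transcribe_file("audio.mp3")` would download the model on first use, so the first run is slower than later ones.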
faster-whisper-GUI
This GUI transcribes audio and video into subtitle and text formats including srt, txt, smi, vtt, and lrc. It integrates with WhisperX and Demucs models. Features include VAD and Whisper model parameter settings, Whisper large-v3 model support, batch processing, and a PySide6 interface. Models can be downloaded or converted within the application, subject to their legal usage terms.
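To make the srt output format concrete, here is a minimal sketch of how timestamped segments map onto SubRip text. It is a generic illustration, not the application's own exporter; `srt_timestamp` and `to_srt` are hypothetical helper names, and segments are assumed to be `(start_seconds, end_seconds, text)` tuples.

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as the SubRip timestamp HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments) -> str:
    """Build SRT text from (start, end, text) tuples, numbering cues from 1."""
    blocks = []
    for index, (start, end, text) in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text.strip()}\n"
        )
    # A blank line separates consecutive cues.
    return "\n".join(blocks)
```

The vtt format is close to this: it adds a `WEBVTT` header line and uses a period instead of a comma before the milliseconds.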
whisper-clip
WhisperClip uses OpenAI's Whisper to convert audio recordings into text automatically, offering one-click transcription. The interface copies transcriptions to the clipboard. The software is free to use and can be installed on both CPU and GPU setups. Users can adjust the configuration and select a model that matches their system's capabilities.
insanely-fast-whisper
insanely-fast-whisper transcribes audio quickly with Whisper and Flash Attention on supported devices. Using fp16 and batching optimizations, the project reports transcribing 150 minutes of audio in under 98 seconds. The tool performs automatic speech recognition with options for language detection and speaker diarization. It runs on CUDA and mps devices and, once installed, can be executed from any directory.
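The batching idea can be sketched as follows: long audio is cut into fixed-length chunks, and several chunks are sent through the model together. This is a simplified illustration that ignores the overlap real pipelines add between chunks; `plan_batches` is a hypothetical helper, and the 30-second chunk length and batch size of 24 are assumptions for the example.

```python
def plan_batches(duration_s: float, chunk_s: float = 30.0, batch_size: int = 24):
    """Split [0, duration_s) into chunks and group them into batches.

    Returns a list of batches, each a list of (start, end) times in seconds.
    """
    chunks = []
    start = 0.0
    while start < duration_s:
        # The final chunk is clipped to the end of the audio.
        chunks.append((start, min(start + chunk_s, duration_s)))
        start += chunk_s
    # Consecutive chunks are grouped so each model call handles batch_size of them.
    return [chunks[i:i + batch_size] for i in range(0, len(chunks), batch_size)]
```

Under these assumptions, 150 minutes of audio becomes 300 chunks, processed in 13 model calls instead of 300.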
vibe
Vibe transcribes audio and video offline using OpenAI Whisper, keeping user data private because it never leaves the device. Its interface supports multiple languages and batch processing, and it exports transcripts in formats such as SRT, VTT, and TXT. Features include real-time previews, translation to English, and GPU optimization on macOS, Windows, and Linux. It also supports customizable models and a command-line interface (CLI), and it can transcribe media from sources such as YouTube and Facebook.
EmoV-DB
The Emotional Voices Database (EmoV-DB) aims to improve emotional expressiveness in speech synthesis. Derived from the CMU Arctic database, it includes audio from four speakers in various emotional styles, such as neutral, anger, and amusement, in 16-bit .wav format. Intended for TTS development, the dataset uses the Montreal Forced Aligner for precise phoneme timing to differentiate vocal types. The project guide covers usage and download details.