# OpenAI Whisper
## buzz
Buzz provides offline transcription and translation of audio on personal computers using OpenAI's Whisper. It features audio playback, drag-and-drop import, and transcript editing, and installs easily on macOS, Windows, and Linux.
## vibe
Vibe offers offline transcription of audio and video using OpenAI Whisper, preserving user privacy since data never leaves your device. Its intuitive interface supports multiple languages and batch processing, and it exports transcripts in formats such as SRT, VTT, and TXT. Features include real-time previews, translation to English, and GPU optimization on macOS, Windows, and Linux. Customizable models and a command-line interface (CLI) make it a robust choice for transcribing sources such as YouTube and Facebook videos.
## whisper-youtube
This project uses OpenAI's Whisper for efficient, accurate transcription of YouTube videos, with multilingual speech recognition, translation, and language identification. It runs on the various GPUs available in Google Colab, maintaining good processing speed even on the less powerful ones. Users can adjust inference settings and save transcripts and audio to Google Drive.
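The basic flow can be sketched in plain Python: download a video's audio with yt-dlp, then transcribe it with the openai-whisper package. This is a minimal sketch, not the notebook's actual cells; the URL, model choice, and output filename are placeholders, and the timestamp helper is our own addition.

```python
def transcribe_youtube(url: str, model_name: str = "small") -> None:
    """Download a video's audio and print a timestamped transcript.
    (Not executed here; requires yt-dlp, openai-whisper, and ffmpeg.)"""
    import yt_dlp
    import whisper

    # Fetch the best audio-only stream to a local file.
    with yt_dlp.YoutubeDL({"format": "bestaudio", "outtmpl": "audio.%(ext)s"}) as ydl:
        ydl.download([url])
    model = whisper.load_model(model_name)    # larger models are slower but more accurate
    result = model.transcribe("audio.webm")   # pass task="translate" for English output
    for seg in result["segments"]:
        print(f"[{to_timestamp(seg['start'])}] {seg['text'].strip()}")


def to_timestamp(seconds: float) -> str:
    """Format a segment offset in seconds as HH:MM:SS.mmm."""
    ms = int(round(seconds * 1000))
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1_000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"
```

The `result["segments"]` list with per-segment `start`/`end` times is what makes saving subtitle files straightforward.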
## insanely-fast-whisper-api
This open-source API offers rapid audio transcription built on OpenAI's Whisper Large v3, combining Transformers and flash-attn for speed. It deploys easily on any GPU cloud provider, ships with built-in speaker diarization, and secures access through admin authentication. Task management includes cancellation and status endpoints, and the API is optimized for concurrency with asynchronous tasks and webhooks. Docker images make it straightforward to deploy on Fly.io or other VM environments, and a fully managed, scalable version is available through JigsawStack.
## whisper-diarization
This third-party project combines OpenAI's Whisper ASR with voice activity detection and speaker embeddings to improve speaker diarization accuracy. Using MarbleNet for audio segmentation and TitaNet for speaker identification, the pipeline aligns transcripts and timestamps with speaker labels. It requires Python 3.10 plus FFMPEG and Cython, offers parallel processing, and is designed to handle large audio files efficiently, though overlapping speakers remain a known limitation.
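The core alignment step in any Whisper-plus-diarization pipeline is assigning each transcribed segment to the speaker whose diarization turn overlaps it most. This is a simplified sketch of that idea only; the project's actual pipeline (MarbleNet VAD, TitaNet embeddings, forced alignment) does considerably more.

```python
def assign_speakers(segments, turns):
    """Label transcript segments with speakers by maximum temporal overlap.

    segments: list of (start, end, text) from the ASR model.
    turns:    list of (start, end, speaker) from the diarizer.
    Returns a list of (speaker, text) pairs.
    """
    labeled = []
    for s_start, s_end, text in segments:
        best_speaker, best_overlap = "unknown", 0.0
        for t_start, t_end, speaker in turns:
            # Length of the intersection of the two time intervals.
            overlap = min(s_end, t_end) - max(s_start, t_start)
            if overlap > best_overlap:
                best_speaker, best_overlap = speaker, overlap
        labeled.append((best_speaker, text))
    return labeled
```

Maximum-overlap assignment is exactly where overlapping speakers cause trouble: when two turns cover the same segment, only the longer-overlapping speaker wins.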
## WhisperLive
WhisperLive applies OpenAI's Whisper model to real-time speech-to-text conversion from various audio sources, including microphone input, pre-recorded files, and RTSP and HLS streams. With Faster Whisper and TensorRT backends, it performs well across environments and can be deployed on both GPU and CPU. It supports multilingual transcription, and browser extensions extend its usability by enabling direct audio transcription from the browser.
## whisper.rn
Whisper.rn integrates OpenAI's Whisper ASR model into React Native apps. Built on whisper.cpp, it offers high-performance recognition with multi-platform support, detailed installation instructions for iOS and Android, and real-time transcription. It also handles microphone permissions, audio session management, and iOS Core ML support, providing a comprehensive speech recognition solution for applications.
## whisper_android
This guide shows how to incorporate Whisper and a Recorder class into Android apps for offline speech recognition. It covers setup with TensorFlow Lite, practical code examples for Whisper initialization, and audio recording integration for speech-to-text functionality, along with setting file paths, managing permissions, and verifying transcription accuracy.
## whisper.unity
This package brings OpenAI's Whisper automatic speech recognition to Unity3D. Supporting roughly 60 languages, it runs locally without an internet connection and can translate speech, for example from German to English. Multiple model sizes trade speed against accuracy, and the package runs on Windows, macOS, Linux, iOS, Android, and visionOS. Open-source under the MIT License, it can be used in commercial projects, with guidance on optimizing CUDA and Metal builds for specific hardware.
## JARVIS-ChatGPT
JARVIS-ChatGPT is a voice-based AI assistant that combines OpenAI Whisper, IBM Watson, and OpenAI ChatGPT for real-time conversational support. Aimed at professionals and tech enthusiasts working on research tasks, it features a 'Research Mode' for accessing databases, downloading papers, and managing information. It listens through authorized microphones and speaks with synthetic voices, including a J.A.R.V.I.S. voice, for an engaging experience. An OpenAI account and API keys are required, and installation options cover different feature sets.
## multimedia-gpt
Multimedia GPT lets you interact with OpenAI's models through vision and audio inputs using an API key. It accepts image, audio, and PDF submissions (video support is planned) and returns text and image responses. Built on models such as OpenAI Whisper and DALL·E, it removes the need for local GPU resources and runs on Microsoft's Visual ChatGPT prompt management system, with configurable integration with OpenAI LLMs such as ChatGPT and GPT-4 for a versatile multimodal experience.
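Audio input without a local GPU means calling OpenAI's hosted Whisper endpoint (`whisper-1`). The sketch below shows that documented API directly; it is not Multimedia GPT's internal code, and the format whitelist matches OpenAI's published list of accepted audio types.

```python
# File extensions the hosted transcription endpoint accepts, per OpenAI's docs.
SUPPORTED_AUDIO = {"flac", "m4a", "mp3", "mp4", "mpeg", "mpga", "oga", "ogg", "wav", "webm"}


def is_supported_audio(path: str) -> bool:
    """Check a file's extension against the hosted Whisper API's accepted formats."""
    ext = path.rsplit(".", 1)[-1].lower() if "." in path else ""
    return ext in SUPPORTED_AUDIO


def transcribe_remote(path: str) -> str:
    """Send an audio file to OpenAI's hosted Whisper endpoint and return the text.
    (Not executed here; requires the `openai` package and an API key.)"""
    from openai import OpenAI

    if not is_supported_audio(path):
        raise ValueError(f"unsupported audio format: {path}")
    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    with open(path, "rb") as f:
        transcript = client.audio.transcriptions.create(model="whisper-1", file=f)
    return transcript.text
```

Validating the extension before uploading saves a round trip to the API for files it would reject anyway.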
## auto-subs
Auto-subs transcribes and translates DaVinci Resolve editing timelines using OpenAI Whisper and Stable-TS, and is available for Mac, Linux, and Windows. The free tool works with both the free and Studio versions of Resolve and offers customizable subtitles, language translation, a Subtitle Navigator, and comprehensive setup guides.
## Whisper-TikTok
Whisper-TikTok automates the creation of TikTok videos using AI technologies, including OpenAI Whisper and Microsoft Edge TTS. It transcribes audio accurately and adds natural-sounding voiceovers, with FFMPEG formatting the final output. Users generate videos by editing JSON inputs, and the tool runs locally or online across multiple systems, supporting video creation and direct TikTok uploads through a command line or web interface.
## whisper-clip
WhisperClip uses OpenAI's Whisper to convert audio recordings into text with a single click and copies the transcription to the clipboard. The free software installs on both CPU and GPU setups, and users can adjust configurations and select models to match their system's capabilities. It is ideal for turning spoken words into text without manual effort.
## openai-whisper-realtime
This project offers nearly real-time audio transcription using OpenAI's Whisper. It requires Python and a few key libraries, and works by dividing the audio input into chunks for transcription. Planned enhancements include performance optimization and better word-break detection; a fast CPU or GPU is recommended for real-time operation.
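The chunked approach can be sketched as follows. The chunk length is an illustrative latency/accuracy trade-off, not the project's actual setting, and the capture loop is a hedged sketch rather than the project's code.

```python
def split_chunks(samples, chunk_len: int):
    """Split a sample buffer into consecutive chunks of chunk_len samples.
    The final partial chunk is kept so no audio is dropped."""
    return [samples[i:i + chunk_len] for i in range(0, len(samples), chunk_len)]


def run_realtime(model_name: str = "base", sample_rate: int = 16000) -> None:
    """Capture microphone audio and transcribe it chunk by chunk.
    (Not executed here; requires openai-whisper, sounddevice, and numpy.)"""
    import numpy as np
    import sounddevice as sd
    import whisper

    model = whisper.load_model(model_name)
    chunk_seconds = 5  # shorter chunks lower latency but risk cutting words
    while True:
        audio = sd.rec(int(chunk_seconds * sample_rate),
                       samplerate=sample_rate, channels=1, dtype="float32")
        sd.wait()  # block until the chunk is fully recorded
        result = model.transcribe(np.squeeze(audio), fp16=False)
        print(result["text"].strip())
```

Fixed-length chunks are exactly why word-break detection matters: a chunk boundary that lands mid-word degrades the transcript, which is the improvement the project lists as planned.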
## RuntimeSpeechRecognizer
Runtime Speech Recognizer provides efficient speech recognition using OpenAI Whisper. It supports both English-only and multilingual models covering up to 100 languages, with model sizes from 75 MB to 2.9 GB, automatic language model download, and optional speech translation to English. It has no static libraries or external dependencies, allowing cross-platform integration on Windows, Mac, Linux, Android, and iOS, making it ideal for developers who need reliable speech recognition across different applications.
## Whisper-Finetune
This project fine-tunes the Whisper speech recognition model with LoRA, accommodating diverse training contexts with and without timestamped data, and accelerates inference using CTranslate2 and GGML. The Whisper model can recognize 98 languages and translate them into English, and is deployable on Windows, Android, and servers with both original and fine-tuned model versions. Models from whisper-tiny to whisper-large-v3 are evaluated for word error rate across datasets, with tools provided for integration into diverse applications.
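Two pieces of the workflow can be sketched briefly: the word-error-rate metric used to compare models, and wrapping a Whisper checkpoint with LoRA adapters via Hugging Face PEFT. The WER function is standard and runnable; the LoRA ranks and target modules below are common illustrative choices, not the project's exact configuration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER: word-level edit distance divided by reference length,
    computed with the classic dynamic-programming table."""
    ref, hyp = reference.split(), hypothesis.split()
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # deleting all reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # inserting all hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + cost) # substitution
    return d[len(ref)][len(hyp)] / max(len(ref), 1)


def build_lora_whisper(base: str = "openai/whisper-tiny"):
    """Attach LoRA adapters to a Whisper checkpoint for fine-tuning.
    (Not executed here; requires `transformers` and `peft`. Hyperparameters
    are illustrative, not the project's settings.)"""
    from transformers import WhisperForConditionalGeneration
    from peft import LoraConfig, get_peft_model

    model = WhisperForConditionalGeneration.from_pretrained(base)
    config = LoraConfig(r=32, lora_alpha=64,
                        target_modules=["q_proj", "v_proj"],
                        lora_dropout=0.05)
    return get_peft_model(model, config)  # only adapter weights are trained
```

LoRA trains a small number of adapter parameters instead of the full model, which is what makes fine-tuning even whisper-large-v3 feasible on modest hardware.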
Feedback Email: [email protected]