# Whisper
VectorDB-Plugin-for-LM-Studio
The repository facilitates the creation and search of vector databases to enhance context retrieval across various document types, thereby refining the responses of large language models. Key features include extracting text from formats such as PDF and DOCX, summarizing images, and transcribing audio files. It supports text-to-speech playback and runs on CPUs and Nvidia GPUs, with support for AMD and Intel GPUs in progress. Tailored for retrieval augmented generation, this tool minimizes hallucinations in language model outputs and covers the full workflow from file input to vector database management.
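The core of retrieval augmented generation as described above is ranking stored document chunks by similarity to a query embedding. A minimal sketch of that ranking step, using cosine similarity over toy 2-D vectors (the function name and dimensions are illustrative, not this plugin's actual API):

```python
import numpy as np

def top_k_chunks(query_vec, chunk_vecs, k=3):
    """Return indices of the k document chunks most similar to the query."""
    q = query_vec / np.linalg.norm(query_vec)
    m = chunk_vecs / np.linalg.norm(chunk_vecs, axis=1, keepdims=True)
    scores = m @ q  # cosine similarity of each chunk against the query
    return np.argsort(scores)[::-1][:k]

# Toy 2-D embeddings: chunk 0 points almost the same way as the query.
chunks = np.array([[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]])
query = np.array([1.0, 0.1])
print(top_k_chunks(query, chunks, k=2))  # → [0 2]
```

In a real pipeline the embeddings would come from a sentence-embedding model and the top chunks would be prepended to the LLM prompt as context.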
ruby-openai
Ruby-openai allows developers to integrate AI functionality such as text generation with GPT-4o, audio transcription and translation via Whisper, and image creation using DALL·E through the OpenAI API. The library supports custom configuration of API keys and fine-tuning, and is compatible with Azure. Designed for developers, it installs simply via Bundler or Gem and includes features like logging and error handling to support various AI models and vector operations.
ollama-voice-mac
An entirely offline Mac-compatible voice assistant utilizing the Mistral 7b and Whisper models for speech recognition. It builds on earlier versions with improved compatibility and features. Easy installation and language customization make it a convenient option for privacy-conscious users seeking efficient voice interaction on macOS.
ChatTTS-Forge
Explore sophisticated AI text-to-speech capabilities offering versatile voice customization and efficient API support. ChatTTS and CosyVoice models facilitate advanced speech synthesis with flexible speaker options. Featuring a Gradio-based WebUI, users benefit from straightforward deployment, comprehensive docker support, and local setup. Notable features include speaker selection, style modulation, long-text processing, and real-time enhancement with multi-model support like FishSpeech and GPT-SoVITS. The API server provides optimized performance for high-demand applications, suitable for both technical and non-technical users.
Stage-Whisper
Stage Whisper utilizes OpenAI's Whisper model to offer precise audio transcription with a focus on user-friendliness and cross-platform compatibility. Featuring a Node/Electron interface and Python backend, it simplifies transcription for non-technical users on macOS, Windows, and Linux. Led by Peter Sterne and Christina Warren, the project seeks to make transcription more accessible to journalists and others without technical expertise. Engage with the community for collaborative development, feedback, and contributions.
tensorflow-speech-recognition
Discover insights into speech recognition using the TensorFlow sequence-to-sequence framework. Despite its outdated status, this project serves educational purposes, focusing on creating standalone Linux speech recognition. While new projects like Whisper and Mozilla's DeepSpeech lead advancements, foundational techniques remain essential. Packed with modular extensions and educational examples, it offers a platform for learning and experimentation. Detailed installation guides specify key dependencies such as PyAudio.
awesome-whisper
Explore a curated list of tools and resources for Whisper, OpenAI's open-source speech recognition system. This organized catalog features official documentation, model variations, apps, CLI utilities, web platforms, articles, videos, and community links. Understand implementations for diverse uses, including iOS and macOS applications, web solutions, and third-party APIs, focusing on speed, speaker diarization, and accuracy advancements, all aimed at enhancing speech-to-text processes across platforms.
LiveWhisper
LiveWhisper leverages OpenAI's Whisper model to perform continuous audio transcription by capturing microphone input and processing it during silent intervals. It facilitates voice commands for weather updates, trivia, and media control, activating with phrases like 'hey computer'. Serving as an alternative to SpeechRecognition, this system employs sounddevice, numpy, scipy, and libraries like requests and pyttsx3. Contributions via Ko-fi support ongoing development.
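The "processing during silent intervals" approach above boils down to buffering microphone chunks until the energy drops, then handing the accumulated utterance to the model. A minimal sketch of that silence-gating logic with numpy (function names and the RMS threshold are illustrative assumptions, not LiveWhisper's actual code):

```python
import numpy as np

def is_silent(chunk, threshold=0.01):
    """Treat a block of float32 samples as silence when its RMS energy is low."""
    rms = np.sqrt(np.mean(np.square(chunk)))
    return rms < threshold

def split_on_silence(stream, threshold=0.01):
    """Accumulate chunks until a silent one arrives, then yield the utterance."""
    buffer = []
    for chunk in stream:
        if is_silent(chunk, threshold):
            if buffer:
                yield np.concatenate(buffer)
                buffer = []
        else:
            buffer.append(chunk)
    if buffer:  # flush whatever remains when the stream ends
        yield np.concatenate(buffer)
```

Each yielded utterance would then be passed to Whisper's `transcribe` call, which keeps the model off the hot path while the user is still speaking.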
whisper
This Whisper library brings OpenAI's audio-to-text model to multiple platforms without requiring audio format conversion. It supports Linux and Android and integrates smoothly with Dart and Flutter, promoting versatile application. It handles real-time transcription of various audio and video sources, using flexible JSON parameters for broader applicability. Refined for improved cross-platform functionality, the library seeks collaborative contributions to expand its feature set.
chatgpt-web-application
The web application provides an intuitive interface for engaging with OpenAI models like Davinci, DALL·E, and Whisper. Easily generate AI images, transcribe audio, and highlight code syntax through a chat-like interface. With Express-based server support, it facilitates API requests without frameworks. Ideal for exploring the capabilities of OpenAI's API, requiring Node.js and an API Key for setup. Contributions are welcomed to improve functionality and resolve issues. Visit http://localhost:3001 post-installation for an interactive experience.
pyvideotrans
Provides a solution for translating and dubbing videos across various languages with automated subtitle and voiceover generation. It supports a wide range of speech recognition and text-to-speech models, including OpenAI and Google offerings, and handles batch processing tasks such as audio-visual conversion, subtitle translation, and video merging. Compatible with numerous languages and with Windows, macOS, and Linux, with pre-packaged Windows builds available. Ideal for developers looking to integrate with video translation APIs.
Whisper-Finetune
Discover how to optimize Whisper, the advanced ASR model with multilingual support. The project emphasizes LoRA fine-tuning for non-timestamped, timestamped, and audio-less data, and accelerates inference via CTranslate2 and GGML for deployment on diverse platforms, including Windows and Android. Recent updates show enhanced Chinese recognition and processing speed, with this comprehensive guide detailing setup, data preparation, and evaluation strategies for maximizing Whisper's potential.
WhisperSpeech
Explore an innovative open source text-to-speech system designed for flexibility and commercial use. Currently supporting English with plans for multilingual compatibility, recent updates enhance performance and introduce voice cloning. Test its capabilities on Google Colab with models leveraging Whisper, EnCodec, and Vocos.
LLMtuner
LLMTuner provides an efficient solution for adjusting large language models, such as Whisper and Llama, using sophisticated techniques like LoRA and QLoRA. Featuring a user-friendly, scikit-learn-inspired interface, it facilitates streamlined parameter tuning and model deployment. The tool offers effortless demo creation and model deployment with minimal code, making it suitable for researchers and developers seeking fast and reliable ML outcomes. Its future features, including deployment readiness on platforms like AWS and GCP, are designed to significantly enhance model training and deployment capabilities.
decipher
Utilize AI to easily transcribe and add subtitles to videos with Decipher, enhancing accessibility. It leverages OpenAI's Whisper system for precise transcriptions even in difficult audio conditions. Choose between Google Colab or manual setup, and use the GUI or command-line interface as you prefer.
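Subtitling a video from a Whisper transcription is mostly a formatting step: each segment's start/end times and text are rendered into a subtitle file. A minimal sketch of converting Whisper-style segments into the SRT format (the helper names are illustrative, not decipher's actual code):

```python
def to_srt(segments):
    """Render Whisper-style segments (start, end, text) as an SRT subtitle file."""
    def stamp(seconds):
        ms = int(round(seconds * 1000))
        h, ms = divmod(ms, 3_600_000)
        m, ms = divmod(ms, 60_000)
        s, ms = divmod(ms, 1000)
        return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"  # SRT uses comma before ms

    lines = []
    for i, seg in enumerate(segments, start=1):
        lines.append(f"{i}\n{stamp(seg['start'])} --> {stamp(seg['end'])}\n"
                     f"{seg['text'].strip()}\n")
    return "\n".join(lines)

print(to_srt([{"start": 0.0, "end": 2.5, "text": " Hello there."}]))
```

Whisper's `transcribe` result exposes segments with exactly these `start`, `end`, and `text` fields, so the resulting `.srt` file can be muxed onto the video with a tool like ffmpeg.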
openai-whisper-realtime
This project offers a nearly realtime audio transcription solution using OpenAI's Whisper, ideal for efficient audio processing. It requires Python and key libraries to divide audio input into chunks for transcription. Planned enhancements include optimizing performance and improving word break detection. A fast CPU or GPU is recommended to ensure real-time efficiency.
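The chunking step described above can be sketched in a few lines: the incoming sample buffer is cut into fixed-length windows, each of which would then be fed to the model. This is a simplified illustration under assumed parameters (16 kHz mono, 5-second chunks), not the project's actual implementation:

```python
import numpy as np

SAMPLE_RATE = 16_000   # Whisper expects 16 kHz mono audio
CHUNK_SECONDS = 5      # assumed window length for near-realtime transcription

def chunk_audio(samples, sample_rate=SAMPLE_RATE, chunk_seconds=CHUNK_SECONDS):
    """Split a 1-D sample buffer into consecutive fixed-length chunks."""
    size = sample_rate * chunk_seconds
    return [samples[i:i + size] for i in range(0, len(samples), size)]

# 12 s of dummy audio becomes two full chunks plus a 2 s remainder.
audio = np.zeros(SAMPLE_RATE * 12, dtype=np.float32)
chunks = chunk_audio(audio)
print([len(c) / SAMPLE_RATE for c in chunks])  # → [5.0, 5.0, 2.0]
```

Cutting on a fixed clock can split words across chunk boundaries, which is why the project lists improved word break detection among its planned enhancements.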
stable-ts
The stable-ts library enhances the functionality of Whisper by providing reliable timestamps in audio transcription. Key integrations include voice isolation, noise reduction, and dynamic time warping to achieve precise word-level timestamps. It supports diverse model configurations and preprocessing methods, improving transcription accuracy by suppressing silence and refining outputs. The library requires FFmpeg and PyTorch for installation, offering cross-platform compatibility and customizable options via denoisers and voice detection methods. Additionally, it connects with any ASR system, enabling its application in various audio transcription scenarios.
ScribeWizard
ScribeWizard is a Streamlit app that effectively transforms audio lectures into structured notes. By leveraging Groq's Whisper API and the Llama3-8b and Llama3-70b models, it generates accurate note structures swiftly. The app supports markdown styling, allowing for visually appealing notes with tables and code, and offers downloads in text or PDF format. Suitable for users seeking efficient note-taking solutions, ScribeWizard is presented as a useful tool in its beta phase, open to user feedback and contributions.
go-openai
This unofficial Go library for the OpenAI API supports the ChatGPT, GPT-3, GPT-4, DALL·E, and Whisper models, enabling integration of AI features such as text and image generation and speech-to-text into Go applications. It offers installation guides, usage examples, and API key management.
yt-whisper
The yt-whisper project facilitates YouTube subtitle generation through yt-dlp and OpenAI's Whisper, supporting multiple languages. It offers straightforward installation with Python and ffmpeg, producing VTT files and allowing model adjustments for improved accuracy. The tool also provides subtitle translation into English within an MIT-licensed open-source structure.
local-talking-llm
This guide provides instructions to construct a Python-based voice assistant with offline capabilities. Utilizing technologies such as Whisper for speech recognition, Bark for text-to-speech synthesis, and Ollama for managing LLMs, it details the setup of essential libraries like Rich and langchain. Additionally, the guide discusses performance optimization strategies and suggests improvements like customizable prompts and multimodal capabilities for enhanced functionality.
Whisper-transcription_and_diarization-speaker-identification-
Discover the use of OpenAI's Whisper for precise audio transcription and speaker differentiation with Pyannote-audio. This guide offers comprehensive instructions on audio preparation and the integration of transcription with speaker segments. Benefit from Whisper's robust model trained on vast multilingual data for enhanced performance across diverse acoustic conditions.
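Combining the two tools comes down to aligning Whisper's transcript segments with Pyannote's speaker turns on the time axis, typically by assigning each segment the speaker whose turn overlaps it most. A minimal sketch of that alignment (function names and the segment/turn dictionaries are illustrative assumptions, not the guide's exact code):

```python
def overlap(a, b):
    """Length of the time overlap between two (start, end) intervals."""
    return max(0.0, min(a[1], b[1]) - max(a[0], b[0]))

def label_segments(transcript, speakers):
    """Attach the speaker whose turn overlaps each transcript segment the most."""
    labeled = []
    for seg in transcript:
        span = (seg["start"], seg["end"])
        best = max(speakers, key=lambda s: overlap(span, (s["start"], s["end"])))
        labeled.append({**seg, "speaker": best["speaker"]})
    return labeled

transcript = [{"start": 0.0, "end": 4.0, "text": "Hi."},
              {"start": 4.0, "end": 9.0, "text": "Hello back."}]
speakers = [{"start": 0.0, "end": 4.5, "speaker": "SPEAKER_00"},
            {"start": 4.5, "end": 9.0, "speaker": "SPEAKER_01"}]
print([s["speaker"] for s in label_segments(transcript, speakers)])
# → ['SPEAKER_00', 'SPEAKER_01']
```

In practice the transcript segments come from Whisper's `transcribe` output and the speaker turns from a pyannote diarization pipeline run on the same audio.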
notesGPT
NotesGPT offers quick conversion of voice notes into tasks using AI from Convex, Together.ai, and Whisper. Built on a tech stack comprising Next.js and Convex, it streamlines task management and search. The platform integrates easily with authentication by Clerk and transcription via Replicate. Anticipated updates include persistent recordings, animated elements, and Notion integration to further enhance workflow efficiency.
Feedback Email: [email protected]