LiveWhisper - Whisper-based Transcription
LiveWhisper transcribes microphone input in near real time using OpenAI's Whisper model, printing the text sentence by sentence in the terminal. It captures audio with the sounddevice library and starts storing it once the input reaches set volume and frequency levels. When silence follows speech, the stored audio is saved to a temporary file and passed to the Whisper model for transcription.
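The last step of that flow, saving the recording to a temporary file and handing it to Whisper, might look roughly like the sketch below. It is an illustration under stated assumptions rather than LiveWhisper's own code: `save_temp_wav`, `transcribe_samples`, and the 16 kHz `SAMPLE_RATE` are hypothetical, and the standard-library `wave` module stands in for scipy when writing the file.

```python
import os
import tempfile
import wave

import numpy as np

SAMPLE_RATE = 16000  # Hz; assumed value for this sketch


def save_temp_wav(samples, sample_rate=SAMPLE_RATE):
    """Write mono float samples in [-1, 1] to a temporary 16-bit WAV file."""
    pcm = (np.clip(samples, -1.0, 1.0) * 32767).astype("<i2")
    fd, path = tempfile.mkstemp(suffix=".wav")
    os.close(fd)
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit audio
        wav.setframerate(sample_rate)
        wav.writeframes(pcm.tobytes())
    return path


def transcribe_samples(model, samples, sample_rate=SAMPLE_RATE):
    """Save the recording, transcribe the file, and return the text.

    `model` is expected to behave like the object returned by
    whisper.load_model(): transcribe(path) returns a dict with a "text" key.
    """
    path = save_temp_wav(samples, sample_rate)
    try:
        # fp16=False avoids the half-precision warning on CPU-only machines.
        return model.transcribe(path, fp16=False)["text"].strip()
    finally:
        os.remove(path)  # the temporary file is no longer needed
```

With the openai-whisper package installed, the model would be created once with `whisper.load_model("base")` (the model size is an assumption) and reused for every recording, since loading it is the slow part.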
Key Features
- Real-time Dictation: Speech is transcribed sentence by sentence, with the text appearing shortly after each pause.
- Threshold-based Recording: Audio is recorded only when it meets preset volume and frequency parameters, so that only input likely to contain speech is kept.
- Automated Processing: Silence after speech finalizes the recording, which is then processed and returned as a transcription.
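The threshold check and silence detection listed above can be sketched as follows. This is a minimal illustration, not necessarily how LiveWhisper implements it: the constants (`SAMPLE_RATE`, `VOLUME_THRESHOLD`, `VOCAL_RANGE`, `END_BLOCKS`) and the names `is_speech` and `UtteranceSegmenter` are assumptions chosen for the example. Volume is measured here as RMS amplitude, and frequency as the strongest FFT bin in each block.

```python
import numpy as np

SAMPLE_RATE = 16000        # Hz; assumed value for this sketch
VOLUME_THRESHOLD = 0.1     # minimum RMS volume; assumed value
VOCAL_RANGE = (50, 1000)   # Hz; assumed band for the dominant frequency
END_BLOCKS = 3             # silent blocks that end an utterance; assumed value


def is_speech(block, sample_rate=SAMPLE_RATE):
    """Return True if a mono float block is loud enough and voice-like."""
    rms = np.sqrt(np.mean(block ** 2))
    if rms <= VOLUME_THRESHOLD:
        return False
    spectrum = np.abs(np.fft.rfft(block))
    dominant_hz = np.argmax(spectrum) * sample_rate / len(block)
    return bool(VOCAL_RANGE[0] <= dominant_hz <= VOCAL_RANGE[1])


class UtteranceSegmenter:
    """Collect blocks from the first speech block until enough silence follows."""

    def __init__(self, end_blocks=END_BLOCKS):
        self.end_blocks = end_blocks
        self.blocks = []
        self.silent = 0

    def feed(self, block):
        """Add one block; return the finished utterance, or None if not done."""
        if is_speech(block):
            self.blocks.append(block)
            self.silent = 0
            return None
        if not self.blocks:
            return None  # still waiting for speech to start
        self.silent += 1
        self.blocks.append(block)  # keep short pauses inside the utterance
        if self.silent < self.end_blocks:
            return None
        utterance = np.concatenate(self.blocks)
        self.blocks, self.silent = [], 0
        return utterance
```

In a live setup, each block would arrive through a sounddevice input-stream callback, and the array returned by `feed` would be the audio handed to Whisper.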
Dependencies
To run LiveWhisper, certain libraries are necessary:
- Whisper
- numpy
- scipy
- sounddevice
While LiveWhisper can serve as an alternative to the SpeechRecognition library, it's worth noting that SpeechRecognition now supports Whisper as well.
Whisper Assistant
An extension of LiveWhisper, the Whisper Assistant uses the same foundational technology to create a simple voice-command assistant, akin to popular counterparts like Siri, Alexa, or Jarvis.
Additional Dependencies
The assistant comes with the same requirements as LiveWhisper, plus a few extras:
- requests
- pyttsx3
- wikipedia
- bs4
- espeak and python3-espeak
Functionality
- Activation: The assistant is activated by saying its default name, "computer," or phrases like "hey computer" or "okay computer."
- Versatile Commands: Users can command the assistant to:
  - Check the weather
  - Announce the date and time
  - Tell jokes
  - Conduct Wikipedia searches
- It also supports basic requests such as simple arithmetic or retrieving trivia with the help of Google’s instant-answer feature.
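As an illustration of the activation step, the sketch below checks whether a transcript begins with the wake word and returns the command that follows it. The function name `extract_command` and the regular expression are hypothetical, and accepting "ok" alongside "okay" is an added assumption; the assistant's real matching logic may differ.

```python
import re

WAKE_WORD = "computer"  # default name given in the text

# Optional "hey"/"okay" prefix ("ok" is an assumed variant), then the
# wake word as a whole word, then the rest of the sentence as the command.
_WAKE_RE = re.compile(
    rf"^\s*(?:(?:hey|okay|ok)[\s,]+)?{re.escape(WAKE_WORD)}\b[\s,.!?]*(.*)$",
    re.IGNORECASE | re.DOTALL,
)


def extract_command(transcript):
    """Return the command after the wake word, or None if it is absent.

    A bare wake word with nothing after it yields an empty string.
    """
    match = _WAKE_RE.match(transcript)
    if match is None:
        return None
    return match.group(1).strip().rstrip(".!?").strip()
```

Returning `None` for transcripts without the wake word lets the caller ignore ordinary conversation, while an empty string signals that the assistant was addressed but given no command yet.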
Users can further control media players through commands like play, pause, next, previous, and so on. For optimal performance, enabling noise and echo cancellation is recommended, especially on Linux systems using PulseAudio.
To exit the assistant, users can press Ctrl+C or issue a voice command to "terminate" it.
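One simple way to route recognized commands, including the "terminate" command, is keyword matching. The sketch below is hypothetical throughout: `handle_command`, its reply strings, and the `media:` placeholder are assumptions for illustration, and only a few of the commands described above are covered.

```python
from datetime import datetime

MEDIA_COMMANDS = {"play", "pause", "next", "previous"}  # named in the text


def handle_command(command, now=None):
    """Return (reply, keep_running) for one recognized command."""
    words = command.lower().strip(" .!?").split()
    if "terminate" in words:
        return "Goodbye.", False
    if "time" in words:
        now = now or datetime.now()
        return now.strftime("It is %H:%M."), True
    if words and words[0] in MEDIA_COMMANDS:
        # Placeholder: a real assistant would send a media-key event here.
        return f"media:{words[0]}", True
    return "Sorry, I did not understand that.", True
```

A main loop would call this for each command, speak the reply, and stop once `keep_running` is `False`; Ctrl+C would still interrupt the loop from the keyboard.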
Supporting the Project
If you find LiveWhisper and its companion tools valuable, consider offering support through donations to the developer's Ko-fi page, helping them to continue developing such projects.