LiveWhisper - Enhances Real-Time Audio Transcription with Assistive Voice Commands

LiveWhisper - Whisper-based Transcription

LiveWhisper transcribes microphone input in real time. It uses OpenAI's Whisper model to convert speech to text and prints the result to the terminal sentence by sentence. Audio is captured with the sounddevice library and buffered only while the input meets set volume and frequency thresholds. When silence follows speech, the buffered audio is saved to a temporary file and passed to the Whisper model for transcription.
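The capture logic described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the constant values, the block size, and the function names is_speech and segment are all assumptions chosen for the example. It works on plain NumPy arrays so it can be tried without a microphone; in practice the blocks would come from a sounddevice input stream.

```python
import numpy as np

# All values below are illustrative assumptions, not LiveWhisper's settings.
SAMPLE_RATE = 44100        # samples per second
THRESHOLD = 0.1            # minimum RMS volume for a block to count as speech
VOCAL_RANGE = (50, 1000)   # accepted band (Hz) for the block's peak frequency
END_BLOCKS = 40            # consecutive quiet blocks that end an utterance


def is_speech(block, sample_rate=SAMPLE_RATE):
    """Return True if a mono block is loud enough and its peak frequency is in range."""
    rms = np.sqrt(np.mean(block ** 2))
    peak_bin = np.argmax(np.abs(np.fft.rfft(block)))
    freq = peak_bin * sample_rate / len(block)
    return bool(rms > THRESHOLD and VOCAL_RANGE[0] <= freq <= VOCAL_RANGE[1])


def segment(blocks, end_blocks=END_BLOCKS):
    """Yield one concatenated array per utterance from an iterable of mono blocks."""
    buffer, quiet = [], 0
    for block in blocks:
        if is_speech(block):
            buffer.append(block)
            quiet = 0
        elif buffer:
            # Keep short pauses inside the utterance; stop after a long silence.
            quiet += 1
            buffer.append(block)
            if quiet >= end_blocks:
                yield np.concatenate(buffer)
                buffer, quiet = [], 0
    if buffer:
        # Flush whatever is left when the input ends.
        yield np.concatenate(buffer)
```

Leading silence is discarded because nothing is buffered until the first block passes both checks. Each array yielded by segment is one candidate utterance to hand to the transcription step.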

Key Features

Real-time Dictation: LiveWhisper transcribes speech as it is happening, providing immediate textual output.
Threshold-based Recording: Audio is recorded only while it meets preset volume and frequency thresholds, so input outside those limits is not captured.
Automated Processing: Detects silence to finalize and process the audio file, then delivers a transcription.
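The final step, writing a finished utterance to a temporary file and handing it to Whisper, might look something like the sketch below. The helper names save_utterance and transcribe_utterance are hypothetical, and the 16-bit PCM conversion and the 44100 Hz default are assumptions. The model argument is expected to be an object returned by whisper.load_model, whose transcribe method accepts a file path and returns a dictionary with a "text" entry.

```python
import os
import tempfile

import numpy as np
from scipy.io import wavfile


def save_utterance(samples, sample_rate=44100):
    """Write mono float samples in [-1, 1] to a temporary 16-bit WAV file; return its path."""
    pcm = (np.clip(samples, -1.0, 1.0) * 32767).astype(np.int16)
    fd, path = tempfile.mkstemp(suffix=".wav")
    os.close(fd)  # wavfile.write reopens the file by path
    wavfile.write(path, sample_rate, pcm)
    return path


def transcribe_utterance(samples, model, sample_rate=44100):
    """Save the samples, run the model on the file, and delete the file afterwards."""
    path = save_utterance(samples, sample_rate)
    try:
        result = model.transcribe(path)  # Whisper returns a dict with a "text" key
        return result["text"].strip()
    finally:
        os.remove(path)
```

With Whisper installed, usage would be along the lines of model = whisper.load_model("base") followed by print(transcribe_utterance(samples, model)). Passing a file path to Whisper relies on ffmpeg being available on the system.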

Dependencies

LiveWhisper requires the following libraries:

Whisper
numpy
scipy
sounddevice

While LiveWhisper can serve as an alternative to the SpeechRecognition library, it's worth noting that SpeechRecognition now supports Whisper as well.

Whisper Assistant

An extension of LiveWhisper, the Whisper Assistant uses the same foundational technology to create a simple voice-command assistant, akin to popular counterparts like Siri, Alexa, or Jarvis.

Additional Dependencies

The assistant comes with the same requirements as LiveWhisper, plus a few extras:

requests
pyttsx3
wikipedia
bs4
espeak and python3-espeak

Functionality

Activation: The assistant is activated by saying its default name, "computer," or phrases like "hey computer" or "okay computer."
Versatile Commands: Users can command the assistant to:
- Check the weather
- Announce the date and time
- Tell jokes
- Conduct Wikipedia searches
It also supports basic requests such as simple arithmetic or retrieving trivia with the help of Google’s instant-answer feature.
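As a rough illustration of the activation step, the sketch below checks a transcript for the wake word and returns the command that follows it. It is an assumption-laden example rather than the assistant's real logic: the function names extract_command and respond are hypothetical, the regular expression is one possible way to match the phrases listed above, and accepting "ok" as a spelling of "okay" is an added guess about how the transcript might be written.

```python
import re
from datetime import datetime

NAME = "computer"  # default assistant name mentioned above

# Optional "hey"/"okay" prefix, then the name, then trailing punctuation or spaces.
WAKE = re.compile(rf"^(?:(?:hey|okay|ok)[\s,]+)?{NAME}\b[\s,.!?]*", re.IGNORECASE)


def extract_command(transcript):
    """Return the lowercase text after the wake word, or None if it is absent."""
    text = transcript.strip()
    match = WAKE.match(text)
    if match is None:
        return None
    return text[match.end():].strip(" .!?").lower()


def respond(command, now=None):
    """Answer a couple of built-in requests; return None for anything else."""
    now = now or datetime.now()
    if "time" in command:
        return now.strftime("It is %H:%M.")
    if "date" in command:
        return now.strftime("Today is %A, %B %d, %Y.")
    return None
```

For example, extract_command("Hey computer, what time is it?") returns "what time is it", while a sentence that does not start with the wake word returns None and can be ignored.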

Users can further control media players through commands like play, pause, next, previous, and so on. For optimal performance, enabling noise and echo cancellation is recommended, especially on Linux systems using PulseAudio.

To exit the assistant, press Ctrl+C or give the voice command "terminate."

Supporting the Project

If you find LiveWhisper and its companion tools valuable, consider offering support through donations to the developer's Ko-fi page, helping them to continue developing such projects.
