#Voice Activity Detection
whisper-timestamped
Whisper-timestamped provides an advanced multilingual speech recognition solution with precise word timestamps and confidence scores. Utilizing Dynamic Time Warping, it enhances Whisper models for better speech segment estimation. Integrating Voice Activity Detection, it minimizes hallucinations and handles disfluencies robustly without additional neural networks. Compatible with all versions of the OpenAI-whisper Python package, this tool is optimized for memory efficiency, even with long audio files, enhancing transcription accuracy and supporting multilingual use.
silero-vad
Silero VAD is a pre-trained solution for voice activity detection, notable for its accuracy and speed. Supporting over 6000 languages and multiple sampling rates, it is adaptable to diverse audio environments. Lightweight and portable, it utilizes PyTorch and ONNX ecosystems for broad application in IoT, mobile, telephony, and voice interfaces. Free from restrictions, it ensures privacy with no telemetry or vendor lock-in, offering robust performance suitable for real-time detection across varied use cases.
Feedback Email: [email protected]