whisper-diarization
This third-party project combines OpenAI's Whisper ASR with Voice Activity Detection (VAD) and speaker embeddings to improve the accuracy of speaker diarization. It uses MarbleNet for voice activity detection and audio segmentation and TitaNet for the speaker embeddings that identify speakers, then aligns the resulting speaker turns with the transcription timestamps. The project is compatible with Python 3.10, depends on FFmpeg and Cython, offers options for parallel processing, and is designed to handle large audio files efficiently. Overlapping speakers remain a known limitation that the project continues to work on.
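
The alignment between transcription timestamps and speaker turns can be illustrated with a minimal sketch. This is not the project's actual code: the `Word` and `Turn` types and the `assign_speakers` and `group_by_speaker` helpers are hypothetical names introduced here, and the sketch assumes that word-level timestamps (from the ASR step) and speaker turns (from the diarization step) are already available. Each word is assigned to the speaker turn it overlaps most in time.

```python
from dataclasses import dataclass


@dataclass
class Word:
    """A transcribed word with start/end times in seconds (hypothetical type)."""
    text: str
    start: float
    end: float


@dataclass
class Turn:
    """A diarized speaker turn with start/end times in seconds (hypothetical type)."""
    speaker: str
    start: float
    end: float


def overlap(a_start, a_end, b_start, b_end):
    """Return the length in seconds of the intersection of two intervals."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_speakers(words, turns):
    """Label each word with the speaker whose turn overlaps it the most.

    Words that overlap no turn at all are labeled "unknown".
    """
    labeled = []
    for w in words:
        best_speaker, best_overlap = "unknown", 0.0
        for t in turns:
            o = overlap(w.start, w.end, t.start, t.end)
            if o > best_overlap:
                best_speaker, best_overlap = t.speaker, o
        labeled.append((best_speaker, w.text))
    return labeled


def group_by_speaker(labeled):
    """Merge consecutive words from the same speaker into one line."""
    lines = []
    for speaker, text in labeled:
        if lines and lines[-1][0] == speaker:
            lines[-1] = (speaker, lines[-1][1] + " " + text)
        else:
            lines.append((speaker, text))
    return lines


# Illustrative data only; real timestamps would come from the ASR and diarization steps.
words = [Word("hello", 0.0, 0.4), Word("there", 0.5, 0.9), Word("hi", 1.1, 1.3)]
turns = [Turn("Speaker 0", 0.0, 1.0), Turn("Speaker 1", 1.0, 2.0)]

for speaker, text in group_by_speaker(assign_speakers(words, turns)):
    print(f"{speaker}: {text}")
```

A maximum-overlap rule like this assigns every word to exactly one speaker, which is one reason overlapping speech is hard to represent: a word spoken while two people talk at once can only be credited to one of them.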