audio-to-text-transcription - Accurately Transcribe YouTube Audio to Text with Language Detection

YouTube Audio-to-Text Transcription

Description

The YouTube Audio-to-Text Transcription project automates turning the audio of a YouTube video into text, replacing manual transcription. You provide a YouTube video URL, and the script downloads the audio, transcribes it, and saves the transcription as a text file. This is useful when you need a transcription quickly, whether for research, content creation, or making content more accessible.

Key Features

User-Friendly Interface: The only input is a YouTube video URL, which you enter when the script prompts for it.
Efficient Audio Extraction: Using the pytube library, the system can filter and download the audio stream from a YouTube video effectively.
Quality Transcription: It uses the whisper library, OpenAI's speech recognition model, to convert speech to text.
Convenient Output: The text is saved in a simple text file within the script's directory, making it easy to access and share.

Prerequisites

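To run this project, you need:

Python 3.6 or higher. Recent Whisper releases state compatibility with Python 3.8 and later, so a newer interpreter may be required in practice.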
pip, the package installer for Python, to add necessary libraries.

Required Libraries

pytube: A Python library that makes downloading YouTube videos and extracting audio streams straightforward.
whisper: OpenAI's speech recognition library, used here for speech-to-text conversion.
langdetect: A library that detects the language of the transcribed text. It is a Python port of Google's language-detection library.

Installation

Start by cloning the repository or downloading the script.

Install the required libraries by running these commands:

pip install pytube

pip install git+https://github.com/openai/whisper.git

pip install langdetect

Install FFmpeg and add it to your system's PATH environment variable so that the ffmpeg command is available from a terminal. Installation guides for Windows, Mac, and Ubuntu are available online.
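To confirm that FFmpeg is reachable before running the script, a small check like the one below can help. It is a minimal sketch and not part of the original script; the function name ffmpeg_available is a hypothetical helper chosen for illustration.

```python
import shutil


def ffmpeg_available() -> bool:
    """Return True if an `ffmpeg` executable can be found on PATH."""
    # shutil.which returns the full path of the executable, or None if missing.
    return shutil.which("ffmpeg") is not None


if ffmpeg_available():
    print("ffmpeg found on PATH.")
else:
    print("ffmpeg was not found on PATH; install it before transcribing.")
```

If the check reports that ffmpeg is missing, open a new terminal after installing it so that the updated PATH is picked up.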

Usage

Execute the script with the following command:
```
python youtube_audio_to_text.py
```

When asked, enter the YouTube video URL you would like to transcribe:

Enter the YouTube video URL: https://www.youtube.com/watch?v=XXXXXXXXXXX

The script downloads the audio, transcribes it, detects the language, and saves the transcription in a text file named output_{language}.txt.
You can find the generated file in the directory where the script runs.
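Because the script depends on a well-formed URL, it can be useful to check the input before starting a download. The sketch below is an assumption about how such a check might look, not something the original script is known to do; looks_like_youtube_url and YOUTUBE_HOSTS are hypothetical names, and the set of accepted hosts is illustrative rather than exhaustive.

```python
from urllib.parse import parse_qs, urlparse

# Hosts accepted by this sketch (an assumption, not an exhaustive list).
YOUTUBE_HOSTS = {"youtube.com", "www.youtube.com", "m.youtube.com"}


def looks_like_youtube_url(url: str) -> bool:
    """Rough shape check for a YouTube watch URL or a youtu.be short link."""
    parsed = urlparse(url.strip())
    if parsed.scheme not in ("http", "https"):
        return False
    host = (parsed.hostname or "").lower()
    if host == "youtu.be":
        # Short links carry the video ID in the path, e.g. /XXXXXXXXXXX
        return len(parsed.path) > 1
    if host in YOUTUBE_HOSTS:
        # Watch URLs carry the video ID in the `v` query parameter.
        return parsed.path == "/watch" and bool(parse_qs(parsed.query).get("v"))
    return False
```

This only checks the shape of the URL; it does not confirm that the video exists or that it can be downloaded.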

Workflow

The workflow consists of the following steps:

Input a YouTube video URL when the script requests it.
The pytube library uses this URL to create a YouTube object and captures the audio stream.
The audio is downloaded as an MP3 file to the YoutubeAudios directory.
The whisper library then transcribes this audio into textual form.
langdetect identifies the language of the text.
The transcription is saved in an output_{language}.txt file, making it readily accessible.
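The steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the project's actual script: the function names output_filename and transcribe_youtube, the "base" Whisper model, and the fixed langdetect seed are all choices made for this example. The sketch also keeps whatever audio container pytube delivers (commonly MP4 or WebM), so any conversion or renaming to MP3 done by the original script is not reproduced here.

```python
from pathlib import Path


def output_filename(language: str) -> str:
    """Build the output file name described above: output_{language}.txt."""
    return f"output_{language}.txt"


def transcribe_youtube(url: str, audio_dir: str = "YoutubeAudios",
                       model_name: str = "base") -> Path:
    """Download a video's audio, transcribe it, and save the text (sketch)."""
    # Third-party imports are local so output_filename works without them.
    import whisper
    from langdetect import DetectorFactory, detect
    from pytube import YouTube

    # Create a YouTube object and pick an audio-only stream.
    stream = YouTube(url).streams.filter(only_audio=True).first()
    if stream is None:
        raise RuntimeError("No audio-only stream found for this video.")

    # Download the audio into the YoutubeAudios directory.
    audio_path = stream.download(output_path=audio_dir)

    # Transcribe with Whisper; "base" is an assumed model size.
    model = whisper.load_model(model_name)
    text = model.transcribe(audio_path)["text"]

    # Detect the language of the text; a fixed seed makes langdetect repeatable.
    DetectorFactory.seed = 0
    language = detect(text)

    # Save the transcription in the current working directory.
    out_path = Path(output_filename(language))
    out_path.write_text(text, encoding="utf-8")
    return out_path
```

Calling transcribe_youtube with a video URL returns the path of the written file, for example output_en.txt for an English transcription. Larger Whisper models are generally more accurate but slower and need more memory.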

Contributing

Contributions that improve this project are welcome. They can take two main forms:

Pull Requests

Create a fork of the repository and make a new branch from the main branch.
Implement your changes or additions.
Commit your changes and push them to your branch.
Open a pull request against the main branch with a clear, concise explanation of your changes.

Issues

Visit the project's Issues page.
Search for an existing issue similar to the one you intend to submit.
If you don't find one, click "New issue" to start.
Clearly describe your idea or the enhancement you suggest for the current script.

Feel free to jump in, make improvements, and share this project to make it even better!