YouTube Audio-to-Text Transcription
Description
The YouTube Audio-to-Text Transcription project automates the conversion of audio from YouTube videos into text, replacing manual transcription. Users provide a YouTube video URL, and the script extracts the audio, transcribes it, and saves the transcription as a text file. This is useful for anyone who needs quick transcriptions, whether for research, content creation, or making content more accessible.
Key Features
- User-Friendly Interface: Users only need to enter a YouTube video URL when prompted.
- Efficient Audio Extraction: The pytube library filters the available streams and downloads the audio stream of the YouTube video.
- Quality Transcription: The whisper library, OpenAI's speech recognition model, converts the speech to text.
- Convenient Output: The text is saved as a plain text file in the script's directory, making it easy to access and share.
Prerequisites
To make the best use of this project, you need:
- Python 3.8 or higher (the whisper README lists Python 3.8 and later as the expected compatible versions).
- pip, the package installer for Python, to install the required libraries.
Required Libraries
- pytube: A Python library for downloading YouTube videos and extracting their audio streams.
- whisper: OpenAI's library for speech-to-text conversion.
- langdetect: A Python port of Google's language-detection library, used here to detect the language of the transcription.
Installation
- Start by cloning the repository or downloading the script.
- Install the required libraries by running these commands:
    pip install pytube
    pip install git+https://github.com/openai/whisper.git
    pip install langdetect
- Install FFmpeg and add it to your system's PATH environment variable. Installation guides for Windows, macOS, and Ubuntu are available online.
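As a quick sanity check after these steps, the following sketch reports which prerequisites still appear to be missing. It is not part of the project's script: the function names missing_tools and missing_modules are hypothetical, and it assumes that the FFmpeg executable is named ffmpeg and that the three libraries are importable as pytube, whisper, and langdetect.

```python
import importlib.util
import shutil


def missing_tools(tools=("ffmpeg",)):
    """Return the command-line tools from `tools` that are not found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


def missing_modules(modules=("pytube", "whisper", "langdetect")):
    """Return the top-level module names from `modules` that cannot be located."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Report anything that still needs to be installed or added to PATH.
    for tool in missing_tools():
        print(f"Missing command-line tool: {tool}")
    for name in missing_modules():
        print(f"Missing Python module: {name}")
```

If the check prints nothing, FFmpeg and the libraries were at least found; it does not verify that they work together.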
Usage
- Run the script with the following command:
    python youtube_audio_to_text.py
- When prompted, enter the URL of the YouTube video you want to transcribe:
    Enter the YouTube video URL: https://www.youtube.com/watch?v=XXXXXXXXXXX
- The script downloads the audio, transcribes it, detects the language, and saves the transcription to a text file named output_{language}.txt.
- Open the generated file, located in the directory where the script runs, to check the transcription.
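Because the whole run depends on the URL entered at the prompt, it can help to reject obviously malformed input before any download starts. The sketch below is one possible check and an assumption on our part, not something the script is known to do: looks_like_youtube_url and the YOUTUBE_HOSTS list are hypothetical, and the host list is illustrative rather than exhaustive.

```python
from urllib.parse import parse_qs, urlparse

# Hostnames treated as YouTube here; an illustrative list, not an exhaustive one.
YOUTUBE_HOSTS = {"www.youtube.com", "youtube.com", "m.youtube.com", "youtu.be"}


def looks_like_youtube_url(url):
    """Return True if `url` has the rough shape of a YouTube video link."""
    parts = urlparse(url.strip())
    if parts.scheme not in ("http", "https"):
        return False
    host = (parts.hostname or "").lower()
    if host not in YOUTUBE_HOSTS:
        return False
    if host == "youtu.be":
        # Short links carry the video ID in the path: https://youtu.be/<id>
        return len(parts.path.strip("/")) > 0
    # Standard watch links carry the video ID in the "v" query parameter.
    return parts.path == "/watch" and bool(parse_qs(parts.query).get("v"))
```

A prompt loop could call this on the value returned by input("Enter the YouTube video URL: ") and ask again when it returns False. The check only inspects the shape of the link; it does not confirm that the video exists.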
Workflow
The workflow involves the following steps:
- Input a YouTube video URL when the script requests it.
- The pytube library uses this URL to create a YouTube object and selects the audio stream.
- The audio is downloaded as an MP3 file to the YoutubeAudios directory.
- The whisper library then transcribes this audio into text.
- langdetect identifies the language of the text.
- The transcription is saved in an output_{language}.txt file.
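The steps above can be sketched in Python roughly as follows. This is a minimal illustration under stated assumptions, not the project's actual script: the function names, the audio.mp3 file name, and the choice of the "base" whisper model are all hypothetical. The library imports sit inside the functions so that the file-naming and chaining logic can be tried without the libraries installed.

```python
from pathlib import Path


def download_audio(url, directory="YoutubeAudios", filename="audio.mp3"):
    """Download the first audio-only stream of `url` with pytube."""
    from pytube import YouTube  # imported lazily; only needed for real downloads

    stream = YouTube(url).streams.filter(only_audio=True).first()
    if stream is None:
        raise RuntimeError("No audio-only stream found for this video")
    # pytube writes the stream under the given name; it does not re-encode it.
    return stream.download(output_path=directory, filename=filename)


def transcribe_audio(audio_path, model_name="base"):
    """Transcribe an audio file with whisper and return the text."""
    import whisper

    model = whisper.load_model(model_name)  # "base" is an assumed model choice
    return model.transcribe(str(audio_path))["text"].strip()


def detect_language(text):
    """Return the language code that langdetect assigns to `text`."""
    from langdetect import detect

    return detect(text)


def save_transcript(text, language, directory="."):
    """Write `text` to output_{language}.txt and return the file's path."""
    path = Path(directory) / f"output_{language}.txt"
    path.write_text(text, encoding="utf-8")
    return path


def run_pipeline(url, download=download_audio, transcribe=transcribe_audio,
                 detect=detect_language, directory="."):
    """Chain the workflow steps; each step can be replaced by another callable."""
    audio_path = download(url)
    text = transcribe(audio_path)
    language = detect(text)
    return save_transcript(text, language, directory)
```

Passing the steps as parameters is a design choice made for this sketch: it lets the chaining and file naming be exercised with stand-in functions, without network access or model downloads. Calling run_pipeline(url) with the defaults requires pytube, whisper, langdetect, and FFmpeg to be installed.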
Contributing
Contributions to this project are welcome. They can take two main forms:
Pull Requests
- Fork the repository and create a new branch from the main branch.
- Implement your changes or additions.
- Commit your changes and push them to your branch.
- Open a pull request against the main branch with a concise but complete explanation of your changes.
Issues
- Visit the project's Issues page.
- Look for an issue similar to what you intend to submit.
- If you don't find one, hit "New issue" to start.
- Clearly describe your idea or the enhancement you suggest for the current script.
Feel free to jump in, make improvements, and share this project to make it even better!