OpenAI Whisper API
An Open Source Solution for Speech-to-Text and More
The OpenAI Whisper API is an open-source microservice that wraps OpenAI's cutting-edge Whisper API, a leading automatic speech recognition (ASR) system. Built with Node.js, Bun.sh, and TypeScript, the service runs seamlessly on Docker with zero dependencies, making it a highly adaptable tool for developers working on speech- and language-related applications.
Features and Capabilities
Whisper is a sophisticated speech-to-text model trained on an extensive multilingual and multitask dataset of audio recordings. It is a versatile tool capable of handling multiple tasks, such as language identification, speech translation, and, most significantly, converting spoken language into written text.
Because it operates on sequences of tokens and works directly with natural language, the model is a valuable building block for machine learning applications. It recognizes multilingual speech and copes well with background noise, making it suitable for transcribing video calls, YouTube videos, Zoom meetings, and other recorded speech in English and many other languages.
Ease of Use
The API is straightforward and user-friendly, making it accessible to developers of all experience levels. As an open-source project under the MIT license, the Whisper API allows broad use in personal and commercial projects with minimal restrictions. Whether your goal is to transcribe voice messages, add speech recognition to a larger system, or simply explore the possibilities of the OpenAI Whisper API, getting started is hassle-free: dive into the provided code and create a new API key in your OpenAI account.
How to Use
The OpenAI Whisper API runs as a microservice built with Node.js, Bun.sh, and TypeScript, and operates on Docker with zero dependencies. It listens for MP3 files on the /transcribe route and returns the corresponding text transcription.
Running Locally
To run the Whisper API locally, install bun.sh, clone the repository, and run the following commands:
bun install
bun run dev
After completing these steps, visit http://localhost:3000 (or the port you have configured via PORT) to begin testing.
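As a quick sanity check, you can also call the /ping endpoint described in the Testing and Connecting section below; a minimal example, assuming the server is on the default port 3000:

# check that the server is reachable
curl http://localhost:3000/ping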
Deployment with Docker
The OpenAI Whisper API is available as a prebuilt Docker image: illyism/openai-whisper-api.
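If you want to try the published image directly rather than building it yourself, a minimal sketch (the image name comes from the line above; the port and OPENAI_KEY variable match the examples later in this document):

# pull the prebuilt image and run it on port 3000 with your OpenAI key
docker pull illyism/openai-whisper-api
docker run -p 3000:3000 -e OPENAI_KEY=YOUR_KEY_HERE illyism/openai-whisper-api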
Google Cloud Run Deployment
Deploying the Whisper API on Google Cloud Run involves cloning the repository and executing these commands (replace PROJECT_ID with your Google Cloud project ID):
docker build --platform linux/amd64 -t gcr.io/PROJECT_ID/whisper-docker .
docker push gcr.io/PROJECT_ID/whisper-docker
gcloud run deploy whisper-docker \
--image gcr.io/PROJECT_ID/whisper-docker \
--region us-central1 \
--allow-unauthenticated \
--project PROJECT_ID
Once deployed, you will receive a Service URL to access and test the API.
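If you need to look the Service URL up again later, one way to retrieve it (assuming the service name and region used above) is:

# print the deployed service's URL
gcloud run services describe whisper-docker \
  --region us-central1 \
  --project PROJECT_ID \
  --format 'value(status.url)'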
Testing and Connecting
To test HTTP functionality, simply open the /ping endpoint on the URL. To use the transcription service, send a POST request to the /transcribe endpoint with a JSON body like this:
{
"audio": "BASE64_ENCODED_AUDIO"
}
API Key Requirement
An API key from OpenAI is necessary for authentication. Include it in the request header as:
Authorization: Bearer OPENAI_KEY
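Putting the endpoint, body, and header together, a sample request might look like the following sketch. It assumes a server on localhost:3000, an input file named audio.mp3, and a key exported as OPENAI_KEY in your shell; base64 -w 0 is the GNU coreutils invocation, so on macOS substitute base64 -i audio.mp3:

# POST a base64-encoded MP3 to the /transcribe endpoint
curl -X POST http://localhost:3000/transcribe \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $OPENAI_KEY" \
  -d "{\"audio\": \"$(base64 -w 0 audio.mp3)\"}"

The response body contains the text transcription of the audio.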
Alternatively, launch the Docker image or the server with the OPENAI_KEY environment variable:
OPENAI_KEY=YOUR_KEY_HERE bun run dev
# or
docker run -p 3000:3000 -e OPENAI_KEY=YOUR_KEY_HERE gcr.io/magicbuddy-chat/whisper-docker
# or set it in Cloud Run with:
gcloud run deploy whisper-docker \
--image gcr.io/PROJECT_ID/whisper-docker \
--set-env-vars OPENAI_KEY=YOUR_KEY_HERE \
--region us-central1 \
--allow-unauthenticated \
--project PROJECT_ID
Live Demonstration
A live example of the Whisper API in action is available through MagicBuddy, a Telegram ChatGPT bot that uses this OpenAI Whisper Docker image to transcribe voice messages.