OpenAI Whisper API
An Open Source Solution for Speech-to-Text and More
The OpenAI Whisper API is an open-source microservice that wraps OpenAI's cutting-edge Whisper API, a leading automatic speech recognition (ASR) system. Built with Node.js, Bun.sh, and TypeScript, the service runs seamlessly on Docker with zero dependencies, making it a highly adaptable tool for developers working on speech- and language-related applications.
Features and Capabilities
Whisper is a sophisticated speech-to-text model trained on an extensive multilingual and multitask dataset of audio recordings. It is a versatile tool capable of handling multiple tasks, such as language identification, speech translation, and, most significantly, converting spoken language into written text.
Because it operates on sequences of tokens and works directly with natural language, the model is a valuable building block for machine learning applications. It recognizes multilingual speech and copes well with background noise, making it suitable for transcribing video calls, YouTube videos, Zoom meetings, and other recorded speech in English and many other languages.
Ease of Use
The API is straightforward and user-friendly, making it accessible to developers of all experience levels. As an open-source project under the MIT license, the Whisper API allows broad use in personal and commercial projects with minimal restrictions. Whether your goal is to transcribe voice messages, add speech recognition to a larger system, or simply explore the possibilities of the OpenAI Whisper API, getting started is hassle-free: dive into the provided code and create a new API key in your OpenAI account.
How to Use
The OpenAI Whisper API runs as a microservice built with Node.js, Bun.sh, and TypeScript, and operates on Docker with zero dependencies. It listens for MP3 files on the /transcribe route and returns the corresponding text transcription.
Running Locally
To run the Whisper API locally, install bun.sh, clone the repository, and run the following commands:
bun install
bun run dev
After completing these steps, visit http://localhost:3000 (or the port you have configured via PORT) to begin testing.
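As a quick sanity check, you can also call the /ping endpoint described in the Testing and Connecting section below; a minimal example, assuming the server is on the default port 3000:

# check that the server is reachable
curl http://localhost:3000/ping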
Deployment with Docker
The OpenAI Whisper API is available as a prebuilt Docker image: illyism/openai-whisper-api.
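If you want to try the published image directly rather than building it yourself, a minimal sketch (the image name comes from the line above; the port and OPENAI_KEY variable match the examples later in this document):

# pull the prebuilt image and run it on port 3000 with your OpenAI key
docker pull illyism/openai-whisper-api
docker run -p 3000:3000 -e OPENAI_KEY=YOUR_KEY_HERE illyism/openai-whisper-api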
Google Cloud Run Deployment
Deploying the Whisper API on Google Cloud Run involves cloning the repository and executing these commands (replace PROJECT_ID with your Google Cloud project ID):
docker build --platform linux/amd64 -t gcr.io/PROJECT_ID/whisper-docker .
docker push gcr.io/PROJECT_ID/whisper-docker
gcloud run deploy whisper-docker \
--image gcr.io/PROJECT_ID/whisper-docker \
--region us-central1 \
--allow-unauthenticated \
--project PROJECT_ID
Once deployed, you will receive a Service URL to access and test the API.
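If you need to look the Service URL up again later, one way to retrieve it (assuming the service name and region used above) is:

# print the deployed service's URL
gcloud run services describe whisper-docker \
  --region us-central1 \
  --project PROJECT_ID \
  --format 'value(status.url)'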
Testing and Connecting
To test HTTP functionality, simply open the /ping endpoint on the URL. To use the transcription service, send a POST request to the /transcribe endpoint with a JSON body like this:
{
"audio": "BASE64_ENCODED_AUDIO"
}
API Key Requirement
An API key from OpenAI is necessary for authentication. Include it in the request header as:
Authorization: Bearer OPENAI_KEY
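Putting the endpoint, body, and header together, a sample request might look like the following sketch. It assumes a server on localhost:3000, an input file named audio.mp3, and a key exported as OPENAI_KEY in your shell; base64 -w 0 is the GNU coreutils invocation, so on macOS substitute base64 -i audio.mp3:

# POST a base64-encoded MP3 to the /transcribe endpoint
curl -X POST http://localhost:3000/transcribe \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $OPENAI_KEY" \
  -d "{\"audio\": \"$(base64 -w 0 audio.mp3)\"}"

The response body contains the text transcription of the audio.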
Alternatively, launch the Docker image or the server with the OPENAI_KEY environment variable:
OPENAI_KEY=YOUR_KEY_HERE bun run dev
# or
docker run -p 3000:3000 -e OPENAI_KEY=YOUR_KEY_HERE gcr.io/magicbuddy-chat/whisper-docker
# or set it in Cloud Run with:
gcloud run deploy whisper-docker \
--image gcr.io/PROJECT_ID/whisper-docker \
--set-env-vars OPENAI_KEY=YOUR_KEY_HERE \
--region us-central1 \
--allow-unauthenticated \
--project PROJECT_ID
Live Demonstration
A live example of the Whisper API in action is available through MagicBuddy, a Telegram ChatGPT bot that uses this OpenAI Whisper Docker image to transcribe voice messages.