Introduction to useWhisper
useWhisper is a React hook for integrating with the OpenAI Whisper API. It handles audio recording and real-time transcription, and includes built-in features such as silence removal to cut transcription time and cost.
Key Features
- Speech Recorder: Records the user's speech directly in the browser.
- Real-Time Transcription: Converts recorded speech into text in real time.
- Silence Removal: Automatically removes silent parts of the recording, saving processing time and cost.
Getting Started
Installation
Developers can quickly integrate useWhisper into their projects by installing the package using npm or yarn:
npm i @chengsokdara/use-whisper
or
yarn add @chengsokdara/use-whisper
Basic Usage
To use useWhisper, developers need to import it and initialize it in their React application. Here is a simple example:
import { useWhisper } from '@chengsokdara/use-whisper'
const App = () => {
  const {
    recording,
    speaking,
    transcribing,
    transcript,
    pauseRecording,
    startRecording,
    stopRecording,
  } = useWhisper({
    apiKey: process.env.OPENAI_API_TOKEN, // YOUR_OPEN_AI_TOKEN
  })
  return (
    <div>
      <p>Recording: {recording}</p>
      <p>Speaking: {speaking}</p>
      <p>Transcribing: {transcribing}</p>
      <p>Transcribed Text: {transcript.text}</p>
      <button onClick={() => startRecording()}>Start</button>
      <button onClick={() => pauseRecording()}>Pause</button>
      <button onClick={() => stopRecording()}>Stop</button>
    </div>
  )
}
Custom Server Implementation
useWhisper also lets developers route transcription through their own server. This is particularly useful for keeping the OpenAI API token off the client and for customizing how the recorded speech is processed before transcription.
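A minimal sketch of this pattern, assuming the hook's onTranscribe callback and a hypothetical /api/whisper route on your own backend that holds the OpenAI key (the route name is illustrative; check the package README for the exact callback signature):

import { useWhisper } from '@chengsokdara/use-whisper'

const App = () => {
  // Send the recorded audio to your own server instead of calling OpenAI
  // directly, so the API token never reaches the browser.
  const onTranscribe = async (blob: Blob) => {
    const formData = new FormData()
    formData.append('file', blob, 'speech.webm')
    // '/api/whisper' is a hypothetical backend route that forwards the audio
    // to the Whisper API using a server-side key.
    const response = await fetch('/api/whisper', { method: 'POST', body: formData })
    const { text } = await response.json()
    // Return the result in the { blob, text } shape the hook expects.
    return { blob, text }
  }

  const { transcript } = useWhisper({
    onTranscribe, // no apiKey needed on the client when a custom transcriber is used
  })

  return <p>{transcript.text}</p>
}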
Advanced Features
- Real-Time Streaming Transcription: Enables continuous transcription by sending audio to Whisper at a configurable interval (see the configuration sketch after this list).
- Silence Removal: Removes silent portions of the recording before transcription, which helps reduce cost.
- Auto Start Recording and Non-Stop Recording: Lets the recorder start automatically when the component mounts, or keep recording for as long as the user is speaking.
- Custom Whisper API Configuration: Tunes transcription by adjusting parameters such as language, output format, and other Whisper API settings.
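The sketch below shows how these options might be configured inside a component; the option names (streaming, timeSlice, removeSilence, autoStart, nonStop, stopTimeout, whisperConfig) follow the package README, but verify them against the version you install:

// Real-time streaming transcription
const { transcript } = useWhisper({
  apiKey: process.env.OPENAI_API_TOKEN,
  streaming: true,
  timeSlice: 1_000, // interval, in ms, at which audio chunks are transcribed
  whisperConfig: {
    language: 'en', // Whisper API parameters are passed through here
  },
})

// Or: hands-free recording with silence removal
const { transcript: cleaned } = useWhisper({
  apiKey: process.env.OPENAI_API_TOKEN,
  autoStart: true,     // begin recording as soon as the component mounts
  nonStop: true,       // keep recording while the user is speaking
  stopTimeout: 5000,   // ms of silence before recording stops automatically
  removeSilence: true, // strip silent passages (via ffmpeg) before upload
})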
Dependencies
useWhisper relies on several libraries for its operation:
- react-hooks-async: Facilitates asynchronous operations within React.
- recordrtc: Ensures cross-browser support for audio recording.
- lamejs: Converts wav files to mp3 format.
- @ffmpeg/ffmpeg: Utilized for eliminating silence from recordings.
- hark: Detects when the user is speaking.
- axios: Used for API requests, especially since fetch does not support the Whisper endpoint.
These dependencies are loaded only when necessary to keep the application lightweight and efficient.
API Overview
useWhisper provides a comprehensive API that allows detailed control over the recording and transcription process. The configuration options include API keys, recording modes, and transcription customization, among others.
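As a rough orientation, the values returned by the hook, as used in the basic example above, can be sketched like this (the exact type definitions live in the package source):

interface Transcript {
  blob?: Blob   // the recorded audio
  text?: string // the transcribed text
}

interface UseWhisperReturn {
  recording: boolean    // true while audio is being captured
  speaking: boolean     // true while speech is detected (via hark)
  transcribing: boolean // true while a Whisper request is in flight
  transcript: Transcript
  pauseRecording: () => Promise<void>
  startRecording: () => Promise<void>
  stopRecording: () => Promise<void>
}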
Roadmap
The useWhisper project plans to extend support to React Native, allowing mobile developers to leverage its functionalities seamlessly. This upcoming feature will be available as use-whisper/native.
For developers looking for expert assistance with web or mobile app development using React or React Native, the creator is available for contact through their website.