Introduction to whisper.rn
whisper.rn is a React Native binding for whisper.cpp, a high-performance inference implementation of OpenAI's Whisper automatic speech recognition (ASR) model. It lets developers add Whisper-based speech recognition to their mobile apps with a small amount of setup.
Core Features
- Cross-Platform Support: whisper.rn is compatible with both iOS and Android devices, providing seamless integration and consistent performance across platforms.
- Model Support: It supports OpenAI's Whisper models, known for their accuracy and efficiency in speech recognition tasks.
- Core ML Integration: For iOS developers, whisper.rn offers integration with Apple's Core ML, enhancing performance by leveraging device-specific optimizations.
Installation and Setup
Installation is straightforward using npm:
npm install whisper.rn
iOS Setup
- After installation, run npx pod-install to set up the iOS project.
- For larger models, enabling the Extended Virtual Addressing capability is recommended.
Android Setup
- If ProGuard is enabled, add a keep rule so that the library's classes are not stripped or obfuscated, and follow the build configuration recommendations to avoid build issues, especially on Apple Silicon Macs.
Using with Expo
- The project needs to be prebuilt to work with Expo, following the Expo guide for library integration.
Permissions for Realtime Transcription
To use the real-time transcription feature effectively, microphone permissions are essential:
- iOS: Add a microphone usage description (the NSMicrophoneUsageDescription key) to Info.plist, explaining why the app needs microphone access.
- Android: Declare the RECORD_AUDIO permission in the Android manifest file.
Getting Started with whisper.rn
To start using whisper.rn, create a context from your model file:
import { initWhisper } from 'whisper.rn';
const whisperContext = await initWhisper({
  filePath: 'file://.../ggml-tiny.en.bin',
});
Transcribe an audio file:
const sampleFilePath = 'file://.../sample.wav';
const options = { language: 'en' };
const { stop, promise } = whisperContext.transcribe(sampleFilePath, options);
const { result } = await promise;
// result contains the transcribed text of the audio file
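Because transcribe returns both a stop function and a promise, a caller can bound how long a transcription is allowed to run. The following is a minimal sketch, not part of whisper.rn: withTimeout and the TranscribeJob interface are hypothetical names, and the return type of stop is an assumption.

```typescript
// Shape of the object returned by whisperContext.transcribe(...) as used above.
// The return type of stop() is an assumption.
interface TranscribeJob<T> {
  stop: () => void | Promise<void>;
  promise: Promise<T>;
}

// Hypothetical helper: resolves with the job's result, or stops the job and
// rejects if it has not finished within timeoutMs milliseconds.
function withTimeout<T>(job: TranscribeJob<T>, timeoutMs: number): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(() => {
      const timeoutError = new Error(`Transcription exceeded ${timeoutMs} ms`);
      // Ask the job to stop, then report the timeout (or the stop failure).
      Promise.resolve()
        .then(() => job.stop())
        .then(() => reject(timeoutError), reject);
    }, timeoutMs);

    job.promise.then(
      (value) => {
        clearTimeout(timer);
        resolve(value);
      },
      (err) => {
        clearTimeout(timer);
        reject(err);
      },
    );
  });
}
```

In an app, the object returned by whisperContext.transcribe(sampleFilePath, options) would be passed as the job argument.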
For real-time transcription, subscribe to the transcription stream:
const { stop, subscribe } = await whisperContext.transcribeRealtime(options);
subscribe(evt => {
  const { isCapturing, data } = evt;
  console.log(`Realtime transcribing: ${isCapturing ? 'ON' : 'OFF'}\nResult: ${data.result}`);
});
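In practice an app usually wants the final text once capturing ends, rather than a log line per event. The sketch below shows one way to build such a listener. The names createRealtimeListener and RealtimeEvent are hypothetical, the event shape is inferred from the callback above, and treating data as optional is an assumption.

```typescript
// Event shape inferred from the subscribe() callback above.
// Treating data as optional is an assumption.
interface RealtimeEvent {
  isCapturing: boolean;
  data?: { result: string };
}

// Hypothetical helper: returns a listener suitable for subscribe() that
// remembers the latest text and calls onDone once when capturing ends.
function createRealtimeListener(
  onDone: (finalText: string) => void,
): (evt: RealtimeEvent) => void {
  let latest = '';
  let done = false;
  return (evt: RealtimeEvent): void => {
    if (evt.data) {
      latest = evt.data.result;
    }
    if (!evt.isCapturing && !done) {
      done = true;
      onDone(latest.trim());
    }
  };
}
```

The returned function can be passed directly to subscribe, for example subscribe(createRealtimeListener(text => console.log(text))).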
Audio Sessions and Permissions
On iOS, you may need to change the audio session settings so that recording works alongside other audio playback, or to improve recording quality:
import { AudioSessionIos } from 'whisper.rn';
await AudioSessionIos.setCategory(
  AudioSessionIos.Category.PlayAndRecord,
  [AudioSessionIos.CategoryOption.MixWithOthers]
);
await AudioSessionIos.setMode(AudioSessionIos.Mode.Default);
On Android, request the microphone permission at runtime before starting a recording, for example with React Native's PermissionsAndroid API.
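The Android permission request can be sketched as follows. This is an illustration rather than whisper.rn code: ensureMicPermission and PermissionsLike are hypothetical names, and the permissions object is injected so the function can be exercised outside a device. The interface mirrors only the parts of React Native's PermissionsAndroid that the sketch uses.

```typescript
// Minimal slice of React Native's PermissionsAndroid used by this sketch.
// Injecting it keeps the function testable without a device.
interface PermissionsLike {
  PERMISSIONS: { RECORD_AUDIO: string };
  RESULTS: { GRANTED: string };
  request(
    permission: string,
    rationale?: { title: string; message: string; buttonPositive: string },
  ): Promise<string>;
}

// Hypothetical helper: resolves to true only if the user grants microphone access.
async function ensureMicPermission(permissions: PermissionsLike): Promise<boolean> {
  const status = await permissions.request(permissions.PERMISSIONS.RECORD_AUDIO, {
    title: 'Microphone access',
    message: 'Microphone access is needed for realtime transcription.',
    buttonPositive: 'OK',
  });
  return status === permissions.RESULTS.GRANTED;
}
```

In an app, the argument would be PermissionsAndroid imported from react-native, and transcribeRealtime would only be called when the function resolves to true.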
Integration with Assets
whisper.rn allows models and audio files to be included within app assets. This requires adjusting metro.config.js so that Metro accepts the required file extensions.
const defaultAssetExts = require('metro-config/src/defaults/defaults').assetExts;

module.exports = {
  resolver: {
    assetExts: [
      ...defaultAssetExts,
      'bin', // ggml model binary
      'mil', // Core ML model asset
    ],
  },
};
Note that bundling models as assets increases the size of the app, especially in release builds, so choose model sizes accordingly.
Utilizing Core ML
For iOS 15.0+ and tvOS 15.0+, whisper.rn supports Core ML to optimize model performance. Developers need Core ML model files that match the ggml model files being used. Because a .mlmodelc model is a directory rather than a single file, distributing it can involve managing several files or unpacking an archive with a library such as react-native-zip-archive.
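As an illustration of how a Core ML model can be matched to a ggml model, the sketch below derives the encoder path from the ggml path. It assumes the naming convention used by whisper.cpp, where the model's file extension is replaced with -encoder.mlmodelc. The helper name coreMlEncoderPath is hypothetical, and the convention should be checked against the whisper.rn documentation for the version in use.

```typescript
// Hypothetical helper. Assumes the whisper.cpp naming convention:
// strip the ggml file extension and append "-encoder.mlmodelc".
function coreMlEncoderPath(ggmlPath: string): string {
  // Remove only the final extension (for example ".bin"), keeping "tiny.en" intact.
  const withoutExt = ggmlPath.replace(/\.[^./]+$/, '');
  return `${withoutExt}-encoder.mlmodelc`;
}

console.log(coreMlEncoderPath('ggml-tiny.en.bin')); // ggml-tiny.en-encoder.mlmodelc
```

A helper like this can be used to verify that the expected .mlmodelc directory exists next to the ggml file before initializing the context.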
Example Application
An example app is provided to demonstrate whisper.rn's capabilities with a simple UI, using the Whisper model tiny.en and a sample audio file, jfk.wav.
Testing and Troubleshooting
whisper.rn includes a mock for testing with Jest, so code that depends on it can be unit tested without loading a model. For common issues and their resolutions, refer to the troubleshooting documentation.
Conclusion
whisper.rn brings Whisper speech recognition to React Native applications through a small API that works on both iOS and Android. That makes it a practical option for developers who want to add speech-to-text to their mobile applications.