SwiftWhisper: Transcription Made Easy
Overview
SwiftWhisper is a Swift library that brings audio transcription to any app or package. It is a wrapper around the open-source whisper.cpp project, letting developers add transcription with minimal effort.
Installation
Swift Package Manager
To get started with SwiftWhisper, developers can integrate it into their projects using the Swift Package Manager. This involves adding it as a dependency in the Package.swift file of their project. Here's a quick guide:
let package = Package(
    ...
    dependencies: [
        // Add the package to your dependencies
        .package(url: "https://github.com/exPHAT/SwiftWhisper.git", branch: "master"),
    ],
    ...
    targets: [
        // Add SwiftWhisper as a dependency on any target you want to use it in
        .target(name: "MyTarget", dependencies: [.byName(name: "SwiftWhisper")])
    ]
    ...
)
Installing via Xcode
Integration is also straightforward through Xcode, where developers can add the package URL in the "Swift Package Manager" section of their project settings.
Usage
SwiftWhisper is designed to be user-friendly. After setting up, it’s as simple as importing the package and running transcription on audio data. Here's a brief example of how to utilize it:
import SwiftWhisper
let whisper = Whisper(fromFileURL: /* Model file URL */)
let segments = try await whisper.transcribe(audioFrames: /* 16kHz PCM audio frames */)
print("Transcribed audio:", segments.map(\.text).joined())
For more detailed usage, developers can refer to the API Documentation.
Advanced Features
Working with Delegates
SwiftWhisper supports delegate methods for observing transcription progress, receiving new segments, handling completion, and catching errors. The library provides a WhisperDelegate protocol for this purpose; implementing it allows more control and feedback during the transcription process.
protocol WhisperDelegate {
    // Progress updates as a percentage from 0-1
    func whisper(_ aWhisper: Whisper, didUpdateProgress progress: Double)

    // Any time new segments of text have been transcribed
    func whisper(_ aWhisper: Whisper, didProcessNewSegments segments: [Segment], atIndex index: Int)

    // Finished transcribing, includes all transcribed segments of text
    func whisper(_ aWhisper: Whisper, didCompleteWithSegments segments: [Segment])

    // Error with transcription
    func whisper(_ aWhisper: Whisper, didErrorWith error: Error)
}
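A minimal sketch of adopting this protocol might look like the following; the assumption that Whisper exposes a delegate property for receiving these callbacks should be verified against the library's documentation.

import Foundation
import SwiftWhisper

// Sketch of a delegate conformance that simply logs each callback.
final class TranscriptionObserver: WhisperDelegate {
    func whisper(_ aWhisper: Whisper, didUpdateProgress progress: Double) {
        print("Progress: \(Int(progress * 100))%")
    }

    func whisper(_ aWhisper: Whisper, didProcessNewSegments segments: [Segment], atIndex index: Int) {
        print("New segments starting at index \(index):", segments.map(\.text).joined())
    }

    func whisper(_ aWhisper: Whisper, didCompleteWithSegments segments: [Segment]) {
        print("Finished:", segments.map(\.text).joined())
    }

    func whisper(_ aWhisper: Whisper, didErrorWith error: Error) {
        print("Transcription failed:", error)
    }
}

// Assumed wiring; check the library for the exact property name:
// let observer = TranscriptionObserver()
// whisper.delegate = observer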
Downloading Models
To perform transcriptions, a pre-trained Whisper model (in whisper.cpp's ggml format) must first be downloaded, for example from Hugging Face.
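As an illustration, the model can be fetched once, cached locally, and then passed to the Whisper initializer by file URL. The download URL and file name below are only an example (a tiny ggml model from the whisper.cpp repository on Hugging Face); substitute the model you actually want to use.

import Foundation
import SwiftWhisper

// Example only: download and cache a ggml model, then load it into Whisper.
func loadWhisperModel() async throws -> Whisper {
    let cacheURL = FileManager.default
        .urls(for: .cachesDirectory, in: .userDomainMask)[0]
        .appendingPathComponent("ggml-tiny.bin")

    if !FileManager.default.fileExists(atPath: cacheURL.path) {
        // Illustrative source; pick the model size/language variant you need.
        let remote = URL(string: "https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-tiny.bin")!
        let (downloaded, _) = try await URLSession.shared.download(from: remote)
        try FileManager.default.moveItem(at: downloaded, to: cacheURL)
    }

    return Whisper(fromFileURL: cacheURL)
}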
CoreML Support
SwiftWhisper also supports CoreML. This requires the corresponding CoreML model files; when they are configured correctly, CoreML is used during transcription, and the console output indicates whether it is active.
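A rough sketch of what that configuration tends to look like in practice is shown below; the file-naming convention is assumed from whisper.cpp, so verify the exact names against the SwiftWhisper documentation.

import Foundation
import SwiftWhisper

// Assumed layout (whisper.cpp's CoreML convention; names are illustrative):
//   Models/ggml-tiny.bin                <- whisper model passed to Whisper(fromFileURL:)
//   Models/ggml-tiny-encoder.mlmodelc   <- compiled CoreML encoder with a matching base name
let modelsDirectory = URL(fileURLWithPath: "Models", isDirectory: true)
let coreMLWhisper = Whisper(fromFileURL: modelsDirectory.appendingPathComponent("ggml-tiny.bin"))
// If the encoder bundle is found, the console output should indicate that CoreML is in use.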
Supporting Audio Conversion
SwiftWhisper requires audio input in the form of 16kHz PCM audio frames. To convert audio files into this format, the AudioKit library is recommended; the conversion can be wrapped in a small helper function that turns an audio file into the required PCM array.
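A helper along the following lines can do the conversion; the AudioKit FormatConverter usage and the 16-bit WAV parsing are a sketch and may need adjusting to the AudioKit version in use.

import AudioKit
import Foundation

// Sketch: convert an arbitrary audio file to 16kHz mono 16-bit PCM and return it
// as [Float] frames suitable for Whisper.transcribe(audioFrames:).
func convertAudioFileToPCMArray(fileURL: URL, completionHandler: @escaping (Result<[Float], Error>) -> Void) {
    var options = FormatConverter.Options()
    options.format = .wav
    options.sampleRate = 16000
    options.bitDepth = 16
    options.channels = 1
    options.isInterleaved = false

    let tempURL = URL(fileURLWithPath: NSTemporaryDirectory()).appendingPathComponent(UUID().uuidString)
    let converter = FormatConverter(inputURL: fileURL, outputURL: tempURL, options: options)
    converter.start { error in
        if let error {
            completionHandler(.failure(error))
            return
        }

        do {
            let data = try Data(contentsOf: tempURL)

            // Skip the 44-byte WAV header, then map little-endian Int16 samples to [-1, 1] floats.
            let floats = stride(from: 44, to: data.count, by: 2).map { offset -> Float in
                data[offset..<offset + 2].withUnsafeBytes {
                    let sample = Int16(littleEndian: $0.load(as: Int16.self))
                    return max(-1.0, min(Float(sample) / 32767.0, 1.0))
                }
            }

            try? FileManager.default.removeItem(at: tempURL)
            completionHandler(.success(floats))
        } catch {
            completionHandler(.failure(error))
        }
    }
}

// Usage:
// convertAudioFileToPCMArray(fileURL: someAudioFileURL) { result in /* pass frames to whisper.transcribe */ }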
Enhancements for Speed
Transcriptions can be slow in Debug mode due to optimization settings. To improve performance, developers can use the Release configuration or a version of SwiftWhisper that applies maximum optimization using special compiler flags from the fast branch.
...
dependencies: [
    // Using latest commit hash for `fast` branch:
    .package(url: "https://github.com/exPHAT/SwiftWhisper.git", revision: "deb1cb6a27256c7b01f5d3d2e7dc1dcc330b5d01"),
],
...
Conclusion
SwiftWhisper is a useful tool for developers who want to quickly add robust transcription to their Swift projects. With its simple setup, clear documentation, and options for faster performance, it stands out as a solid choice for audio transcription in the Swift ecosystem.