SwiftWhisper: Transcription Made Easy
Overview
SwiftWhisper is a Swift library that brings audio transcription to any app or package. It is a wrapper around the open-source whisper.cpp project, letting developers add transcription with minimal effort.
Installation
Swift Package Manager
To get started with SwiftWhisper, developers can integrate it into their projects using the Swift Package Manager. This involves adding it as a dependency in the Package.swift file of their project. Here's a quick guide:
let package = Package(
    ...
    dependencies: [
        // Add the package to your dependencies
        .package(url: "https://github.com/exPHAT/SwiftWhisper.git", branch: "master"),
    ],
    ...
    targets: [
        // Add SwiftWhisper as a dependency on any target you want to use it in
        .target(name: "MyTarget", dependencies: [.byName(name: "SwiftWhisper")])
    ]
    ...
)
Installing via Xcode
Integration is also straightforward through Xcode, where developers can add the package URL in the "Swift Package Manager" section of their project settings.
Usage
SwiftWhisper is designed to be user-friendly. After setting up, it’s as simple as importing the package and running transcription on audio data. Here's a brief example of how to utilize it:
import SwiftWhisper
let whisper = Whisper(fromFileURL: /* Model file URL */)
let segments = try await whisper.transcribe(audioFrames: /* 16kHz PCM audio frames */)
print("Transcribed audio:", segments.map(\.text).joined())
For more detailed usage, developers can refer to the API Documentation.
Advanced Features
Working with Delegates
SwiftWhisper supports delegate methods for observing transcription progress, receiving new segments, handling completion, and catching errors. The library provides a WhisperDelegate protocol for this purpose; implementing it allows more control and feedback during the transcription process.
protocol WhisperDelegate {
    // Progress updates as a percentage from 0-1
    func whisper(_ aWhisper: Whisper, didUpdateProgress progress: Double)

    // Any time new segments of text have been transcribed
    func whisper(_ aWhisper: Whisper, didProcessNewSegments segments: [Segment], atIndex index: Int)

    // Finished transcribing, includes all transcribed segments of text
    func whisper(_ aWhisper: Whisper, didCompleteWithSegments segments: [Segment])

    // Error with transcription
    func whisper(_ aWhisper: Whisper, didErrorWith error: Error)
}
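A minimal sketch of adopting this protocol might look like the following; the assumption that Whisper exposes a delegate property for receiving these callbacks should be verified against the library's documentation.

import Foundation
import SwiftWhisper

// Sketch of a delegate conformance that simply logs each callback.
final class TranscriptionObserver: WhisperDelegate {
    func whisper(_ aWhisper: Whisper, didUpdateProgress progress: Double) {
        print("Progress: \(Int(progress * 100))%")
    }

    func whisper(_ aWhisper: Whisper, didProcessNewSegments segments: [Segment], atIndex index: Int) {
        print("New segments starting at index \(index):", segments.map(\.text).joined())
    }

    func whisper(_ aWhisper: Whisper, didCompleteWithSegments segments: [Segment]) {
        print("Finished:", segments.map(\.text).joined())
    }

    func whisper(_ aWhisper: Whisper, didErrorWith error: Error) {
        print("Transcription failed:", error)
    }
}

// Assumed wiring; check the library for the exact property name:
// let observer = TranscriptionObserver()
// whisper.delegate = observer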
Downloading Models
To perform transcriptions, a pre-trained Whisper model (in whisper.cpp's ggml format) must first be downloaded, for example from Hugging Face.
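As an illustration, the model can be fetched once, cached locally, and then passed to the Whisper initializer by file URL. The download URL and file name below are only an example (a tiny ggml model from the whisper.cpp repository on Hugging Face); substitute the model you actually want to use.

import Foundation
import SwiftWhisper

// Example only: download and cache a ggml model, then load it into Whisper.
func loadWhisperModel() async throws -> Whisper {
    let cacheURL = FileManager.default
        .urls(for: .cachesDirectory, in: .userDomainMask)[0]
        .appendingPathComponent("ggml-tiny.bin")

    if !FileManager.default.fileExists(atPath: cacheURL.path) {
        // Illustrative source; pick the model size/language variant you need.
        let remote = URL(string: "https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-tiny.bin")!
        let (downloaded, _) = try await URLSession.shared.download(from: remote)
        try FileManager.default.moveItem(at: downloaded, to: cacheURL)
    }

    return Whisper(fromFileURL: cacheURL)
}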
CoreML Support
SwiftWhisper also supports CoreML. This requires the corresponding CoreML model files; when they are configured correctly, CoreML is used during transcription, and the console output indicates whether it is active.
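A rough sketch of what that configuration tends to look like in practice is shown below; the file-naming convention is assumed from whisper.cpp, so verify the exact names against the SwiftWhisper documentation.

import Foundation
import SwiftWhisper

// Assumed layout (whisper.cpp's CoreML convention; names are illustrative):
//   Models/ggml-tiny.bin                <- whisper model passed to Whisper(fromFileURL:)
//   Models/ggml-tiny-encoder.mlmodelc   <- compiled CoreML encoder with a matching base name
let modelsDirectory = URL(fileURLWithPath: "Models", isDirectory: true)
let coreMLWhisper = Whisper(fromFileURL: modelsDirectory.appendingPathComponent("ggml-tiny.bin"))
// If the encoder bundle is found, the console output should indicate that CoreML is in use.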
Supporting Audio Conversion
SwiftWhisper requires audio input in the form of 16kHz PCM audio frames. To convert audio files into this format, the AudioKit library is recommended; the conversion can be wrapped in a small helper function that turns an audio file into the required PCM array.
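A helper along the following lines can do the conversion; the AudioKit FormatConverter usage and the 16-bit WAV parsing are a sketch and may need adjusting to the AudioKit version in use.

import AudioKit
import Foundation

// Sketch: convert an arbitrary audio file to 16kHz mono 16-bit PCM and return it
// as [Float] frames suitable for Whisper.transcribe(audioFrames:).
func convertAudioFileToPCMArray(fileURL: URL, completionHandler: @escaping (Result<[Float], Error>) -> Void) {
    var options = FormatConverter.Options()
    options.format = .wav
    options.sampleRate = 16000
    options.bitDepth = 16
    options.channels = 1
    options.isInterleaved = false

    let tempURL = URL(fileURLWithPath: NSTemporaryDirectory()).appendingPathComponent(UUID().uuidString)
    let converter = FormatConverter(inputURL: fileURL, outputURL: tempURL, options: options)
    converter.start { error in
        if let error {
            completionHandler(.failure(error))
            return
        }

        do {
            let data = try Data(contentsOf: tempURL)

            // Skip the 44-byte WAV header, then map little-endian Int16 samples to [-1, 1] floats.
            let floats = stride(from: 44, to: data.count, by: 2).map { offset -> Float in
                data[offset..<offset + 2].withUnsafeBytes {
                    let sample = Int16(littleEndian: $0.load(as: Int16.self))
                    return max(-1.0, min(Float(sample) / 32767.0, 1.0))
                }
            }

            try? FileManager.default.removeItem(at: tempURL)
            completionHandler(.success(floats))
        } catch {
            completionHandler(.failure(error))
        }
    }
}

// Usage:
// convertAudioFileToPCMArray(fileURL: someAudioFileURL) { result in /* pass frames to whisper.transcribe */ }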
Enhancements for Speed
Transcriptions can be slow in Debug mode due to optimization settings. To improve performance, developers can use the Release configuration or a version of SwiftWhisper that applies maximum optimization using special compiler flags from the fast branch.
...
dependencies: [
    // Using latest commit hash for `fast` branch:
    .package(url: "https://github.com/exPHAT/SwiftWhisper.git", revision: "deb1cb6a27256c7b01f5d3d2e7dc1dcc330b5d01"),
],
...
Conclusion
SwiftWhisper is a useful tool for developers who want to quickly add robust transcription to their Swift projects. With its simple setup, clear documentation, and options for faster performance, it stands out as a solid choice for audio transcription in the Swift ecosystem.