speech-recognition

Transcribe speech to text using Apple's Speech framework. Use when implementing live microphone transcription with AVAudioEngine, recognizing recorded audio files, handling speech and microphone authorization, choosing on-device vs server-backed SFSpeechRecognizer behavior, or adopting SpeechAnalyzer, SpeechTranscriber, DictationTranscriber, AssetInventory, and async result streams on iOS 26+.
Skill file

Preview skill file↓↑
---
name: speech-recognition
description: "Transcribe speech to text using Apple's Speech framework. Use when implementing live microphone transcription with AVAudioEngine, recognizing recorded audio files, handling speech and microphone authorization, choosing on-device vs server-backed SFSpeechRecognizer behavior, or adopting SpeechAnalyzer, SpeechTranscriber, DictationTranscriber, AssetInventory, and async result streams on iOS 26+."
---

# Speech Recognition

Transcribe live and pre-recorded audio to text using Apple's Speech framework.
Covers `SpeechAnalyzer` / `SpeechTranscriber` (iOS 26+) and
`SFSpeechRecognizer` (iOS 10+). Targets Swift 6.3 / iOS 26+ while preserving
fallback guidance for apps that support older OS versions.

**Scope boundary:** Use this skill for speech-to-text recognition, speech
authorization, microphone capture plumbing, and result handling. Hand off text
analysis, language identification after transcription, sentiment, embeddings,
and translation to `natural-language`; hand off audio playback UI to `avkit`;
hand off summarization or generation over transcripts to `apple-on-device-ai`.

## Contents

- [SpeechAnalyzer Strategy (iOS 26+)](#speechanalyzer-strategy-ios-26)
- [SFSpeechRecognizer Setup](#sfspeechrecognizer-setup)
- [Authorization](#authorization)
- [Live Microphone Transcription](#live-microphone-transcription)
- [Pre-Recorded Audio File Recognition](#pre-recorded-audio-file-recognition)
- [On-Device vs Server Recognition](#on-device-vs-server-recognition)
- [Handling Results](#handling-results)
- [Common Mistakes](#common-mistakes)
- [Review Checklist](#review-checklist)
- [References](#references)

## SpeechAnalyzer Strategy (iOS 26+)

Use `SpeechAnalyzer` for modern iOS 26+ speech analysis, especially long-form
recordings, live transcription, time-indexed transcripts, and fully on-device
flows. Keep `SFSpeechRecognizer` for iOS 10+ deployment targets, server-backed
locale coverage, or existing callback/delegate implementations.

Read [SpeechAnalyzer patterns](references/speechanalyzer-patterns.md) when
implementing an iOS 26+ transcription pipeline, model asset handling, volatile
results, or file/buffer examples.

### SpeechAnalyzer setup checklist

1. Choose the module:
   - `SpeechTranscriber` for the newer general-purpose on-device model.
   - `DictationTranscriber` when `SpeechTranscriber` is unavailable for the
     current device or locale and dictation-compatible support is acceptable.
   - `SpeechDetector` only in conjunction with a transcriber when voice
     activity detection is worth the accuracy/power tradeoff.
2. Check support before creating the session:
   - `SpeechTranscriber.isAvailable`
   - `SpeechTranscriber.supportedLocale(equivalentTo:)`
   - `SpeechTranscriber.installedLocales` / `supportedLocales` when showing
     language choices.
3. Pick a documented preset:
   - `.transcription` for basic accurate transcription.
   - `.progressiveTranscription` for live UI updates.
   - `.timeIndexedProgressiveTranscription` when playback highlighting needs
     `audioTimeRange`.
4. Install required assets with `AssetInventory.assetInstallationRequest`.
5. Convert live audio buffers to
   `SpeechAnalyzer.bestAvailableAudioFormat(compatibleWith:)` before yielding
   `AnalyzerInput`.
6. Consume module results from their `AsyncSequence` in a separate task.
7. Finish explicitly with `finalizeAndFinish(through:)`,
   `finalizeAndFinishThroughEndOfInput()`, or `cancelAndFinishNow()`.

Do not use an `offlineTranscription` preset; Apple does not document one.
Finishing an `AsyncStream` input sequence does not finish the analyzer session.

## SFSpeechRecognizer Setup

### Creating a recognizer with locale

```swift
import Speech

// Default locale (user's current language)
let recognizer = SFSpeechRecognizer()

// Specific locale
let recognizer = SFSpeechRecognizer(locale: Locale(identifier: "en-US"))

// Check if recognition is available for this locale
guard let recognizer, recognizer.isAvailable else {
    print("Speech recognition not available")
    return
}
```

### Monitoring availability changes

```swift
final class SpeechManager: NSObject, SFSpeechRecognizerDelegate {
    private let recognizer = SFSpeechRecognizer()!

    override init() {
        super.init()
        recognizer.delegate = self
    }

    func speechRecognizer(
        _ speechRecognizer: SFSpeechRecognizer,
        availabilityDidChange available: Bool
    ) {
        // Update UI — disable record button when unavailable
    }
}
```

## Authorization

Request **both** speech recognition and microphone permissions before starting
live transcription. Add these keys to `Info.plist`:

- `NSSpeechRecognitionUsageDescription`
- `NSMicrophoneUsageDescription`

```swift
import Speech
import AVFoundation

func requestPermissions() async -> Bool {
    let speechStatus = await withCheckedContinuation { continuation in
        SFSpeechRecognizer.requestAuthorization { status in
            continuation.resume(returning: status)
        }
    }
    guard speechStatus == .authorized else { return false }

    let micStatus: Bool
    if #available(iOS 17, *) {
        micStatus = await AVAudioApplication.requestRecordPermission()
    } else {
        micStatus = await withCheckedContinuation { continuation in
            AVAudioSession.sharedInstance().requestRecordPermission { granted in
                continuation.resume(returning: granted)
            }
        }
    }
    return micStatus
}
```

## Live Microphone Transcription

The standard pattern: `AVAudioEngine` captures microphone audio → buffers are
appended to `SFSpeechAudioBufferRecognitionRequest` → results stream in.

```swift
import Speech
import AVFoundation

final class LiveTranscriber {
    private let recognizer = SFSpeechRecognizer(locale: Locale(identifier: "en-US"))!
    private let audioEngine = AVAudioEngine()
    private var recognitionRequest: SFSpeechAudioBufferRecognitionRequest?
    private var recognitionTask: SFSpeechRecognitionTask?

    func startTranscribing() throws {
        // Cancel any in-progress task
        recognitionTask?.cancel()
        recognitionTask = nil

        // Configure audio session
        let audioSession = AVAudioSession.sharedInstance()
        try audioSession.setCategory(.record, mode: .measurement, options: .duckOthers)
        try audioSession.setActive(true, options: .notifyOthersOnDeactivation)

        // Create request
        let request = SFSpeechAudioBufferRecognitionRequest()
        request.shouldReportPartialResults = true
        self.recognitionRequest = request

        // Start recognition task
        recognitionTask = recognizer.recognitionTask(with: request) { result, error in
            if let result {
                let text = result.bestTranscription.formattedString
                print("Transcription: \(text)")

                if result.isFinal {
                    self.stopTranscribing()
                }
            }
            if let error {
                print("Recognition error: \(error)")
                self.stopTranscribing()
            }
        }

        // Install audio tap
        let inputNode = audioEngine.inputNode
        let recordingFormat = inputNode.outputFormat(forBus: 0)
        inputNode.installTap(onBus: 0, bufferSize: 1024, format: recordingFormat) {
            buffer, _ in
            request.append(buffer)
        }

        audioEngine.prepare()
        try audioEngine.start()
    }

    func stopTranscribing() {
        audioEngine.stop()
        audioEngine.inputNode.removeTap(onBus: 0)
        recognitionRequest?.endAudio()
        recognitionRequest = nil
        recognitionTask?.cancel()
        recognitionTask = nil
    }
}
```

## Pre-Recorded Audio File Recognition

Use `SFSpeechURLRecognitionRequest` for audio files on disk:

```swift
func transcribeFile(at url: URL) async throws -> String {
    guard let recognizer = SFSpeechRecognizer(), recognizer.isAvailable else {
        throw SpeechError.unavailable
    }
    let request = SFSpeechURLRecognitionRequest(url: url)
    request.shouldReportPartialResults = false

    return try await withCheckedThrowingContinuation { continuation in
        var didResume = false
        recognizer.recognitionTask(with: request) { result, error in
            guard !didResume else { return }
            if let error {
                didResume = true
                continuation.resume(throwing: error)
            } else if let result, result.isFinal {
                didResume = true
                continuation.resume(
                    returning: result.bestTranscription.formattedString
                )
            }
        }
    }
}
```

## On-Device vs Server Recognition

`SFSpeechRecognizer` can use on-device recognition for supported locales on
iOS 13+. If `supportsOnDeviceRecognition` is false, the recognizer requires a
network connection. `requiresOnDeviceRecognition` only has effect when the
recognizer supports it.

```swift
let recognizer = SFSpeechRecognizer(locale: Locale(identifier: "en-US"))!

// Check if on-device is supported for this locale
if recognizer.supportsOnDeviceRecognition {
    let request = SFSpeechAudioBufferRecognitionRequest()
    request.requiresOnDeviceRecognition = true  // Force on-device
}
```

`SFSpeechRecognizer` requests may still be a poor fit for long-form capture.
Apple documents a roughly one-minute task limit for speech recognition and
other service limits. For long recordings on iOS 26+, prefer `SpeechAnalyzer`;
otherwise chunk or restart recognition before the limit and preserve transcript
state across tasks.

## Handling Results

### Partial vs final results

```swift
let request = SFSpeechAudioBufferRecognitionRequest()
request.shouldReportPartialResults = true  // default is true

recognizer.recognitionTask(with: request) { result, error in
    guard let result else { return }

    if result.isFinal {
        // Final transcription — recognition is complete
        let final = result.bestTranscription.formattedString
    } else {
        // Partial result — may change as more audio is processed
        let partial = result.bestTranscription.formattedString
    }
}
```

### Accessing alternative transcriptions and confidence

```swift
recognizer.recognitionTask(with: request) { result, error in
    guard let result else { return }

    // Best transcription
    let best = result.bestTranscription

    // All alternatives (sorted by confidence, descending)
    for transcription in result.transcriptions {
        for segment in transcription.segments {
            print("\(segment.substring): \(segment.confidence)")
        }
    }
}
```

### Adding punctuation (iOS 16+)

```swift
let request = SFSpeechAudioBufferRecognitionRequest()
request.addsPunctuation = true
```

### Contextual strings

Improve recognition of domain-specific terms:

```swift
let request = SFSpeechAudioBufferRecognitionRequest()
request.contextualStrings = ["SwiftUI", "Xcode", "CloudKit"]
```

## Common Mistakes

### Not requesting both speech and microphone authorization

```swift
// ❌ DON'T: Only request speech authorization for live audio
SFSpeechRecognizer.requestAuthorization { status in
    // Missing microphone permission — audio engine will fail
    self.startRecording()
}

// ✅ DO: Request both permissions before recording
SFSpeechRecognizer.requestAuthorization { status in
    guard status == .authorized else { return }
    AVAudioSession.sharedInstance().requestRecordPermission { granted in
        guard granted else { return }
        self.startRecording()
    }
}
```

### Not handling availability changes

```swift
// ❌ DON'T: Assume recognizer stays available after initial check
let recognizer = SFSpeechRecognizer()!
// Recognition may fail if network drops or locale changes

// ✅ DO: Monitor availability via delegate
recognizer.delegate = self
func speechRecognizer(
    _ speechRecognizer: SFSpeechRecognizer,
    availabilityDidChange available: Bool
) {
    recordButton.isEnabled = available
}
```

### Not stopping the audio engine when recognition ends

```swift
// ❌ DON'T: Leave audio engine running after recognition finishes
recognizer.recognitionTask(with: request) { result, error in
    if result?.isFinal == true {
        // Audio engine still running, wasting resources and battery
    }
}

// ✅ DO: Clean up all audio resources
recognizer.recognitionTask(with: request) { result, error in
    if result?.isFinal == true || error != nil {
        self.audioEngine.stop()
        self.audioEngine.inputNode.removeTap(onBus: 0)
        self.recognitionRequest?.endAudio()
        self.recognitionRequest = nil
    }
}
```

### Assuming on-device recognition is available for all locales

```swift
// ❌ DON'T: Force on-device without checking support
let request = SFSpeechAudioBufferRecognitionRequest()
request.requiresOnDeviceRecognition = true // Ignored unless the recognizer supports it

// ✅ DO: Check support before requiring on-device
if recognizer.supportsOnDeviceRecognition {
    request.requiresOnDeviceRecognition = true
} else {
    // Fall back to server-based or inform user
}
```

### Not handling the one-minute recognition limit

```swift
// ❌ DON'T: Start one long continuous recognition session
func startRecording() {
    // SFSpeechRecognizer tasks can be cut off after about 60 seconds
}

// ✅ DO: roll the segment before the limit and let cleanup end audio once
func scheduleRecognitionRollover() {
    recognitionTimer = Timer.scheduledTimer(withTimeInterval: 55, repeats: false) { [weak self] _ in
        self?.commitLatestPartialText()
        self?.stopTranscribing()     // owns endAudio(), tap removal, and task cancellation
        try? self?.startTranscribing()
    }
}
```
`SFSpeechRecognitionTask` exposes `finish()`, `cancel()`, `state`, and `error`;
do not invent task properties such as `recognitionTask` to restart work. Keep
the active `SFSpeechAudioBufferRecognitionRequest` in your manager and call
`endAudio()` from one cleanup path only.

### Treating SpeechAnalyzer input completion as session completion

```swift
// ❌ DON'T: Only finish the AsyncStream and expect result streams to close
inputBuilder.finish()

// ✅ DO: explicitly finish or cancel the analyzer session
let lastSampleTime = try await analyzer.analyzeSequence(inputSequence)
if let lastSampleTime {
    try await analyzer.finalizeAndFinish(through: lastSampleTime)
} else {
    try analyzer.cancelAndFinishNow()
}
```

### Duplicating volatile SpeechAnalyzer results

```swift
// ✅ Replace volatile text with the finalized result for the same audio range
for try await result in transcriber.results {
    if result.isFinal {
        volatileTranscript = AttributedString()
        finalizedTranscript.append(result.text)
    } else {
        volatileTranscript = result.text
    }
}
```

### Creating multiple simultaneous recognition tasks

```swift
// ❌ DON'T: Start a new task without canceling the previous one
func startRecording() {
    recognitionTask = recognizer.recognitionTask(with: request) { ... }
    // Previous task is still running — undefined behavior
}

// ✅ DO: Cancel existing task before creating a new one
func startRecording() {
    recognitionTask?.cancel()
    recognitionTask = nil
    recognitionTask = recognizer.recognitionTask(with: request) { ... }
}
```

## Review Checklist

- [ ] `NSSpeechRecognitionUsageDescription` is in Info.plist
- [ ] `NSMicrophoneUsageDescription` is in Info.plist (if using live audio)
- [ ] Authorization is requested before starting recognition
- [ ] `SFSpeechRecognizerDelegate` is set to handle `availabilityDidChange`
- [ ] Audio engine is stopped and tap removed when recognition ends
- [ ] `recognitionRequest.endAudio()` is called when done recording
- [ ] Previous `recognitionTask` is canceled before starting a new one
- [ ] `supportsOnDeviceRecognition` is checked before requiring on-device mode
- [ ] Partial results are handled separately from final (`isFinal`) results
- [ ] `SFSpeechRecognizer` one-minute/service limits are accounted for
- [ ] For iOS 26+: `AssetInventory` assets are installed before using `SpeechAnalyzer`
- [ ] For iOS 26+: `SpeechTranscriber.isAvailable` and locale support are checked
- [ ] For iOS 26+: live buffers are converted to the analyzer-compatible format
- [ ] For iOS 26+: analyzer sessions are explicitly finalized or canceled
- [ ] For iOS 26+: volatile results are replaced by finalized results, not duplicated

## References

- [Speech framework](https://sosumi.ai/documentation/speech)
- [SpeechAnalyzer](https://sosumi.ai/documentation/speech/speechanalyzer)
- [SpeechTranscriber](https://sosumi.ai/documentation/speech/speechtranscriber)
- [SpeechTranscriber.Preset](https://sosumi.ai/documentation/speech/speechtranscriber/preset)
- [DictationTranscriber](https://sosumi.ai/documentation/speech/dictationtranscriber)
- [SpeechDetector](https://sosumi.ai/documentation/speech/speechdetector)
- [SFSpeechRecognizer](https://sosumi.ai/documentation/speech/sfspeechrecognizer)
- [SFSpeechAudioBufferRecognitionRequest](https://sosumi.ai/documentation/speech/sfspeechaudiobufferrecognitionrequest)
- [SFSpeechURLRecognitionRequest](https://sosumi.ai/documentation/speech/sfspeechurlrecognitionrequest)
- [SFSpeechRecognitionResult](https://sosumi.ai/documentation/speech/sfspeechrecognitionresult)
- [SFSpeechRecognitionRequest](https://sosumi.ai/documentation/speech/sfspeechrecognitionrequest)
- [AssetInventory](https://sosumi.ai/documentation/speech/assetinventory)
- [Asking Permission to Use Speech Recognition](https://sosumi.ai/documentation/speech/asking-permission-to-use-speech-recognition)
- [Recognizing Speech in Live Audio](https://sosumi.ai/documentation/speech/recognizing-speech-in-live-audio)
- [Bring advanced speech-to-text to your app with SpeechAnalyzer](https://sosumi.ai/videos/play/wwdc2025/277)
Source

Creator's repository · dpearson2699/swift-ios-skills
View on GitHub ↗
Security

Security checks in progress
Results will appear here once audits complete
What this skill can do
Reads your filesConnects to the internetRuns code on your machine
Checked by 3 independent security firms
Does it try to trick the AI?Not yet checkedPending · Gen Agent Trust Hub
Does it sneak in hidden code?Not yet checkedPending · Socket
Does it have known bugs?Not yet checkedPending · Snyk