Livcap

A live caption app for macOS.

Privacy-first, lightweight, and user-friendly. What happens on your device stays on your device.

Highlights

Privacy First – No cloud, no analytics, no ads, no internet required, and no screen capture access.
Lightweight & Fast – Runs efficiently, with up to 1.7× faster word-level lead rate and about 10% lower latency than the default macOS Live Caption.
Minimalist Design – One-click on/off, no distractions. Less is more.
Open Source – Free and transparent.

Demo videos: Promo.mp4, Livcap.demo0.mp4, Livcap-demo1.mp4, Livcap-demo2.mp4

Release Notes

🎉 v1.0 Now Available on the App Store!

Download Livcap from the Mac App Store

Development Introduction

How well does Livcap perform?

Livcap outperforms macOS's native Live Caption with significant improvements:

✅ 1.7x faster word-level lead rate
✅ 10% lower latency
✅ More efficient processing with better resource utilization

See detailed comparison benchmarks in livcapComparision.md

Technical Approach

Our performance gains come from three key optimizations:

🎯 Single-pass inference - Uses one SFSpeechRecognizer call instead of the multiple inferences observed in native Live Caption

⚡ Smart downsampling - Converts audio from 48kHz to 16kHz before processing, maintaining quality while reducing computational overhead

🔇 VAD-based silence skipping - Voice Activity Detection prevents unnecessary processing during silent periods, saving resources and improving responsiveness
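The downsampling step can be illustrated with a small sketch. It is written in Python for brevity and is not the app's code (Livcap itself is Swift, and the README does not say which resampler it uses). The function name and the averaging filter are assumptions chosen to keep the example dependency-free.

```python
def downsample_48k_to_16k(samples):
    """Reduce a 48 kHz mono signal to 16 kHz by averaging each group of 3 samples.

    Averaging acts as a crude low-pass filter before decimation; a production
    resampler would use a proper anti-aliasing filter.
    """
    factor = 3  # 48000 / 16000
    usable = len(samples) - len(samples) % factor  # drop a trailing partial group
    return [
        sum(samples[i:i + factor]) / factor
        for i in range(0, usable, factor)
    ]


one_second = [0.0] * 48_000
print(len(downsample_48k_to_16k(one_second)))  # 16000
```

Each second of audio shrinks from 48,000 to 16,000 samples, so every later stage handles one third of the data.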

Why is Livcap Privacy-First?

Complete local processing with zero external dependencies:

🔒 No cloud services - Built entirely on SFSpeechRecognizer from Apple's native Speech framework, so all speech processing happens locally on your device

🎵 Direct audio access - Uses CoreAudio Tap to capture system audio directly from the buffer, eliminating the need for ScreenCaptureKit or screen recording permissions

🛡️ Zero data transmission - Your conversations never leave your Mac - no servers, no analytics, no tracking

Development

Development History

History Highlight

Compared whisper.cpp with the built-in SFSpeechRecognizer.
Three audio architecture approaches were evaluated:
- VAD-Based Silence Detection
- 5-Second Fixed Sliding Windows
- 30-Second WhisperLive-Inspired Buffer

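Permission issue: to reset the app's privacy (TCC) permissions, run the following, where com.xxx.xx stands for the app's bundle identifier: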

tccutil reset All com.xxx.xx

Current Implementation:

Based on SFSpeechRecognizer from Apple's built-in Speech framework.

History of the Three Approaches Considered

Approach 1: VAD-Based Silence Detection ✅ **Most Reliable**

Files: BufferManager.swift, VADProcessor.swift, EnhancedVAD.swift

How it works:

Accumulates speech until 3 consecutive silence frames
Triggers inference on speech end or 15s maximum
RMS threshold (0.01) with asymmetric hysteresis
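
The segmentation logic above can be sketched roughly as follows. This is a Python illustration, not the code in BufferManager.swift or VADProcessor.swift. The threshold, the three-frame rule and the 15 s cap come from the list above. The 100 ms frame length is an assumption. So is the reading of "asymmetric hysteresis" used here: one loud frame starts a segment, while three quiet frames are needed to end it.

```python
import math

RMS_THRESHOLD = 0.01        # from the README
SILENCE_FRAMES_TO_END = 3   # from the README
MAX_SEGMENT_SECONDS = 15.0  # from the README
FRAME_SECONDS = 0.1         # assumption: frame length is not stated
MAX_SEGMENT_FRAMES = round(MAX_SEGMENT_SECONDS / FRAME_SECONDS)


def rms(frame):
    """Root-mean-square energy of one frame of float samples."""
    return math.sqrt(sum(s * s for s in frame) / len(frame)) if frame else 0.0


def segment_speech(frames):
    """Yield lists of frames, one list per speech segment ready for inference."""
    segment, silent_run = [], 0
    for frame in frames:
        is_speech = rms(frame) >= RMS_THRESHOLD
        if not segment and not is_speech:
            continue  # idle: skip silence without buffering it
        segment.append(frame)
        silent_run = 0 if is_speech else silent_run + 1
        too_long = len(segment) >= MAX_SEGMENT_FRAMES
        if silent_run >= SILENCE_FRAMES_TO_END or too_long:
            yield segment  # speech ended or the 15 s cap was reached
            segment, silent_run = [], 0
    if segment:
        yield segment  # flush whatever is left when the stream ends
```

Because a segment closes only on silence or at the cap, latency depends on how long the speaker talks without pausing. That matches the "variable latency" limitation noted for this approach.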

Characteristics: Event-driven, variable buffer, speech-only segments

Status: ✅ Best balance of quality and usability

Limitations: Variable latency, potential word cutoff, VAD tuning needed

Approach 2: 5-Second Sliding Windows ❌ **Word-Level Chaos**

Files: ContinuousStreamManager.swift, TranscriptionStabilizationManager.swift

How it works:

5s sliding window with 1s stride (4s overlap)
LocalAgreement algorithm for word-level stabilization
Temporal overlap analysis for conflicts
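
A rough Python sketch of the two ideas above: the window schedule, and LocalAgreement in its commonly described form, where only the longest word prefix shared by consecutive hypotheses is treated as stable. Whether TranscriptionStabilizationManager.swift works exactly this way is not stated in this README, and the function names are hypothetical.

```python
def window_bounds(total_seconds, window=5, stride=1):
    """Return (start, end) second offsets of each full window in the audio."""
    return [(s, s + window) for s in range(0, total_seconds - window + 1, stride)]


def agreed_prefix(previous_words, current_words):
    """Longest common word prefix of two consecutive hypotheses."""
    agreed = []
    for prev, cur in zip(previous_words, current_words):
        if prev != cur:
            break
        agreed.append(cur)
    return agreed


print(window_bounds(8))
# [(0, 5), (1, 6), (2, 7), (3, 8)]
print(agreed_prefix("the quick brown fax".split(),
                    "the quick brown fox jumps".split()))
# ['the', 'quick', 'brown']
```

Consecutive windows share 4 s of audio but start at different offsets, so their transcripts do not naturally share a prefix. That is one plausible reason the word matching became complex.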

Characteristics: Fixed 1s intervals, 5s buffer, word-level matching

Status: ❌ Overlap analysis creates transcription instability

Limitations: Complex word matching, frequent text changes, poor readability

Approach 3: 30-Second WhisperLive ❌ **High Latency**

Files: WhisperLiveContinuousManager.swift, WhisperLiveAudioBuffer.swift

How it works:

Continuous 30s audio buffer
1s inference intervals with smart trimming
Pre-inference VAD for speech extraction
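
The rolling buffer can be sketched as below, again in Python as an illustration rather than the contents of WhisperLiveAudioBuffer.swift. The class name is hypothetical, the 16 kHz sample rate is an assumption (it is the rate Whisper models expect), and "smart trimming" is simplified here to dropping the oldest samples.

```python
from collections import deque

SAMPLE_RATE = 16_000   # assumption: 16 kHz mono, the rate Whisper models expect
MAX_SECONDS = 30       # from the README


class RollingAudioBuffer:
    """Keeps only the most recent max_seconds of audio samples."""

    def __init__(self, sample_rate=SAMPLE_RATE, max_seconds=MAX_SECONDS):
        self.sample_rate = sample_rate
        self._samples = deque(maxlen=sample_rate * max_seconds)

    def append(self, chunk):
        # deque(maxlen=...) silently drops the oldest samples once full
        self._samples.extend(chunk)

    def seconds_buffered(self):
        return len(self._samples) / self.sample_rate

    def snapshot(self):
        """Copy of the buffered audio, e.g. to hand to a transcriber once per second."""
        return list(self._samples)


buf = RollingAudioBuffer(sample_rate=4, max_seconds=2)  # tiny sizes for the demo
for second in range(3):
    buf.append([second] * 4)   # one second of audio per call
print(buf.snapshot())          # [1, 1, 1, 1, 2, 2, 2, 2]
```

Re-transcribing up to 30 s of audio every second is where the high overhead and memory cost listed below come from.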

Characteristics: Fixed 1s intervals, 30s context, maximum Whisper context

Status: ❌ >2s latency unsuitable for real-time

Limitations: Excessive latency, high overhead, memory intensive

Current Conclusions

After extensive testing of all three approaches:

Approach 1 (VAD-Based) is currently the most practical solution, providing the best balance of quality and usability despite variable latency.
Approach 2 (5s Sliding) suffers from word-level chaos due to complex overlap analysis, making transcriptions unstable and hard to read.
Approach 3 (30s WhisperLive) provides excellent context but has unacceptable latency (>2s) for real-time applications.

Comparison Chart

| Aspect | Approach 1: VAD-Based | Approach 2: 5s Sliding | Approach 3: 30s WhisperLive |
| --- | --- | --- | --- |
| Trigger | Silence detection | Fixed 1s intervals | Fixed 1s intervals |
| Buffer Size | Variable (up to 15s) | Fixed 5s sliding | Variable (0-30s) |
| Overlap | None | 4s temporal overlap | Continuous context |
| Latency | Variable (silence-dependent) | Predictable 1s | Predictable 1s |
| Context | Speech segments only | 5s windows | Maximum 30s context |
| Stabilization | None | LocalAgreement | Pre-inference VAD |

Contributing

We welcome contributions! Please read our Contributing Guidelines before submitting PRs.

Key Requirements:

Privacy first (no data collection/network features)
Lightweight performance (maintain efficiency)
Simple UI design (minimal interface)
Follow the PR template: motivation, code summary, AI assistance docs, and a demo (optional)

Future Work

Issues to Solve:

Fix the "invalid display identifier 37D8832A-2D66-02CA-B9F7-8F30A301B230" error that occurs when the monitor changes.

Compare with the new SpeechAnalyzer API once macOS 26 is released (non-beta). Nov 2025.
Implement MLX Whisper and compare performance. Oct 2025.
- Add KV cache support.
- Add tokenizer support.
- Add quantization support for speed-up.
Explore hybrid approaches combining the best aspects of each method
Investigate adaptive buffer sizing based on speech patterns
Optimize VAD parameters for different acoustic environments

Technical Notes

MLX-Swift only supports safetensors files. Use Utilities/convert.py to convert .pt files to .safetensors format.

Required Files:

Livcap/CoreWhisperCpp/ggml-base.en.bin
Livcap/CoreWhisperCpp/ggml-tiny.en.bin
Livcap/CoreWhisperCpp/ggml-base.en-encoder.mlmodelc
Livcap/CoreWhisperCpp/ggml-tiny.en-encoder.mlmodelc
Livcap/CoreWhisperCpp/whisper.xcframework
