"Thoughts become things." — Kai Greene
kAI is named for the bodybuilder and three-time Mr. Olympia runner-up Kai Greene, whose line "thoughts become things" is the whole point. kAI the tool turns thinking (driving monologues, livestream riffs, newsletter drafts) into durable, queryable things.
Kai ingests three content sources, transcribes or cleans each via Gemini, and files everything into one Google Doc per month. Point NotebookLM at the Kai Transcripts folder and you can query years of your own thinking by topic, month, or cross-source evolution.
┌──────────────────┐  ┌──────────────────┐  ┌──────────────────┐
│   Apple Photos   │  │ YouTube channel  │  │   beehiiv RSS    │
│  (selfie MOVs)   │  │  (past streams)  │  │   (newsletter)   │
└────────┬─────────┘  └────────┬─────────┘  └────────┬─────────┘
         │ osxphotos           │ yt-dlp              │ encoding/xml
         │ + ffmpeg (audio)    │ + auto-captions     │ + HTML strip
         ▼                     ▼                     ▼
┌──────────────────────────────────────────────────────────────┐
│                       Gemini 2.5 Flash                       │
│    audio → transcript+summary | captions → clean+summary     │
│                               | prose → summary only         │
└───────────────────────────┬──────────────────────────────────┘
                            ▼
┌──────────────────────────────────────────────────────────────┐
│           Google Drive → `Kai Transcripts` folder            │
│           1 Doc per YYYY-MM, mixed sources tagged            │
└───────────────────────────┬──────────────────────────────────┘
                            ▼
                    ┌───────────────┐
                    │  NotebookLM   │
                    └───────────────┘
See KAI_PLAN.md for the full project vision across all phases.
brew install go pipx yt-dlp ffmpeg
pipx install osxphotos
- Go 1.24+ (tested on 1.25–1.26).
- osxphotos shells out to macOS PhotoKit. On first use, grant Terminal (or iTerm) access under System Settings → Privacy & Security → Photos.
- ffmpeg extracts audio from iPhone videos before Gemini upload. Without it the Apple-Photos path won't work (iPhone .MOV files are often 1 GB+ and Gemini's file processor chokes).
- Optimize Mac Storage (Settings → Photos) should be ON before a large batch, otherwise ~200 GB of iCloud originals will land in your Photos library.
Get a key at https://aistudio.google.com/apikey:
printf '%s' '<YOUR_KEY>' > gemini_api_key.txt
chmod 600 gemini_api_key.txt
Or set GEMINI_API_KEY in your environment.
Enable billing on the Google Cloud project that owns the key — the free tier caps at 20 requests/day, which is not enough for any real batch. Paid tier for gemini-2.5-flash is well under $0.30 per hour of audio and $0.001 per newsletter post.
- https://console.cloud.google.com/ → create project kai-poc.
- APIs & Services → enable Google Drive API and Google Docs API.
- OAuth consent screen → External → add your email as a test user. Scopes:
  - https://www.googleapis.com/auth/drive.file
  - https://www.googleapis.com/auth/documents
- Credentials → Create Credentials → OAuth client ID → Desktop app → download the JSON.
- Save it here as client_secrets.json.
On first kai process run, a browser opens for consent. The token is cached in google_token.json and auto-refreshed on expiry.
go build -o kai ./cmd/kai
./kai scan # Apple Photos → selfie_videos.csv
./kai process --limit 10 --allow-partial-monthly
./kai youtube scan # YouTube past livestreams → youtube_videos.csv
./kai youtube process --limit 5 --allow-partial-monthly
./kai newsletter scan # beehiiv RSS → newsletter_posts.csv
./kai newsletter process --limit 5 --allow-partial-monthly
./kai stats # tag landscape + per-month counts
./kai tail-run # follow tmp/kai.log from a second shell
Edit the CSVs between scan and process: flip process from yes to no for rows you want to skip. All three sources write to the same monthly Google Docs, tagged with **Source:** apple-photos | youtube | newsletter.
| Command | Flag | Default | Notes |
|---|---|---|---|
| `scan` | `--min-duration` | `10m` | Videos shorter than this are dropped. |
| `scan` | `--max-duration` | `40m` | Videos longer than this are dropped. |
| `scan` | `--landscape-too` | off | By default only portrait-aspect videos are included. |
| `process` | `--limit` | `10` | Max videos this run. |
| `process` | `--allow-partial-monthly` | off | Generate a monthly overview even when a month is not fully processed yet. |
| `process` | `--dry-run` | off | Print what would be processed (count, hours, est cost) without touching iCloud/Gemini/Docs. |
| `process` | `--model` | `gemini-2.5-flash` | Override via flag or GEMINI_MODEL env. |
| `youtube scan` | `--channel-url` | https://www.youtube.com/@EnterpriseVibeCode/streams | Point at any channel's past-streams URL. |
| `youtube process` | `--limit` / `--allow-partial-monthly` / `--model` | same defaults | Uses auto-captions; Gemini does cleanup + summary only. |
| `newsletter scan` | `--feed-url` | EVC beehiiv feed | Fetches an RSS 2.0 feed; works with any beehiiv or similar publication. |
| `newsletter process` | `--limit` / `--allow-partial-monthly` / `--model` / `--feed-url` | same defaults | Strips HTML locally; Gemini does summary+tags only (cheapest source). |
- Scan. osxphotos' built-in `--selfie` flag does not match videos (verified against a 4,572-movie library: 0 selfie-flagged movies). Kai pulls every movie via `osxphotos query --only-movies --json` and filters client-side on `original_height > original_width` (portrait) and `exif_info.duration ∈ [min, max]`. osxphotos emits Python's `Infinity`/`NaN` as bare tokens in its JSON; Kai sanitizes these before decoding.
- Download. Every candidate is typically `ismissing=true` (iCloud-only). Kai shells to `osxphotos export --uuid <U> --only-movies --download-missing --use-photokit --skip-original-if-edited tmp/downloads/<U>`. Kai strips osxphotos' hidden `.osxphotos_export.db` state from the subdir.
- Audio extraction. iPhone `.MOV` files are often 1 GB+ at 15 Mbps: uploads succeed, but Gemini's file processor returns `The file failed to be processed`. Kai uses `ffmpeg -vn -acodec copy` to extract the AAC audio track (about 1% of the size, ~7 MB per 10 min), then uploads that. Transcription quality is the same for talking-head content.
- Transcribe. `google.golang.org/genai` File API upload → poll until `FileStateActive` → `GenerateContent` with `prompts.Transcription` (verbatim with `[MM:SS]` paragraph markers, `## Key Topics`, `## Open Questions`).
- Summarize. Second `GenerateContent` call with `prompts.Summary` returns `{summary, tags}` JSON.
- File. Append a timestamped entry to the monthly Google Doc for the video's date, update the stable-marker header, delete local audio + video.
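The scan step's two tricky parts, sanitizing bare `NaN`/`Infinity` tokens and the portrait-plus-duration filter, could look roughly like the sketch below. It assumes durations are in seconds and that each record carries a `uuid` field; the `movie` struct and the `sanitizeJSON` and `keep` helpers are illustrative names, not Kai's real ones.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// sanitizeJSON replaces the bare tokens NaN, Infinity and -Infinity, which
// encoding/json rejects, with null. String literals are copied untouched.
func sanitizeJSON(in []byte) []byte {
	tokens := [][]byte{[]byte("-Infinity"), []byte("Infinity"), []byte("NaN")}
	var out bytes.Buffer
	inString := false
	for i := 0; i < len(in); i++ {
		c := in[i]
		if inString {
			out.WriteByte(c)
			if c == '\\' && i+1 < len(in) {
				i++
				out.WriteByte(in[i]) // copy the escaped byte verbatim
			} else if c == '"' {
				inString = false
			}
			continue
		}
		if c == '"' {
			inString = true
			out.WriteByte(c)
			continue
		}
		replaced := false
		for _, t := range tokens {
			if bytes.HasPrefix(in[i:], t) {
				out.WriteString("null")
				i += len(t) - 1
				replaced = true
				break
			}
		}
		if !replaced {
			out.WriteByte(c)
		}
	}
	return out.Bytes()
}

// movie holds only the fields the filter needs. Duration is a pointer so a
// sanitized null can be told apart from a real zero.
type movie struct {
	UUID   string `json:"uuid"`
	Height int    `json:"original_height"`
	Width  int    `json:"original_width"`
	Exif   struct {
		Duration *float64 `json:"duration"`
	} `json:"exif_info"`
}

// keep reports whether m is portrait and its duration (assumed seconds)
// lies within [minSec, maxSec].
func keep(m movie, minSec, maxSec float64) bool {
	if m.Exif.Duration == nil {
		return false
	}
	d := *m.Exif.Duration
	return m.Height > m.Width && d >= minSec && d <= maxSec
}

func main() {
	// Hand-written sample shaped like the fields named above.
	raw := []byte(`[
	  {"uuid":"A","original_height":1920,"original_width":1080,"exif_info":{"duration":900.5,"fps":NaN}},
	  {"uuid":"B","original_height":1080,"original_width":1920,"exif_info":{"duration":1200}},
	  {"uuid":"C","original_height":1920,"original_width":1080,"exif_info":{"duration":Infinity}}
	]`)
	var movies []movie
	if err := json.Unmarshal(sanitizeJSON(raw), &movies); err != nil {
		fmt.Println("decode:", err)
		return
	}
	for _, m := range movies {
		if keep(m, 10*60, 40*60) {
			fmt.Println("candidate:", m.UUID)
		}
	}
}
```

Tracking string state matters because a title or filename containing the text `NaN` must survive sanitizing unchanged.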
- `yt-dlp --skip-download --dump-json <channel>/streams` lists all past streams; Kai filters on `was_live=true && is_live=false`.
- `yt-dlp --skip-download --write-auto-subs --sub-lang en --sub-format vtt` fetches auto-captions per video. If captions aren't available yet (YouTube lag for just-ended streams), the entry fails with stage `captions_unavailable` and can be retried later.
- VTT is parsed locally, collapsing YouTube's rolling-caption duplication into plain text with `[HH:MM:SS]` anchors.
- Gemini call #1 (`prompts.CaptionClean`, text-only) structures into the same Markdown format as driving videos. Call #2 is the standard summary+tags.
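The rolling-caption collapse can be approximated as follows. This is a simplified sketch, not Kai's parser: it assumes each cue starts with an `HH:MM:SS.mmm --> ...` timing line, strips inline tags with a regular expression, and drops any line equal to the one emitted just before it. `collapseVTT` is a hypothetical name, and cue identifiers or NOTE blocks are not handled.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// tagRE matches inline WebVTT tags such as <c>, </c> and <00:00:01.500>.
var tagRE = regexp.MustCompile(`<[^>]*>`)

// collapseVTT flattens a WebVTT document into "[HH:MM:SS] text" lines.
// Auto-captions repeat the previous cue's text at the top of the next cue;
// a line is skipped when it equals the last line emitted.
func collapseVTT(vtt string) []string {
	var out []string
	last, stamp := "", ""
	for _, line := range strings.Split(strings.ReplaceAll(vtt, "\r\n", "\n"), "\n") {
		line = strings.TrimSpace(line)
		if strings.Contains(line, "-->") {
			// Timing line: keep the start time, minus milliseconds.
			start := strings.Fields(line)[0]
			stamp = strings.SplitN(start, ".", 2)[0]
			continue
		}
		if stamp == "" {
			continue // still inside the file header
		}
		text := strings.TrimSpace(tagRE.ReplaceAllString(line, ""))
		if text == "" || text == last {
			continue
		}
		out = append(out, fmt.Sprintf("[%s] %s", stamp, text))
		last = text
	}
	return out
}

const sample = `WEBVTT
Kind: captions
Language: en

00:00:00.000 --> 00:00:02.000 align:start position:0%
so<00:00:00.400><c> today</c><00:00:00.800><c> I</c>

00:00:02.000 --> 00:00:04.000 align:start position:0%
so today I
want<00:00:02.500><c> to</c><00:00:03.000><c> talk</c>
`

func main() {
	for _, l := range collapseVTT(sample) {
		fmt.Println(l)
	}
}
```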
- HTTP GET the feed, parse RSS 2.0 via stdlib `encoding/xml`, read `content:encoded` full HTML.
- Local HTML→text walker using `golang.org/x/net/html` (no new deps: it is pulled in transitively by the Google API client).
- One Gemini call: `prompts.Summary` → `{summary, tags}`. No cleanup pass; newsletter prose is already clean.
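For the feed-parsing step, a stdlib-only sketch might look like this. The struct and function names are illustrative, and the HTML-to-text pass is left out. The one non-obvious detail is that `content:encoded` has to be matched by its namespace URI rather than by the `content:` prefix.

```go
package main

import (
	"encoding/xml"
	"fmt"
)

// rssFeed captures only the items of an RSS 2.0 document.
type rssFeed struct {
	Items []rssItem `xml:"channel>item"`
}

type rssItem struct {
	Title   string `xml:"title"`
	Link    string `xml:"link"`
	PubDate string `xml:"pubDate"`
	// content:encoded lives in the RSS "content" module namespace; the tag
	// is written as "<namespace URI> <local name>".
	HTML string `xml:"http://purl.org/rss/1.0/modules/content/ encoded"`
}

// parseFeed decodes an RSS 2.0 document and returns its items.
func parseFeed(data []byte) ([]rssItem, error) {
	var f rssFeed
	if err := xml.Unmarshal(data, &f); err != nil {
		return nil, fmt.Errorf("parse rss: %w", err)
	}
	return f.Items, nil
}

// sample is a made-up feed used only to exercise parseFeed.
const sample = `<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:content="http://purl.org/rss/1.0/modules/content/">
  <channel>
    <title>Example feed</title>
    <item>
      <title>Post one</title>
      <link>https://example.com/p/post-one</link>
      <pubDate>Mon, 05 Jan 2026 10:00:00 +0000</pubDate>
      <content:encoded><![CDATA[<p>Hello <b>world</b></p>]]></content:encoded>
    </item>
  </channel>
</rss>`

func main() {
	items, err := parseFeed([]byte(sample))
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, it := range items {
		fmt.Printf("%s (%s): %d bytes of HTML\n", it.Title, it.Link, len(it.HTML))
	}
}
```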
Each doc opens with a marker-delimited block:
<!-- kai:header:start -->
# Thoughts — January 2026
**Recordings:** 23 **Total Duration:** 51:57:11
<!-- kai:header:end -->
On every header update, Kai walks the Docs Document.Body.Content to find the markers' UTF-16 indices and BatchUpdate { DeleteContentRange + InsertText } within them. Entries below are untouched. Legacy docs (no markers) are upgraded on first contact.
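The index arithmetic behind that header rewrite is easy to get wrong, because Docs indices count UTF-16 code units rather than bytes or runes. The sketch below shows only the arithmetic, under the assumption that the body has already been flattened into text runs with known start indices. The `run` type and the `find` and `headerRange` helpers are hypothetical stand-ins for walking the real Docs structures.

```go
package main

import (
	"fmt"
	"strings"
	"unicode/utf16"
)

const (
	startMarker = "<!-- kai:header:start -->"
	endMarker   = "<!-- kai:header:end -->"
)

// run is a hypothetical flattened text run: its document start index and text.
type run struct {
	Start int64
	Text  string
}

// utf16Len reports how many UTF-16 code units s occupies.
func utf16Len(s string) int64 {
	return int64(len(utf16.Encode([]rune(s))))
}

// find returns the document index at which marker begins.
func find(runs []run, marker string) (int64, bool) {
	for _, r := range runs {
		if i := strings.Index(r.Text, marker); i >= 0 {
			// i is a byte offset; convert the prefix to UTF-16 units.
			return r.Start + utf16Len(r.Text[:i]), true
		}
	}
	return 0, false
}

// headerRange returns the [from, to) range strictly between the markers,
// which is the span a delete-then-insert update would target.
func headerRange(runs []run) (from, to int64, ok bool) {
	s, okS := find(runs, startMarker)
	e, okE := find(runs, endMarker)
	if !okS || !okE || e < s {
		return 0, 0, false
	}
	return s + utf16Len(startMarker), e, true
}

func main() {
	lines := []string{
		startMarker + "\n",
		"# Thoughts \u2014 January 2026\n",
		"**Recordings:** 23 **Total Duration:** 51:57:11\n",
		endMarker + "\n",
	}
	// Lay the lines out back to back, starting at index 1 for this sketch.
	var runs []run
	next := int64(1)
	for _, l := range lines {
		runs = append(runs, run{Start: next, Text: l})
		next += utf16Len(l)
	}
	from, to, ok := headerRange(runs)
	fmt.Println(from, to, ok)
}
```

Using byte offsets here would overshoot on any non-ASCII character in the header, such as the dash in the title line, and the update would then clip or miss the end marker.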
- `process_log.json` is keyed by source-specific ID (Apple UUID / YouTube video ID / newsletter URL). Re-running any `process` command skips IDs already present.
- `monthly_docs.json` tracks per-month doc ID + running entry count + total duration.
- `run_summary_<UTC>.json` is written at the end of every `process` run with success/failure breakdown, cost estimate, and failing stages.
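The skip-if-present behaviour amounts to a map lookup keyed by source ID. A minimal sketch follows; the `entry` fields and the `loadLog` and `pending` helpers are hypothetical, since the real schema of `process_log.json` is not documented here.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"os"
)

// entry is a hypothetical per-item record; only the keying by source ID
// is taken from the description above.
type entry struct {
	Source string `json:"source"`
	Month  string `json:"month"`
}

// processLog maps a source-specific ID to its record.
type processLog map[string]entry

// loadLog reads the log, treating a missing file as an empty log.
func loadLog(path string) (processLog, error) {
	data, err := os.ReadFile(path)
	if errors.Is(err, os.ErrNotExist) {
		return processLog{}, nil // first run: nothing processed yet
	}
	if err != nil {
		return nil, err
	}
	var l processLog
	if err := json.Unmarshal(data, &l); err != nil {
		return nil, err
	}
	if l == nil {
		l = processLog{} // file held a JSON null
	}
	return l, nil
}

// pending returns the candidate IDs not yet in the log, in input order.
func pending(l processLog, ids []string) []string {
	var out []string
	for _, id := range ids {
		if _, done := l[id]; !done {
			out = append(out, id)
		}
	}
	return out
}

func main() {
	l, err := loadLog("process_log.json")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	l["yt:abc123"] = entry{Source: "youtube", Month: "2026-01"}
	fmt.Println(pending(l, []string{"yt:abc123", "yt:def456"}))
}
```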
Running two process commands in parallel against the same state files races — the last writer clobbers the other's updates. The Google Docs themselves are safe (Docs API serializes BatchUpdates per doc), but process_log.json / monthly_docs.json can end up stale. Run one at a time, or use cmd/rebuild-state (below) to reconcile.
| Command | Purpose |
|---|---|
| `go run ./cmd/preview` | Export cached Photos-library thumbnails for every scan candidate into tmp/previews/ and open Finder, so you can visually sanity-check the filter before spending Gemini budget. |
| `go run ./cmd/inspect <docID>` | Dump a single Google Doc's body to stdout. Useful for duplicate-entry spot checks. |
| `go run ./cmd/rebuild-state` | Walk every doc in Kai Transcripts, re-parse entries, and rewrite process_log.json + monthly_docs.json from what's actually in Drive. Run this after a concurrent-run mistake or any time you suspect state drift. |
| `./kai tail-run` | `tail -f`-style follow of tmp/kai.log from another shell during long process runs. |
- https://notebooklm.google.com/ → new notebook.
- Add sources → Google Drive → search `Thoughts —` → add each monthly doc.
- Syncing after new ingestion: don't re-add. Hover the existing source and click the refresh icon. Re-adding creates duplicate sources.
- Ask questions. Log what's useful / misleading / flat in `VALIDATION_NOTES.md`.
| File | Purpose | In .gitignore |
|---|---|---|
| `selfie_videos.csv` | Apple-Photos scan output; user-editable. | ✓ |
| `youtube_videos.csv` | YouTube-streams scan output; user-editable. | ✓ |
| `newsletter_posts.csv` | Newsletter scan output; user-editable. | ✓ |
| `process_log.json` | Per-item state for idempotent re-runs. | ✓ |
| `monthly_docs.json` | Monthly Doc ID registry. | ✓ |
| `run_summary_*.json` | Per-run audit trail. | ✓ |
| `google_token.json` | OAuth token cache. | ✓ |
| `gemini_api_key.txt` | Gemini key (alt to env var). | ✓ |
| `client_secrets.json` | Google OAuth desktop client. | ✓ |
| `tmp/downloads/…` | Per-video iCloud download scratch (auto-cleaned per item). | ✓ |
| `tmp/yt/…` | Per-video caption download scratch. | ✓ |
| `tmp/previews/…` | Preview thumbnails from cmd/preview. | ✓ |
| `tmp/kai.log` | Tee of long-running command output for tail-run. | ✓ |
- osxphotos `--selfie` flag does not match videos. Scan relies on portrait + duration heuristics; spot-check via `go run ./cmd/preview`.
- iPhone videos are too large for Gemini video. Kai's apple-photos path extracts audio via ffmpeg first. Audio is ~1% the size and loses no verbal content.
- iCloud downloads go through your Photos library. Turn on Optimize Mac Storage for large batches so macOS auto-evicts originals under disk pressure.
- Gemini free tier caps at 20 requests/day. You'll want paid tier to run anything meaningful. Billing-propagation sometimes takes a few minutes after linking.
- Concurrent `process` runs race state files. Run one at a time; use `cmd/rebuild-state` to recover.
- NotebookLM re-adds do not deduplicate. Use the per-source refresh icon to sync, not "Add from Drive" again.
- Gemini output-token cap can truncate summary JSON. Kai's summary parser salvages the partial summary text and proceeds with empty tags rather than failing the entry.
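The truncated-summary salvage in the last item could be sketched as below: try a strict decode first, and on failure pull out whatever part of the `summary` string arrived. This is an assumption-laden illustration, not Kai's parser. The `parseSummary` and `salvage` names are hypothetical, and `\uXXXX` escapes are not decoded.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

type summary struct {
	Summary string   `json:"summary"`
	Tags    []string `json:"tags"`
}

// parseSummary decodes {summary, tags}. On malformed (for example
// truncated) JSON it keeps the partial summary text and leaves tags empty.
func parseSummary(raw string) summary {
	var s summary
	if err := json.Unmarshal([]byte(raw), &s); err == nil {
		return s
	}
	return summary{Summary: salvage(raw)}
}

// salvage extracts the possibly unterminated string value of "summary".
func salvage(raw string) string {
	const key = `"summary"`
	i := strings.Index(raw, key)
	if i < 0 {
		return ""
	}
	rest := raw[i+len(key):]
	j := strings.Index(rest, `"`) // opening quote of the value
	if j < 0 {
		return ""
	}
	rest = rest[j+1:]
	var b strings.Builder
	for k := 0; k < len(rest); k++ {
		c := rest[k]
		if c == '"' {
			break // closing quote: the value arrived complete
		}
		if c == '\\' {
			if k+1 >= len(rest) {
				break // dangling backslash at the cut point
			}
			k++
			switch rest[k] {
			case 'n':
				b.WriteByte('\n')
			case 't':
				b.WriteByte('\t')
			default:
				b.WriteByte(rest[k]) // \" \\ \/ and anything else
			}
			continue
		}
		b.WriteByte(c)
	}
	// A cut in the middle of a multi-byte rune leaves invalid UTF-8; drop it.
	return strings.TrimSpace(strings.ToValidUTF8(b.String(), ""))
}

func main() {
	truncated := `{"summary": "Talked about \"focus\" and pric`
	s := parseSummary(truncated)
	fmt.Printf("summary=%q tags=%d\n", s.Summary, len(s.Tags))
}
```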