CLI tool for adding TikTok-style burned-in captions to vertical MP4 videos.
It transcribes speech with whisperkit-cli, groups timed words into short caption chunks, highlights the currently spoken word, and renders the result directly into the final MP4 so the video is ready to upload to TikTok or similar platforms.
Burned-in captions always require re-encoding the video stream. This project keeps audio untouched and uses a high-quality H.264 output profile by default.
The caption renderer does not depend on ffmpeg subtitle filters such as ass or drawtext. Those filters are missing in some local builds. Instead, tiktoksubs generates transparent overlay frames in Go and composites them with ffmpeg, which makes the pipeline more portable.
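As a rough illustration of what a transparent overlay frame means here, the sketch below draws an opaque highlight box onto an otherwise fully transparent RGBA canvas using only the Go standard library. The function name `renderFrame`, the box coordinates, and the colour are hypothetical placeholders and not taken from the tiktoksubs source; the real renderer also draws text, outline, and shadow.

```go
package main

import (
	"bytes"
	"fmt"
	"image"
	"image/color"
	"image/draw"
	"image/png"
)

// renderFrame returns a transparent canvas of the given size with one opaque
// box drawn at (x0, y0)-(x1, y1), standing in for the active-word highlight.
// Geometry and colour are illustrative assumptions, not the tool's layout.
func renderFrame(width, height, x0, y0, x1, y1 int) *image.RGBA {
	// A freshly allocated RGBA image is all zeros, i.e. fully transparent.
	frame := image.NewRGBA(image.Rect(0, 0, width, height))
	box := image.Rect(x0, y0, x1, y1).Intersect(frame.Bounds())
	highlight := image.NewUniform(color.RGBA{R: 255, G: 221, B: 0, A: 255})
	// draw.Src overwrites the destination pixels, including alpha.
	draw.Draw(frame, box, highlight, image.Point{}, draw.Src)
	return frame
}

func main() {
	frame := renderFrame(1080, 1920, 340, 1400, 740, 1500)
	var buf bytes.Buffer
	if err := png.Encode(&buf, frame); err != nil {
		panic(err)
	}
	fmt.Printf("encoded one overlay frame: %d bytes of PNG\n", buf.Len())
}
```

PNG is used only because it preserves the alpha channel; how tiktoksubs actually hands frames to ffmpeg is not specified in this README.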
- whisperkit-cli available in PATH
- ffmpeg and ffprobe available in PATH
- Go 1.26+ to build from source
go build -o tiktoksubs .
./tiktoksubs -input 20260311_082403.mp4

Default output:
20260311_082403_captioned.mp4
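The default name appears to be the input name with `_captioned` inserted before the extension. A minimal sketch of that rule, assuming it generalizes beyond the single example shown (the helper name `defaultOutputPath` is hypothetical):

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// defaultOutputPath inserts "_captioned" before the file extension and keeps
// the directory and extension of the input unchanged.
func defaultOutputPath(input string) string {
	ext := filepath.Ext(input)
	return strings.TrimSuffix(input, ext) + "_captioned" + ext
}

func main() {
	fmt.Println(defaultOutputPath("20260311_082403.mp4"))
	// prints "20260311_082403_captioned.mp4"
}
```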
./tiktoksubs \
-input input.mp4 \
-output output.mp4 \
-language en \
-quality high \
-font "Verdana Bold"

-input input MP4 file
-output output MP4 file
-language spoken language for WhisperKit, for example en or de
-model WhisperKit model name
-font font name or path to a .ttf/.otf file
-quality high, smaller, or lossless
-keep-temp keep temporary transcription and overlay files
-uppercase render captions in uppercase
-max-words maximum words per caption block
-max-duration maximum caption duration in seconds
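To make the effect of `-max-words` and `-max-duration` concrete, here is a minimal sketch of one possible grouping rule: a caption block is closed as soon as the next word would exceed either limit. The `Word` type, the `groupWords` function, and the exact cut rule are assumptions for illustration; the real tool may also split on pauses or punctuation.

```go
package main

import "fmt"

// Word is one transcribed word with start and end times in seconds.
type Word struct {
	Text       string
	Start, End float64
}

// groupWords splits words into consecutive blocks. A new block starts when the
// current one already holds maxWords words, or when adding the next word would
// stretch the block beyond maxDuration seconds from its first word's start.
func groupWords(words []Word, maxWords int, maxDuration float64) [][]Word {
	var blocks [][]Word
	var current []Word
	for _, w := range words {
		if len(current) > 0 &&
			(len(current) >= maxWords || w.End-current[0].Start > maxDuration) {
			blocks = append(blocks, current)
			current = nil
		}
		current = append(current, w)
	}
	if len(current) > 0 {
		blocks = append(blocks, current)
	}
	return blocks
}

func main() {
	words := []Word{
		{"this", 0.0, 0.3}, {"is", 0.3, 0.5}, {"a", 0.5, 0.6},
		{"short", 0.6, 1.0}, {"clip", 1.0, 1.4},
	}
	for i, block := range groupWords(words, 3, 2.0) {
		fmt.Printf("block %d:", i)
		for _, w := range block {
			fmt.Printf(" %s", w.Text)
		}
		fmt.Println()
	}
}
```

With a limit of three words, the five sample words fall into two blocks: "this is a" and "short clip".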
- high: good default for upload-ready output with low visible quality loss
- smaller: smaller file size with stronger compression
- lossless: lossless H.264 output, much larger files
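One way such presets could map onto encoder settings is sketched below. The use of libx264, the preset names, and the CRF values are assumptions chosen for illustration and are not taken from the tiktoksubs source; the only hard fact relied on is that x264 treats CRF 0 as lossless for 8-bit video.

```go
package main

import "fmt"

// qualityArgs maps a -quality value to ffmpeg video encoder arguments.
// All concrete values here are illustrative guesses, not the tool's settings.
func qualityArgs(quality string) ([]string, error) {
	switch quality {
	case "high":
		return []string{"-c:v", "libx264", "-preset", "slow", "-crf", "18"}, nil
	case "smaller":
		return []string{"-c:v", "libx264", "-preset", "slow", "-crf", "26"}, nil
	case "lossless":
		// CRF 0 selects lossless encoding for 8-bit x264 output.
		return []string{"-c:v", "libx264", "-preset", "veryslow", "-crf", "0"}, nil
	default:
		return nil, fmt.Errorf("unknown quality %q", quality)
	}
}

func main() {
	args, err := qualityArgs("high")
	if err != nil {
		panic(err)
	}
	fmt.Println(args)
}
```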
- Probe the source video with ffprobe.
- Transcribe the video audio with whisperkit-cli using word timestamps.
- Group words into short caption blocks optimized for short-form video.
- Render caption frames with outline, shadow, centered layout, and active-word highlight.
- Overlay the transparent caption video on top of the source video with ffmpeg.
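The probing step can be sketched as parsing ffprobe's JSON output to get the frame size and frame rate that the overlay frames need to match. The ffprobe flags in the comment are standard, but the type and function names, and the choice of fields, are assumptions rather than the tool's actual code.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strconv"
	"strings"
)

// probeOutput mirrors the part of the JSON printed by, for example:
//   ffprobe -v error -select_streams v:0 \
//     -show_entries stream=width,height,r_frame_rate -of json input.mp4
type probeOutput struct {
	Streams []struct {
		Width      int    `json:"width"`
		Height     int    `json:"height"`
		RFrameRate string `json:"r_frame_rate"` // a fraction such as "30/1"
	} `json:"streams"`
}

// VideoInfo holds the values needed to size and time overlay frames.
type VideoInfo struct {
	Width, Height int
	FPS           float64
}

// parseProbe extracts size and frame rate of the first listed stream.
func parseProbe(data []byte) (VideoInfo, error) {
	var out probeOutput
	if err := json.Unmarshal(data, &out); err != nil {
		return VideoInfo{}, err
	}
	if len(out.Streams) == 0 {
		return VideoInfo{}, fmt.Errorf("no video stream found")
	}
	s := out.Streams[0]
	num, den, ok := strings.Cut(s.RFrameRate, "/")
	if !ok {
		return VideoInfo{}, fmt.Errorf("unexpected frame rate %q", s.RFrameRate)
	}
	n, err := strconv.ParseFloat(num, 64)
	if err != nil {
		return VideoInfo{}, err
	}
	d, err := strconv.ParseFloat(den, 64)
	if err != nil {
		return VideoInfo{}, err
	}
	if d == 0 {
		return VideoInfo{}, fmt.Errorf("frame rate %q has a zero denominator", s.RFrameRate)
	}
	return VideoInfo{Width: s.Width, Height: s.Height, FPS: n / d}, nil
}

func main() {
	sample := []byte(`{"streams":[{"width":1080,"height":1920,"r_frame_rate":"30/1"}]}`)
	info, err := parseProbe(sample)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%dx%d at %.2f fps\n", info.Width, info.Height, info.FPS)
}
```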
Run tests:
go test ./...

Build a local binary:
go build -o tiktoksubs .

- Audio is copied without re-encoding.
- Video is always re-encoded because the captions are burned in.
- The repository includes a sample vertical MP4 for local testing.
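
The first two notes correspond to a familiar ffmpeg pattern: composite with the `overlay` filter, re-encode the video, and stream-copy the audio. The sketch below builds such an argument list. The flags are standard ffmpeg options, but the function name `overlayArgs`, the file names, and the choice of libx264 are assumptions; the command tiktoksubs actually runs may differ.

```go
package main

import (
	"fmt"
	"strings"
)

// overlayArgs builds an ffmpeg argument list that draws the caption overlay on
// top of the source video, re-encodes the result, and copies the audio stream.
func overlayArgs(source, overlay, output string) []string {
	return []string{
		"-y",
		"-i", source,
		"-i", overlay,
		// Place the second input over the first at the top-left corner.
		"-filter_complex", "[0:v][1:v]overlay=0:0[v]",
		"-map", "[v]",
		"-map", "0:a?", // the "?" keeps clips without audio from failing
		"-c:v", "libx264", // video must be re-encoded (encoder is an assumption)
		"-c:a", "copy", // audio is passed through untouched
		output,
	}
}

func main() {
	// captions.mov is a hypothetical name for the transparent overlay video.
	args := overlayArgs("input.mp4", "captions.mov", "output.mp4")
	fmt.Println("ffmpeg " + strings.Join(args, " "))
}
```

In a real program the slice would be passed to `exec.Command("ffmpeg", args...)` rather than printed, which also avoids any shell quoting of the filter graph.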