solana-clipper

License: MIT | Python 3.10+

Turn long crypto-podcast recordings (Supertalks, Ownership, and similar Solana-ecosystem shows) into transcripts and clip-ready post-production briefs that a video editor can drop straight into CapCut.

Two stages, one repo:

  1. Transcription: transcribe.py runs every MP3 in audio/ through OpenAI Whisper, using vocabulary.txt as a proper-noun hint to reduce mishearings. It writes output/<episode>/raw_transcript.txt and output/<episode>/transcript.srt.
  2. Clip pipeline: driven by Claude Code (or any LLM coding agent) following prompts/master_prompt.md. It cleans the transcript, proposes 5–8 standalone clip candidates, and writes per-clip briefs (.md) plus rebased SRTs (.srt) into output/<episode>/<episode>_clips/NN_slug/. The <episode>_clips/ folder is then handed off to a video editor who already has the master video.
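To make the "proper-noun hint" in stage 1 concrete, here is a minimal sketch of how a vocabulary file could be passed to Whisper as a prompt. It is not taken from transcribe.py: the function names, the one-term-per-line file format, and the `#` comment convention are all assumptions made for illustration. The API call uses the official `openai` Python package.

```python
from pathlib import Path


def build_prompt(vocab_text: str) -> str:
    """Join vocabulary terms into one comma-separated hint string.

    Assumes one term per line; blank lines and lines starting with '#'
    are skipped (hypothetical file format), and duplicates are dropped.
    """
    terms: list[str] = []
    for line in vocab_text.splitlines():
        term = line.strip()
        if term and not term.startswith("#") and term not in terms:
            terms.append(term)
    return ", ".join(terms)


def transcribe_to_srt(mp3_path: Path, prompt: str) -> str:
    """Send one MP3 to the Whisper API and return the transcript as SRT text."""
    from openai import OpenAI  # imported lazily; requires the `openai` package

    # OpenAI() reads OPENAI_API_KEY from the environment, so load .env first
    # if that is where the key lives.
    client = OpenAI()
    with mp3_path.open("rb") as audio_file:
        return client.audio.transcriptions.create(
            model="whisper-1",
            file=audio_file,
            prompt=prompt,          # vocabulary hint to bias spellings
            response_format="srt",  # ask for subtitles instead of JSON
        )
```

A typical call would be `transcribe_to_srt(Path("audio/episode.mp3"), build_prompt(Path("vocabulary.txt").read_text()))`, where `episode.mp3` is a placeholder name.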

The vocabulary list (vocabulary.txt) is the longest-lived asset here — it grows with every episode as new project names, tokens, and people get confirmed.
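Since the list grows with every episode, duplicates are worth guarding against. The sketch below shows one way confirmed terms could be merged into the existing list; `merge_terms` is a hypothetical helper, and the case-insensitive comparison is an assumption rather than documented behavior of this repo.

```python
def merge_terms(existing: list[str], confirmed: list[str]) -> list[str]:
    """Append newly confirmed terms, skipping blanks and case-insensitive duplicates.

    The original order and spelling of existing terms are preserved.
    """
    seen = {term.casefold() for term in existing}
    merged = list(existing)
    for term in confirmed:
        term = term.strip()
        if term and term.casefold() not in seen:
            seen.add(term.casefold())
            merged.append(term)
    return merged
```

Keeping the first confirmed spelling means a later, differently cased mishearing cannot overwrite it.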


Quickstart (macOS)

One line in Terminal sets up everything — Homebrew, Python, ffmpeg, the repo itself, dependencies, and your .env. You'll be prompted to paste your OpenAI API key once.

curl -fsSL https://raw.githubusercontent.com/ozgtg797/solana-clipper/main/install.sh | bash

The repo lands in ~/solana-clipper. Re-running the command later just pulls the latest.

If the repo is still private, the raw URL above won't work without authentication. Either make the repo public (Settings → General → Change repository visibility; the repo holds no secrets and .env is gitignored), or clone it with the GitHub CLI and run the installer locally: gh repo clone ozgtg797/solana-clipper && cd solana-clipper && bash install.sh.

Manual setup

If you'd rather install everything by hand:

git clone https://github.com/ozgtg797/solana-clipper.git
cd solana-clipper
pip install -r requirements.txt
cp .env.example .env
# then edit .env and put your OPENAI_API_KEY there

setup.sh is a separate health-check that verifies your machine has everything ready. See SETUP.md for the full walkthrough (Serbian).


Usage

Transcribe new episodes

Drop MP3s into audio/, then:

python3 transcribe.py

For each MP3 the script writes output/<stem>/raw_transcript.txt and output/<stem>/transcript.srt. Episodes that already have both files are skipped, so re-running is safe.
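The skip rule described above can be sketched as follows. This is an illustration of the described behavior, not the code in transcribe.py; `pending_episodes` is a hypothetical name.

```python
from pathlib import Path

# Both files must exist for an episode to count as already transcribed.
REQUIRED_OUTPUTS = ("raw_transcript.txt", "transcript.srt")


def pending_episodes(audio_dir: Path, output_dir: Path) -> list[Path]:
    """Return the MP3s in audio_dir that still lack one or both output files."""
    pending = []
    for mp3 in sorted(audio_dir.glob("*.mp3")):
        episode_dir = output_dir / mp3.stem
        done = all((episode_dir / name).is_file() for name in REQUIRED_OUTPUTS)
        if not done:
            pending.append(mp3)
    return pending
```

Under this rule, an episode with only one of the two files is treated as unfinished and gets transcribed again.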

MP3s over 25 MB must be compressed before the Whisper API will accept them. A known-good recipe for clean speech:

ffmpeg -i input.mp3 -ac 1 -ar 16000 -b:a 32k output.mp3
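For batch use, the size check and the ffmpeg recipe can be wrapped in a few helpers. This is a sketch with hypothetical function names; treating 25 MB as 25 × 1024 × 1024 bytes is an assumption, and the duration estimate ignores container overhead.

```python
from pathlib import Path

MAX_UPLOAD_BYTES = 25 * 1024 * 1024  # 25 MB limit, read here as MiB (assumption)


def needs_compression(size_bytes: int) -> bool:
    """True when a file of this size exceeds the upload limit."""
    return size_bytes > MAX_UPLOAD_BYTES


def ffmpeg_command(src: Path, dst: Path) -> list[str]:
    """Build the mono, 16 kHz, 32 kbit/s recipe as an argument list."""
    return ["ffmpeg", "-i", str(src), "-ac", "1", "-ar", "16000", "-b:a", "32k", str(dst)]


def max_minutes(bitrate_kbps: int) -> float:
    """Rough audio length that fits under the limit at a constant bitrate."""
    seconds = MAX_UPLOAD_BYTES * 8 / (bitrate_kbps * 1000)
    return seconds / 60
```

The argument list can be passed to `subprocess.run(..., check=True)`. At 32 kbit/s, `max_minutes(32)` comes out at roughly 109 minutes, so longer episodes would need a lower bitrate or splitting.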

Make clips for an episode

Open the repo with Claude Code and ask it to "make clips for <episode>" (the Serbian phrases "napravi klipove", meaning "make clips", and "pokreni clips", meaning "run clips", also work). It reads prompts/master_prompt.md and runs the pipeline:

  1. Pre-analysis — flags ambiguous proper nouns and asks you to confirm spellings (interactive).
  2. After confirmation, appends new terms to vocabulary.txt.
  3. Writes cleaned.srt (intermediate).
  4. Proposes a clip shortlist; you pick which to keep.
  5. Writes output/<episode>/<episode>_clips/NN_slug/NN_slug.md + NN_slug.srt for each chosen clip.

The final folder is what you ship to the video editor.
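The per-clip SRTs are described as rebased. Reading that as "timestamps shifted so the clip starts at 00:00:00,000", the operation could look like the sketch below. The pipeline itself is driven by the LLM prompt, so this only illustrates the idea; `rebase_srt` and its helpers are hypothetical names.

```python
import re

# Matches SRT timestamps such as 00:10:05,500
TIME_RE = re.compile(r"(\d{2}):(\d{2}):(\d{2}),(\d{3})")


def _to_ms(match: re.Match) -> int:
    """Convert a matched HH:MM:SS,mmm timestamp to milliseconds."""
    h, m, s, ms = (int(group) for group in match.groups())
    return ((h * 60 + m) * 60 + s) * 1000 + ms


def _format(ms: int) -> str:
    """Convert milliseconds back to HH:MM:SS,mmm."""
    h, rest = divmod(ms, 3_600_000)
    m, rest = divmod(rest, 60_000)
    s, ms = divmod(rest, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def rebase_srt(srt_text: str, clip_start_ms: int) -> str:
    """Shift every timing line so clip_start_ms becomes 00:00:00,000.

    Only lines containing '-->' are touched; cue numbers and caption
    text are left unchanged. Times before the clip start clamp to zero.
    """
    out = []
    for line in srt_text.splitlines():
        if "-->" in line:
            line = TIME_RE.sub(
                lambda m: _format(max(0, _to_ms(m) - clip_start_ms)), line
            )
        out.append(line)
    return "\n".join(out)
```

For a clip that starts at 10:05 in the master video, `clip_start_ms` would be 605000. Cue numbers are not renumbered here; whether the editor's tooling needs them to restart at 1 is worth checking.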


Layout

solana-clipper/
├── transcribe.py            # Whisper -> raw_transcript.txt + transcript.srt
├── vocabulary.txt           # proper-noun hints (constantly updated)
├── prompts/
│   └── master_prompt.md     # rules the LLM follows for the clips pipeline
├── CLAUDE.md                # short operational guide for Claude Code
├── SETUP.md                 # local-setup walkthrough (Serbian)
├── setup.sh                 # health-check for your local setup (macOS)
├── audio/                   # gitignored — drop your MP3s here
└── output/                  # gitignored — generated artifacts per episode

Notes

  • The cleaned SRT is an intermediate artifact, not finished captions. Caption boundaries that slice through stutters or half-words still need a human pass with eyes/ears on the source video. Don't ship cleaned.srt directly.
  • English is the default transcription language. Flag Serbian (or other) episodes explicitly when running.
  • This tool is built for the Solana podcast ecosystem but the pipeline (Whisper + vocabulary hints + LLM-driven clip extraction) generalizes to any long-form spoken-word content.
