Meeting audio transcription pipeline for Windows: live capture, audio enhancement, speech-to-text, LLM correction, and structured summaries. Designed for real meetings with noisy audio, domain-specific jargon, and multi-language support.
- Live audio capture — WASAPI loopback + microphone recording (stereo: system audio + mic)
- Audio enhancement — EBU R128 normalization, spectral noise reduction, speech EQ, dynamic compression
- 3 transcription backends — WhisperX (local CPU), OpenAI Whisper API, Groq API (free tier)
- LLM 5-stage pipeline — Speaker resolution + correction + quality metrics + structured summary + meeting minutes (MoM)
- Meeting minutes — Logseq-format MoM with Executive Summary, Action Items, and full transcript
- Domain glossary — Custom terminology for accurate transcription of technical terms
- Speaker diarization — Via pyannote (WhisperX backend + HuggingFace token)
- Channel selection — Process mic, system audio, or both channels separately
- Desktop app — System tray icon + floating recording widget with timer and VU meters
- Dictation mode — Mic-only recording for voice prompts
- Self-improving lexicon — Auto-learned corrections from transcript metrics with human-reviewed feedback loop
- Claude Code slash commands — `/process-transcript`, `/run-pipeline`, `/update-lexicon`, and more
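The enhancement chain listed above (EBU R128 normalization, spectral denoise, speech EQ, compression) maps naturally onto ffmpeg filters, which the project already requires. A minimal sketch of one way to express it — the filter names are real ffmpeg filters, but the parameter values here are illustrative assumptions, not the project's tuned settings:

```python
import subprocess
from pathlib import Path

# Illustrative enhancement chain built from standard ffmpeg filters.
# Parameter values are assumptions, not the project's actual settings.
ENHANCE_FILTERS = ",".join([
    "highpass=f=80",                      # speech EQ: cut low-frequency rumble
    "afftdn=nr=12",                       # spectral (FFT-domain) noise reduction
    "acompressor=threshold=0.1:ratio=3",  # dynamic range compression
    "loudnorm=I=-16:TP=-1.5:LRA=11",      # EBU R128 loudness normalization
])

def enhance_cmd(src: Path, dst: Path) -> list[str]:
    """Build the ffmpeg command that writes the enhanced WAV."""
    return ["ffmpeg", "-y", "-i", str(src), "-af", ENHANCE_FILTERS, str(dst)]

def enhance(src: Path, dst: Path) -> None:
    """Run the enhancement chain (requires ffmpeg on PATH)."""
    subprocess.run(enhance_cmd(src, dst), check=True)
```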
- Windows 10/11
- Python 3.13+ via conda (environment: `social_env`)
- ffmpeg (`conda install -c conda-forge ffmpeg`)
```powershell
# 1. Create and activate conda environment
conda create -n social_env python=3.13
conda activate social_env
conda install -c conda-forge ffmpeg

# 2. Install dependencies
pip install -r requirements.txt

# 3. Configure API keys
cp .env.example .env
# Edit .env with your API keys

# 4. (Optional) Check GPU for WhisperX CPU backend
.\scripts\check_gpu.ps1
```

```powershell
# Launch desktop recorder (tray icon + floating widget)
python -m spymeet.recorder_app

# Auto-start meeting recording
python -m spymeet.recorder_app --mode meeting

# Auto-start dictation (mic only)
python -m spymeet.recorder_app --mode dictation
```

```powershell
# Full pipeline: transcribe + LLM correction (Groq, recommended)
.\scripts\run.ps1 -Backend groq-api -Language it

# Transcribe only (skip LLM)
.\scripts\run.ps1 -Backend groq-api -Language it -SkipLLM

# LLM correction on existing transcripts
.\scripts\run.ps1 -LLMOnly -Input .\audio\02_txt_raw

# With domain glossary
.\scripts\run.ps1 -Backend groq-api -Language it -Glossary .\glossary.txt
```

```
spymeet/recorder_app.py  (live capture: WASAPI loopback + mic -> stereo WAV)
        |
        v
spymeet/audio_enhance.py (normalize, denoise, EQ, compress -> _enhanced.wav)
        |
        v
spymeet/transcribe.py    (WhisperX CPU / OpenAI API / Groq API -> .txt + .json)
        |
        v
spymeet/llm_process.py   (Claude 5-stage: speaker resolution + correction + metrics + summary + MoM)
```
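For the Groq backend at the transcription stage, Groq exposes an OpenAI-compatible audio endpoint through the `groq` package. The sketch below is a hypothetical illustration of how such a backend could look — the model name, response format, and function names are assumptions, not the project's actual `transcribe.py` code:

```python
import os

def build_request(language: str = "it") -> dict:
    """Request parameters for Groq's Whisper endpoint (values are assumptions)."""
    return {
        "model": "whisper-large-v3",
        "language": language,
        "response_format": "verbose_json",  # keeps segment timestamps for the .json output
    }

def transcribe_groq(wav_path: str, language: str = "it") -> str:
    """Send an audio file to Groq's OpenAI-compatible transcription API."""
    from groq import Groq  # pip install groq
    client = Groq(api_key=os.environ["GROQ_API_KEY"])
    with open(wav_path, "rb") as f:
        result = client.audio.transcriptions.create(
            file=(os.path.basename(wav_path), f),
            **build_request(language),
        )
    return result.text
```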
```
spymeet/                       # project root
├── CLAUDE.md                  # Claude Code instructions
├── README.md                  # This file
├── .env.example               # API key template
├── requirements.txt           # Python dependencies
├── glossary.txt               # Domain terminology
├── recorder.spec              # PyInstaller build config
│
├── spymeet/                   # Python package
│   ├── __init__.py
│   ├── record.py              # Core recording engine
│   ├── recorder_app.py        # Desktop entry point (tray + widget)
│   ├── recorder_widget.py     # Floating tkinter widget
│   ├── recorder_tray.py       # System tray icon
│   ├── audio_player.py        # Audio playback
│   ├── pipeline_runner.py     # Background subprocess executor
│   ├── diagnostics_window.py  # Audio diagnostics window
│   ├── audio_enhance.py       # Audio preprocessing
│   ├── transcribe.py          # Speech-to-text (3 backends)
│   └── llm_process.py         # LLM 5-stage pipeline
│
├── docs/                      # Documentation
│   ├── PRD.md                 # Product requirements + roadmap
│   ├── architecture.md        # System architecture
│   ├── competitive_analysis.md
│   ├── sprint_live_capture.md
│   └── README_WIN.md          # Setup guide (Italian)
│
├── scripts/                   # PowerShell helper scripts
│   ├── run.ps1                # Pipeline launcher
│   ├── setup.ps1              # Automated setup
│   └── check_gpu.ps1          # GPU/CUDA detection
│
├── tests/                     # Test suite
└── audio/                     # Runtime data (gitignored)
    ├── 00_audio_raw/          # Raw recordings (.m4a)
    ├── 01_audio_proc/         # Enhanced audio (_enhanced.wav)
    ├── 02_txt_raw/            # Raw transcripts (.txt + .json)
    └── 03_txt_proc/           # LLM output (_corrected, _metrics, _summary, _mom)
```
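The `audio/` layout above implies a fixed mapping from a raw recording to its per-stage artifacts, which a small helper can derive. The suffixes below are inferred from the tree (`_enhanced.wav`, `_corrected`, `_summary`, `_mom`), so treat the exact file names and extensions as assumptions:

```python
from pathlib import Path

def stage_paths(raw: Path, root: Path = Path("audio")) -> dict[str, Path]:
    """Map a raw recording to its per-stage artifact paths (names inferred
    from the directory layout; extensions for LLM outputs are assumptions)."""
    stem = raw.stem
    return {
        "raw":        root / "00_audio_raw" / raw.name,
        "enhanced":   root / "01_audio_proc" / f"{stem}_enhanced.wav",
        "transcript": root / "02_txt_raw" / f"{stem}.txt",
        "segments":   root / "02_txt_raw" / f"{stem}.json",
        "corrected":  root / "03_txt_proc" / f"{stem}_corrected.txt",
        "summary":    root / "03_txt_proc" / f"{stem}_summary.txt",
        "mom":        root / "03_txt_proc" / f"{stem}_mom.md",
    }
```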
Copy `.env.example` to `.env` and fill in your keys:

| Key | Required for |
|---|---|
| `groq_api` | Groq API transcription (free tier) |
| `anthropic_api` | LLM correction + summary |
| `openai_api` | OpenAI Whisper API (optional) |
| `hf_token` | Speaker diarization (optional) |
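These keys can be loaded without extra dependencies; a minimal stdlib `.env` loader is sketched below (the project may well use a library such as `python-dotenv` instead — this is only an illustration of the file format):

```python
import os

def load_env(path: str = ".env") -> dict[str, str]:
    """Minimal .env parser: KEY=value lines; blank lines and '#' comments skipped."""
    loaded = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            loaded[key.strip()] = value.strip().strip('"').strip("'")
    os.environ.update(loaded)  # make the keys visible to the pipeline
    return loaded
```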
Private project.