tinix84/spymeet-ai

SpyMeet

Meeting audio transcription pipeline for Windows: live capture, audio enhancement, speech-to-text, LLM correction, and structured summaries. Designed for real meetings with noisy audio, domain-specific jargon, and multi-language support.

Features

  • Live audio capture — WASAPI loopback + microphone recording (stereo: system audio + mic)
  • Audio enhancement — EBU R128 normalization, spectral noise reduction, speech EQ, dynamic compression
  • 3 transcription backends — WhisperX (local CPU), OpenAI Whisper API, Groq API (free tier)
  • Five-stage LLM pipeline — speaker resolution, correction, quality metrics, structured summary, and meeting minutes (MoM)
  • Meeting minutes — Logseq-format MoM with Executive Summary, Action Items, and full transcript
  • Domain glossary — Custom terminology for accurate transcription of technical terms
  • Speaker diarization — Via pyannote (WhisperX backend + HuggingFace token)
  • Channel selection — Process mic, system audio, or both channels separately
  • Desktop app — System tray icon + floating recording widget with timer and VU meters
  • Dictation mode — Mic-only recording for voice prompts
  • Self-improving lexicon — Auto-learned corrections from transcript metrics with human-reviewed feedback loop
  • Claude Code slash commands — /process-transcript, /run-pipeline, /update-lexicon, and more

Quick Start

Prerequisites

  • Windows 10/11
  • Python 3.13+ via conda (environment: social_env)
  • ffmpeg (conda install -c conda-forge ffmpeg)

Setup

# 1. Create and activate conda environment
conda create -n social_env python=3.13
conda activate social_env
conda install -c conda-forge ffmpeg

# 2. Install dependencies
pip install -r requirements.txt

# 3. Configure API keys
cp .env.example .env
# Edit .env with your API keys

# 4. (Optional) Run GPU/CUDA detection for the WhisperX backend
.\scripts\check_gpu.ps1

Recording

# Launch desktop recorder (tray icon + floating widget)
python -m spymeet.recorder_app

# Auto-start meeting recording
python -m spymeet.recorder_app --mode meeting

# Auto-start dictation (mic only)
python -m spymeet.recorder_app --mode dictation

Transcription Pipeline

# Full pipeline: transcribe + LLM correction (Groq, recommended)
.\scripts\run.ps1 -Backend groq-api -Language it

# Transcribe only (skip LLM)
.\scripts\run.ps1 -Backend groq-api -Language it -SkipLLM

# LLM correction on existing transcripts
.\scripts\run.ps1 -LLMOnly -Input .\audio\02_txt_raw

# With domain glossary
.\scripts\run.ps1 -Backend groq-api -Language it -Glossary .\glossary.txt
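The -Glossary flag points the pipeline at a plain-text terminology file. Its exact format isn't documented in this README; purely as an illustration, a one-term-per-line file built from names that appear in this project might look like:

```text
WhisperX
pyannote
EBU R128
WASAPI
Logseq
```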

Architecture

spymeet/recorder_app.py  (live capture: WASAPI loopback + mic -> stereo WAV)
    |
    v
spymeet/audio_enhance.py (normalize, denoise, EQ, compress -> _enhanced.wav)
    |
    v
spymeet/transcribe.py    (WhisperX CPU / OpenAI API / Groq API -> .txt + .json)
    |
    v
spymeet/llm_process.py   (Claude 5-stage: speaker resolution + correction + metrics + summary + MoM)
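The stages above, combined with the audio/ folder layout in Project Structure, imply a file-naming convention per stage. The following is an illustrative sketch only — the function names and signatures are assumptions, not the actual spymeet API:

```python
from pathlib import Path

# Illustrative stage functions (hypothetical names, not the real spymeet API):
# each stage maps its input file into the next audio/ subfolder.
def enhance(raw: Path) -> Path:
    # audio_enhance.py: 00_audio_raw/x.m4a -> 01_audio_proc/x_enhanced.wav
    return Path("audio/01_audio_proc") / (raw.stem + "_enhanced.wav")

def transcribe(wav: Path) -> Path:
    # transcribe.py: 01_audio_proc/x_enhanced.wav -> 02_txt_raw/x_enhanced.txt
    return Path("audio/02_txt_raw") / (wav.stem + ".txt")

def llm_process(txt: Path) -> Path:
    # llm_process.py: 02_txt_raw/x.txt -> 03_txt_proc/x_corrected.txt
    return Path("audio/03_txt_proc") / (txt.stem + "_corrected.txt")

out = llm_process(transcribe(enhance(Path("audio/00_audio_raw/meeting.m4a"))))
print(out.as_posix())  # audio/03_txt_proc/meeting_enhanced_corrected.txt
```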

Project Structure

spymeet/                        # project root
├── CLAUDE.md                   # Claude Code instructions
├── README.md                   # This file
├── .env.example                # API key template
├── requirements.txt            # Python dependencies
├── glossary.txt                # Domain terminology
├── recorder.spec               # PyInstaller build config
│
├── spymeet/                    # Python package
│   ├── __init__.py
│   ├── record.py               # Core recording engine
│   ├── recorder_app.py         # Desktop entry point (tray + widget)
│   ├── recorder_widget.py      # Floating tkinter widget
│   ├── recorder_tray.py        # System tray icon
│   ├── audio_player.py         # Audio playback
│   ├── pipeline_runner.py      # Background subprocess executor
│   ├── diagnostics_window.py   # Audio diagnostics window
│   ├── audio_enhance.py        # Audio preprocessing
│   ├── transcribe.py           # Speech-to-text (3 backends)
│   └── llm_process.py          # LLM 5-stage pipeline
│
├── docs/                       # Documentation
│   ├── PRD.md                  # Product requirements + roadmap
│   ├── architecture.md         # System architecture
│   ├── competitive_analysis.md
│   ├── sprint_live_capture.md
│   └── README_WIN.md           # Setup guide (Italian)
│
├── scripts/                    # PowerShell helper scripts
│   ├── run.ps1                 # Pipeline launcher
│   ├── setup.ps1               # Automated setup
│   └── check_gpu.ps1           # GPU/CUDA detection
│
├── tests/                      # Test suite
└── audio/                      # Runtime data (gitignored)
    ├── 00_audio_raw/           # Raw recordings (.m4a)
    ├── 01_audio_proc/          # Enhanced audio (_enhanced.wav)
    ├── 02_txt_raw/             # Raw transcripts (.txt + .json)
    └── 03_txt_proc/            # LLM output (_corrected, _metrics, _summary, _mom)

API Keys

Copy .env.example to .env and fill in your keys:

Key            Required for
-------------  ----------------------------------
groq_api       Groq API transcription (free tier)
anthropic_api  LLM correction + summary
openai_api     OpenAI Whisper API (optional)
hf_token       Speaker diarization (optional)
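The README doesn't state how the keys are loaded (python-dotenv is a common choice); as a sketch, a minimal stdlib .env parser could look like this — load_env is a hypothetical helper, not part of spymeet:

```python
from pathlib import Path

def load_env(path: str = ".env") -> dict[str, str]:
    """Minimal .env parser: KEY=value per line; blanks and '#' comments ignored."""
    env: dict[str, str] = {}
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        env[key.strip()] = value.strip().strip('"')
    return env
```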

Documentation

See the docs/ folder: PRD.md (product requirements + roadmap), architecture.md (system architecture), competitive_analysis.md, sprint_live_capture.md, and README_WIN.md (setup guide, Italian).

License

Private project.
