⚡ Clutch.ai

Real-time interview co-pilot that listens to call audio, transcribes questions, classifies intent, retrieves relevant context, and shows concise live hints in a floating overlay.

Repository: AhsanRiaz786/clutch-ai
Course: CS-419 Deep Learning (NUST SEECS, Spring 2026)


What It Does

Clutch.ai runs this online pipeline:

  1. Capture audio from your selected input device (mic or loopback).
  2. Detect speech segments with voice activity detection (RMS-based VAD by default; an optional GRU VAD model is used if one has been trained).
  3. Transcribe with Faster-Whisper (base.en, CPU int8).
  4. Classify transcript as technical_question, personal_behavioral, or noise.
  5. Retrieve supporting context from ChromaDB (+ optional reranker).
  6. Generate short interview-ready guidance via Groq (default model: llama-3.3-70b-versatile) with Ollama fallback.
  7. Stream output to a floating PyQt5 overlay.
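
To make step 2 concrete, here is a minimal sketch of RMS-based speech segmentation. It is not the code in audio/capture.py: the function names, the 0.02 threshold, and the hangover length are assumptions chosen for illustration, and samples are assumed to be floats in [-1.0, 1.0].

```python
import math
from typing import List, Sequence, Tuple


def rms(frame: Sequence[float]) -> float:
    """Root-mean-square energy of one audio frame."""
    if not frame:
        return 0.0
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def detect_speech_segments(
    frames: Sequence[Sequence[float]],
    threshold: float = 0.02,   # assumed energy threshold, tune per device
    hangover_frames: int = 2,  # assumed: silent frames tolerated inside a segment
) -> List[Tuple[int, int]]:
    """Return (start, end) frame-index pairs with end exclusive."""
    segments: List[Tuple[int, int]] = []
    start = None
    silent_run = 0
    for i, frame in enumerate(frames):
        if rms(frame) >= threshold:
            if start is None:
                start = i          # first voiced frame opens a segment
            silent_run = 0
        elif start is not None:
            silent_run += 1
            if silent_run > hangover_frames:
                # Close the segment just after the last voiced frame.
                segments.append((start, i - silent_run + 1))
                start = None
                silent_run = 0
    if start is not None:
        # Audio ended mid-segment; trim any trailing silence.
        segments.append((start, len(frames) - silent_run))
    return segments
```

The hangover keeps a short pause inside a sentence from splitting one question into two segments before transcription.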

Key Components

  • pipeline.py — app entry point and orchestration
  • audio/capture.py — capture + VAD + transcription
  • audio/devices.py — input device resolution (--list-devices support)
  • classifier/predict.py — BiLSTM/MLP classifier inference
  • rag/retriever.py — Chroma retrieval + rerank hook
  • llm/hint_gen.py — prompting and Groq/Ollama generation
  • ui/overlay.py — stealth/demo overlay behavior

Setup

1) Install dependencies

git clone https://github.com/AhsanRiaz786/clutch-ai.git
cd clutch-ai
pip install -r requirements.txt

2) Configure environment

cp .env.example .env

Update .env with your values:

  • GROQ_API_KEY — required for Groq
  • LLM_MODEL — defaults to llama-3.3-70b-versatile
  • CLUTCH_INPUT_DEVICE — optional input selector (e.g. blackhole)
  • OVERLAY_DEMO_MODE
    • 0 = stealth (hidden from screen capture on macOS)
    • 1 = demo mode (capture-visible)
  • MIN_CLASSIFIER_CONFIDENCE — default 65
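
As a rough illustration of how these variables might be read at startup, the sketch below loads them into a small settings object. The `Settings` class and `load_settings` function are hypothetical names, not the project's actual code. The LLM_MODEL and MIN_CLASSIFIER_CONFIDENCE defaults come from the list above; treating an unset OVERLAY_DEMO_MODE as stealth is an assumption.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    groq_api_key: Optional[str]
    llm_model: str
    input_device: Optional[str]
    demo_mode: bool
    min_classifier_confidence: float


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read Clutch.ai settings from an environment-like mapping."""
    return Settings(
        # Empty strings are treated the same as unset values.
        groq_api_key=env.get("GROQ_API_KEY") or None,
        llm_model=env.get("LLM_MODEL") or "llama-3.3-70b-versatile",
        input_device=env.get("CLUTCH_INPUT_DEVICE") or None,
        # Assumption: anything other than "1" means stealth mode.
        demo_mode=env.get("OVERLAY_DEMO_MODE", "0").strip() == "1",
        min_classifier_confidence=float(env.get("MIN_CLASSIFIER_CONFIDENCE") or 65),
    )
```

Passing a plain dict instead of `os.environ` makes the loader easy to check without touching the real environment, for example `load_settings({"OVERLAY_DEMO_MODE": "1"}).demo_mode`.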

Audio Input Modes

Default mic mode

Leave CLUTCH_INPUT_DEVICE unset (or set to default/mic).

Meeting/browser loopback mode (recommended for Meet/Zoom playback)

List devices:

python audio/capture.py --list-devices

Then set CLUTCH_INPUT_DEVICE to a matching name fragment or index.
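
The matching rule can be pictured with the sketch below, which resolves a selector against a list of device names. This is an assumption about how audio/devices.py might behave, not a copy of it: `resolve_input_device` is a hypothetical helper, and the case-insensitive, first-match-wins rule is illustrative.

```python
from typing import Optional, Sequence


def resolve_input_device(
    selector: Optional[str], device_names: Sequence[str]
) -> Optional[int]:
    """Map a selector to a device index, or None for the system default."""
    if selector is None or selector.strip().lower() in ("", "default", "mic"):
        return None  # default mic mode
    text = selector.strip()
    if text.isdigit():
        # A purely numeric selector is treated as a device index.
        index = int(text)
        if index < len(device_names):
            return index
        raise ValueError(f"No input device with index {index}")
    needle = text.lower()
    for index, name in enumerate(device_names):
        if needle in name.lower():
            return index  # first name containing the fragment wins
    raise ValueError(f"No input device matching {selector!r}")
```

With device names such as "MacBook Pro Microphone" and "BlackHole 2ch", the selector `blackhole` would resolve to the second entry regardless of capitalization.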

macOS (BlackHole 2ch)

  1. Install BlackHole 2ch.
  2. In Audio MIDI Setup, create a Multi-Output Device with:
    • BlackHole 2ch
    • your speakers/headphones
  3. In System Settings → Sound → Output, select that Multi-Output device.
  4. Set .env:
    • CLUTCH_INPUT_DEVICE=blackhole

Windows

Use Stereo Mix (or equivalent loopback input), then set CLUTCH_INPUT_DEVICE accordingly.


Data and Model Prep

Add personal/context documents:

  • data/notes/ — notes/docs
  • data/code/ — code files
  • data/resume/ — resume context

Build vector DB and (optionally) retrain models:

python ingest/ingest.py
python classifier/train.py
python classifier/lstm_classifier.py
python classifier/finetune_embeddings.py

Run

python pipeline.py

Expected startup signals:

  • [UI] Overlay mode: DEMO (capture-visible) or STEALTH (capture-hidden)
  • [PIPELINE] Prerequisites OK
  • [AUDIO] VAD capture ready — listening for speech ...

Demo vs Stealth Overlay

  • Class demo: set OVERLAY_DEMO_MODE=1. If your meeting app excludes floating overlays when you share a single window, share your entire screen instead.
  • Interview stealth: set OVERLAY_DEMO_MODE=0 (macOS applies NSWindowSharingNone).

Fallback Behavior

  • If the Groq API key is missing or invalid, the app falls back to a local Ollama model.
  • If Ollama is not running either, the app returns a safe, generic fallback response.
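
The fallback order above can be sketched as a simple chain of backends tried in sequence. This is illustrative only and does not reproduce llm/hint_gen.py: `generate_with_fallback`, `GENERIC_HINT`, and the idea of passing backends as plain callables are all assumptions.

```python
from typing import Callable, Sequence

# Hypothetical wording for the last-resort response.
GENERIC_HINT = "Restate the question, outline your approach, then give an example."


def generate_with_fallback(
    prompt: str,
    backends: Sequence[Callable[[str], str]],
    fallback_text: str = GENERIC_HINT,
) -> str:
    """Try each backend in order; return the first non-empty reply."""
    for backend in backends:
        try:
            reply = backend(prompt)
        except Exception:
            # Any failure (bad key, server not running) moves to the next backend.
            continue
        if reply and reply.strip():
            return reply.strip()
    return fallback_text
```

In this sketch the Groq call would be the first backend and the Ollama call the second, so a missing key or a stopped Ollama server degrades to the generic text instead of raising.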

Start Ollama fallback:

ollama pull llama3.2:3b
ollama serve

Evaluation (Optional)

python eval/eval_retrieval.py
python eval/eval_latency.py