hastagAB/Khabri-AI

🕵️ Khabri AI

Your personal khabri (खबरी) — the informant who sees everything, hears everything, and never forgets.

A daemon that lives on your laptop, captures everything you see and hear, and becomes your always-on personal AI assistant powered by a local or cloud LLM.

╔═══════════════════════════════════════════════════════════╗
║  Khabri AI — Captures everything. Forgets nothing.        ║
║  Sees your screen. Hears your meetings.                   ║
║  Tracks your tasks. Knows your context.                   ║
╚═══════════════════════════════════════════════════════════╝

What It Does

| Capability | Description |
|---|---|
| Screen Capture + OCR | Screenshots every 15s, extracts text via EasyOCR, skips unchanged screens |
| Audio Transcription | Captures system audio (WASAPI loopback) and transcribes with local Whisper; hears your meetings without recording |
| Clipboard Monitoring | Watches for new text copied to clipboard |
| Window Tracking | Tracks active app/window, auto-detects meetings (Teams, Zoom, Webex) |
| Semantic Search | ChromaDB-backed vector search across all captured content |
| Meeting Intelligence | Auto-detects meetings, summarizes when they end, extracts action items |
| Task Management | Auto-extracts tasks from conversations, tracks status, deadline reminders |
| Daily Digest | End-of-day markdown report: accomplishments, meetings, tasks, app usage |
| Focus Analysis | Tracks app switches, detects distraction patterns, productivity coaching |
| Timeline Recall | Reconstructs what happened at a specific time or around a topic |
| Knowledge Synthesis | Connects information across meetings, screen activity, and clipboard |
| Proactive Alerts | Overdue tasks, stale items, context-switching warnings |
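
The "skips unchanged screens" behaviour can be illustrated with a minimal sketch. This is not the code in src/capture/screen.py: the class name `ScreenDiffer`, the byte-wise comparison, and the 2% default threshold are assumptions chosen only to show the idea of a diff threshold.

```python
from typing import Optional


def changed_fraction(prev: bytes, curr: bytes) -> float:
    """Fraction of bytes that differ between two raw frames."""
    if len(prev) != len(curr):
        return 1.0  # resolution changed: treat as a completely new screen
    if not curr:
        return 0.0
    differing = sum(a != b for a, b in zip(prev, curr))
    return differing / len(curr)


class ScreenDiffer:
    """Decides whether a new frame differs enough to be worth running OCR on."""

    def __init__(self, threshold: float = 0.02):  # hypothetical default
        self.threshold = threshold
        self._last: Optional[bytes] = None

    def should_process(self, pixels: bytes) -> bool:
        if self._last is None or changed_fraction(self._last, pixels) > self.threshold:
            self._last = pixels  # remember the frame we actually processed
            return True
        return False


# Demo with synthetic 100-byte "frames".
differ = ScreenDiffer(threshold=0.1)
blank = bytes(100)
small_change = bytes([255] * 5) + bytes(95)   # 5% of bytes differ
big_change = bytes([255] * 50) + bytes(50)    # 50% of bytes differ
results = [differ.should_process(f) for f in (blank, blank, small_change, big_change)]
```

A real implementation would more likely compare downscaled images or perceptual hashes, since a byte-wise comparison over a full-resolution frame is slow in pure Python.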

Architecture

src/
├── main.py                ← CLI daemon (interactive chat + background capture)
├── tray.py                ← System tray mode (taskbar icon, right-click menu)
├── capture/
│   ├── screen.py          ← Screenshot + OCR + diff detection + multi-monitor
│   ├── audio.py           ← WASAPI loopback + Whisper transcription
│   ├── clipboard.py       ← Clipboard monitoring (Win32 API)
│   └── window.py          ← Active window tracking + meeting detection
├── models/
│   └── local_llm.py       ← OpenAI-compatible primary + Ollama fallback + retry
├── storage/
│   └── database.py        ← SQLite (events, conversations, tasks, digests, insights)
├── indexing/
│   ├── embeddings.py      ← ChromaDB semantic search
│   └── indexer.py         ← Background indexer + auto-pruning
├── assistant/
│   ├── chat.py            ← Context-aware chat engine (7 capabilities)
│   ├── commands.py        ← 15 slash commands
│   ├── daily_digest.py    ← End-of-day markdown report generator
│   └── proactive.py       ← Autonomous intelligence engine
└── utils/
    └── config.py          ← YAML config loader
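
The "primary + fallback + retry" behaviour noted for local_llm.py can be sketched as follows. This is an illustration under assumptions, not the module's actual code: `complete_with_fallback`, the `Backend` type, and the backoff values are hypothetical, and the backends in the demo are stand-ins that never touch the network.

```python
import time
from typing import Callable, Optional, Sequence

Backend = Callable[[str], str]  # hypothetical: takes a prompt, returns a reply


def complete_with_fallback(
    prompt: str,
    backends: Sequence[Backend],
    retries: int = 2,
    backoff_s: float = 0.5,
) -> str:
    """Try each backend in order, retrying each one before moving on."""
    last_error: Optional[Exception] = None
    for backend in backends:
        for attempt in range(retries + 1):
            try:
                return backend(prompt)
            except Exception as exc:  # a real client would catch narrower errors
                last_error = exc
                if attempt < retries:
                    time.sleep(backoff_s * (2 ** attempt))  # exponential backoff
    raise RuntimeError("all LLM backends failed") from last_error


# Demo with stand-in backends.
calls = []


def primary(prompt: str) -> str:
    calls.append("primary")
    raise ConnectionError("primary endpoint is down")


def fallback(prompt: str) -> str:
    calls.append("fallback")
    return "ok: " + prompt


reply = complete_with_fallback("hi", [primary, fallback], retries=1, backoff_s=0.0)
```

In the demo the primary is tried twice (one attempt plus one retry) before the fallback answers.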

Quick Start

Prerequisites

  • Python 3.11+
  • Windows (uses WASAPI for audio capture, Win32 API for clipboard/window)
  • Ollama (optional, for offline fallback): ollama.com

Install

cd khabri-ai
python -m venv .venv
.\.venv\Scripts\Activate.ps1
pip install -r requirements.txt

Configure

Edit config/settings.yaml:

llm:
  # Primary: any OpenAI-compatible endpoint
  provider: openai_compat
  base_url: "http://localhost:3741/v1"  # your endpoint
  api_key: "your-api-key"              # or set LLM_API_KEY env var
  model: "gpt-4o"                      # model name

  # Fallback: local Ollama (auto-switches when primary is down)
  fallback:
    provider: ollama
    base_url: "http://localhost:11434"
    model: "llama3"

Run

# Interactive CLI mode — chat + captures in background
python -m src.main

# System tray mode — lives in your taskbar silently
python -m src.tray

Commands

| Command | Description |
|---|---|
| /help | Show all commands |
| /status | Capture stats, event counts, open tasks |
| /summary [min] | Summarize recent activity (default: 60 min) |
| /actions | Extract action items from meeting audio |
| /tasks | Show all open tasks |
| /done <id> | Mark a task as completed |
| /search <query> | Text search across captured content |
| /recall <topic> | Reconstruct timeline for a topic or time |
| /synth <topic> | Synthesize knowledge from all sources |
| /focus [hours] | Analyze focus & productivity (default: 4h) |
| /digest | Generate today's daily digest now |
| /insights | Show proactive intelligence insights |
| /history | Show conversation history |
| /clear | Clear conversation history |
| /quit | Exit the assistant |
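
As an illustration of how input such as `/summary 30` could be separated from ordinary chat, here is a minimal parsing sketch. The helper names `parse_command` and `summary_minutes` are hypothetical and are not taken from src/assistant/commands.py; only the 60-minute default comes from the table above.

```python
from typing import List, Optional, Tuple


def parse_command(line: str) -> Optional[Tuple[str, List[str]]]:
    """Split '/summary 30' into ('summary', ['30']); return None for plain chat."""
    line = line.strip()
    if len(line) < 2 or not line.startswith("/") or line[1].isspace():
        return None
    name, *args = line[1:].split()
    return name.lower(), args


def summary_minutes(args: List[str], default: int = 60) -> int:
    """Read the optional [min] argument of /summary, falling back to the default."""
    if not args:
        return default
    try:
        minutes = int(args[0])
    except ValueError:
        return default
    return minutes if minutes > 0 else default


parsed = parse_command("/summary 30")
plain = parse_command("what did I miss this morning?")
```

Anything that does not parse as a command would be passed to the chat engine as a normal message.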

How It Captures Meetings

Khabri does not join or record meetings. Instead:

  1. Window tracker detects when Teams/Zoom/Webex is the active window
  2. Audio capture grabs your system audio output via WASAPI loopback (the same audio your speakers/headphones play)
  3. Whisper transcribes the audio locally; the audio itself is not sent anywhere
  4. Proactive engine detects the meeting ended and auto-generates a summary + action items
  5. Audio is processed in memory and never saved to disk

No one in the meeting knows. No bots join. No recordings are created.
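
Steps 1 and 4 above (detecting that a meeting started and later ended) can be sketched as a small state machine fed by window-title polls. This is an assumption-laden illustration: the substring matching, the `MeetingTracker` class, and the `grace_polls` parameter are hypothetical and may differ from what src/capture/window.py and the proactive engine actually do.

```python
from typing import Optional

MEETING_APPS = ("teams", "zoom", "webex")  # apps named in this README


def is_meeting_window(title: str) -> bool:
    """Naive check: does the active window title mention a meeting app?"""
    lowered = title.lower()
    return any(app in lowered for app in MEETING_APPS)


class MeetingTracker:
    """Emits 'started' / 'ended' as the active window changes over time."""

    def __init__(self, grace_polls: int = 3):  # hypothetical tolerance
        self.grace_polls = grace_polls
        self.in_meeting = False
        self._misses = 0

    def observe(self, title: str) -> Optional[str]:
        if is_meeting_window(title):
            self._misses = 0
            if not self.in_meeting:
                self.in_meeting = True
                return "started"
            return None
        if self.in_meeting:
            # Tolerate brief alt-tabs before declaring the meeting over.
            self._misses += 1
            if self._misses >= self.grace_polls:
                self.in_meeting = False
                self._misses = 0
                return "ended"
        return None


tracker = MeetingTracker(grace_polls=2)
titles = [
    "Inbox - Outlook",
    "Standup | Microsoft Teams",
    "Notes - Notepad",            # brief switch away
    "Standup | Microsoft Teams",
    "Editor",
    "Editor",
]
events = [tracker.observe(t) for t in titles]
```

The grace period matters because the meeting app is rarely the active window for the whole call. Plain substring matching also produces false positives (any title containing "zoom"), so a real detector would likely check the process name as well.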

Data Flow

Screen ─────┐
Audio ──────┤
Clipboard ──┼──▶ SQLite DB ──▶ Background Indexer ──▶ ChromaDB
Window ─────┘         │                                   │
                      │                                   │
                      ▼                                   ▼
                 Daily Digest              Semantic Search for Chat
                 Proactive Engine          Timeline Recall
                 Task Extraction           Knowledge Synthesis
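
The left-to-right path in the diagram (capture sources write to SQLite, a background indexer picks up new rows) can be sketched with the standard library alone. The table layout, the `indexed` flag, and the function names are assumptions made for illustration; `index_fn` is a stand-in for whatever writes embeddings to ChromaDB.

```python
import sqlite3
import time
from typing import Callable, Optional


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS events (
               id INTEGER PRIMARY KEY AUTOINCREMENT,
               ts REAL NOT NULL,
               source TEXT NOT NULL,      -- screen / audio / clipboard / window
               content TEXT NOT NULL,
               indexed INTEGER NOT NULL DEFAULT 0
           )"""
    )
    return conn


def record_event(conn: sqlite3.Connection, source: str, content: str,
                 ts: Optional[float] = None) -> int:
    """Store one captured item and return its row id."""
    cur = conn.execute(
        "INSERT INTO events (ts, source, content) VALUES (?, ?, ?)",
        (time.time() if ts is None else ts, source, content),
    )
    conn.commit()
    return cur.lastrowid


def index_pending(conn: sqlite3.Connection,
                  index_fn: Callable[[int, str, str], None],
                  batch_size: int = 100) -> int:
    """Hand not-yet-indexed events to index_fn and mark them as indexed."""
    rows = conn.execute(
        "SELECT id, source, content FROM events WHERE indexed = 0 ORDER BY id LIMIT ?",
        (batch_size,),
    ).fetchall()
    for event_id, source, content in rows:
        index_fn(event_id, source, content)
        conn.execute("UPDATE events SET indexed = 1 WHERE id = ?", (event_id,))
    conn.commit()
    return len(rows)


conn = open_db()
record_event(conn, "clipboard", "copied text", ts=1.0)
record_event(conn, "screen", "OCR text", ts=2.0)
seen = []
first_pass = index_pending(conn, lambda eid, src, text: seen.append((eid, src)))
second_pass = index_pending(conn, lambda eid, src, text: seen.append((eid, src)))
```

Keeping SQLite as the source of truth means the vector index can be rebuilt from the events table if it is ever lost.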

Storage

All data is stored locally:

| Store | Path | Purpose |
|---|---|---|
| SQLite | data/db/assistant.db | Events, conversations, tasks, digests, insights |
| ChromaDB | data/db/chroma/ | Vector embeddings for semantic search |
| Digests | data/digests/ | Daily markdown reports |

Auto-pruning keeps the database from growing unbounded (default: 500K events max).
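
One way such pruning could work is to delete the oldest rows once the table exceeds the cap. The sketch below assumes a hypothetical `events` table with `id` and `ts` columns and an oldest-first policy; the real schema and policy in src/indexing/indexer.py may differ.

```python
import sqlite3


def prune_events(conn: sqlite3.Connection, max_events: int) -> int:
    """Delete the oldest events so at most max_events remain; return rows removed."""
    (count,) = conn.execute("SELECT COUNT(*) FROM events").fetchone()
    excess = count - max_events
    if excess <= 0:
        return 0
    conn.execute(
        "DELETE FROM events WHERE id IN ("
        "  SELECT id FROM events ORDER BY ts ASC, id ASC LIMIT ?"
        ")",
        (excess,),
    )
    conn.commit()
    return excess


# Demo on an in-memory database with a tiny cap instead of 500K.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events (id INTEGER PRIMARY KEY, ts REAL NOT NULL, content TEXT NOT NULL)"
)
conn.executemany(
    "INSERT INTO events (ts, content) VALUES (?, ?)",
    [(float(i), f"event {i}") for i in range(10)],
)
removed = prune_events(conn, max_events=4)
remaining = [row[0] for row in conn.execute("SELECT ts FROM events ORDER BY ts")]
```

If events are also mirrored into a vector index, the matching embeddings would need to be removed in the same pass.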

Configuration Reference

All settings are in config/settings.yaml:

| Section | Key Settings |
|---|---|
| llm | Provider, model, temperature, timeout, retries, fallback |
| screen | Interval (15s), OCR toggle, multi-monitor (-1=all), diff threshold |
| audio | Chunk duration (30s), Whisper model size, silence threshold |
| clipboard | Poll interval (2s), max content length |
| window | Poll interval (3s), meeting app detection |
| storage | DB path, ChromaDB path, max events, prune interval |
| daily | Digest hour (18:00), output directory, retention (90 days) |
| proactive | Check interval (15 min), meeting/focus/pattern/reminder detection |
| assistant | System prompt, max context items, history length |
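
To make the audio section's "silence threshold" concrete, here is a minimal sketch of one common approach: compute the RMS level of a chunk and skip transcription when it falls below the threshold. The RMS measure, the 0.01 default, and the float sample range of -1.0 to 1.0 are assumptions, not values read from the project.

```python
import math
from typing import Sequence


def rms(samples: Sequence[float]) -> float:
    """Root-mean-square level of a chunk of float samples in [-1.0, 1.0]."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def is_silent(samples: Sequence[float], threshold: float = 0.01) -> bool:
    """True when the chunk is quiet enough to skip transcription."""
    return rms(samples) < threshold


# One second of silence and one second of a 440 Hz tone at 16 kHz.
silence = [0.0] * 16000
tone = [0.5 * math.sin(2 * math.pi * 440 * n / 16000) for n in range(16000)]
silent_flags = (is_silent(silence), is_silent(tone))
```

Skipping silent chunks avoids running Whisper on audio that contains no speech.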

Offline Mode

To run entirely offline, point the llm section of config/settings.yaml at Ollama:

llm:
  provider: ollama
  base_url: "http://localhost:11434"
  model: "llama3"

Then pull the models and start Khabri:

ollama pull llama3
ollama pull nomic-embed-text
python -m src.main

Dependencies

| Package | Purpose |
|---|---|
| openai | LLM client (OpenAI-compatible / Ollama) |
| pyyaml | Configuration |
| mss | Fast cross-platform screenshots |
| Pillow | Image processing |
| easyocr | Offline OCR text extraction |
| numpy | Array operations |
| pyaudiowpatch | Windows audio loopback capture |
| openai-whisper | Local speech-to-text |
| torch | Whisper backend |
| chromadb | Vector database for semantic search |
| pystray | System tray icon (tray mode) |
