gleann

A lightweight, high-performance AI/RAG workspace and autonomous agent framework, implemented in Go and designed for terminal environments. Its architecture is inspired by the LEANN RAG backend.

🤖 Note: This project, including its documentation, was developed with the assistance of AI.

[Figure: Gleann terminal demo showing AI docs indexing, semantic search, RAG Q&A, long-term memory, and MCP setup]


Why Gleann?

🔒 Privacy first: with a local backend (Ollama or llama.cpp), every embedding and inference call runs on your machine, and your data never leaves it.

Single binary — no Python virtualenv, no Node.js runtime, no Docker required. One go install and you're done.

🤖 MCP-native — one command (gleann install) wires the knowledge base into 17 AI editors and agents: Claude Code, Cursor, Windsurf, Cline, Kiro, Amazon Q, Zed, JetBrains, Neovim, Codex, Gemini CLI, OpenCode, Amp, Continue, Aider, OpenClaw, and GitHub Copilot CLI.

🧠 Code intelligence, not just text search — AST-aware chunking, call-graph traversal, and blast-radius analysis give LLMs structural code context that plain vector search cannot provide.

📦 Portable — indexes live in ~/.gleann/indexes/. Copy the folder between machines and your entire knowledge base travels with you.
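To make the blast-radius idea concrete, here is a minimal sketch of a breadth-first search over reverse call edges. The `CallGraph` type, the `BlastRadius` function, and the sample symbols are assumptions made for illustration; they are not taken from Gleann's source.

```go
package main

import (
	"fmt"
	"sort"
)

// CallGraph maps a symbol to the symbols that call it (reverse edges).
// This shape is a hypothetical stand-in, not Gleann's internal graph type.
type CallGraph map[string][]string

// BlastRadius returns every direct and transitive caller of symbol,
// discovered by breadth-first search over the reverse call edges.
func BlastRadius(callers CallGraph, symbol string) []string {
	visited := map[string]bool{symbol: true}
	queue := []string{symbol}
	var affected []string
	for len(queue) > 0 {
		current := queue[0]
		queue = queue[1:]
		for _, caller := range callers[current] {
			if visited[caller] {
				continue // already reached; also guards against recursive cycles
			}
			visited[caller] = true
			affected = append(affected, caller)
			queue = append(queue, caller)
		}
	}
	sort.Strings(affected) // deterministic order for display
	return affected
}

func main() {
	callers := CallGraph{
		"parseToken":   {"authenticate"},
		"authenticate": {"handleLogin", "handleSearch"},
		"handleLogin":  {"main"},
		"handleSearch": {"main"},
	}
	fmt.Println(BlastRadius(callers, "parseToken"))
}
```

A real implementation would also map each affected symbol back to its file; that lookup is left out here to keep the traversal itself visible.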


Project Context and Motivation

Gleann was developed to automate engineering workflows and facilitate the analysis of codebases and technical documents within terminal environments.

The architecture is inspired by the LEANN project, which introduced a high-performance RAG backend designed for efficient indexing and retrieval. We acknowledge the original LEANN authors for their approach to selective recomputation and vector retrieval.

While LEANN provides a robust RAG engine, deploying it typically requires a Python/Node environment and a set of external dependencies. Gleann aims to provide a self-contained environment in which the LLM integration, plugin system, and RAG storage ship as one consolidated binary with no language-runtime dependencies.

Built as a Go-native implementation of core RAG concepts, Gleann features a compact architecture. It incorporates an agent layer based on ReAct (Reasoning and Acting) patterns and provides direct LLM integration.

The system is optimized for fast initialization and low memory utilization, managing AI workloads via a single compiled binary.

Gleann vs LEANN — side by side

| | LEANN | Gleann |
|---|---|---|
| Language | Python | Go |
| Deploy | pip install + venv | Single static binary |
| Core innovation | Graph-based selective recomputation (97% less storage) | Full-stack AI assistant: RAG + code graph + memory + MCP server in one binary |
| Vector backends | HNSW, DiskANN | HNSW, DiskANN, FAISS |
| Code analysis | AST-aware chunking | AST call-graph, callers/callees, impact analysis, community detection |
| Long-term memory | | Tiered BBolt blocks (short / medium / long), auto-injected into every prompt |
| Agent protocol | MCP client (reads from servers) | MCP server (exposes tools) + Google A2A protocol |
| Privacy | Local-first | Local-first |
| Platforms | Linux, macOS, Windows (WSL) | Linux, macOS, Windows (native) |

Both projects share the same foundational idea — graph-based ANN indices + selective recomputation — but optimise for different trade-offs: LEANN minimises storage overhead for massive personal-data corpora, while Gleann maximises developer ergonomics and AI-editor integration.

Key Features

  • Context Field Theory (Φ Scoring): MCP search results are re-ranked using a multi-factor Φ score that combines recency (decay with a 1-hour half-life), frequency, graph proximity, and structural degree centrality.
  • 10 File Read Modes: Smart file reading for LLM agents: map, signatures, entropy, diff, task, reference, aggressive, lines:N, auto, and full. Saves 60–90% of context-window tokens.
  • Shell Output Compression: 95+ tool-specific regex patterns (Git, Go, NPM, Docker, Cargo) collapse noisy terminal output before it enters the LLM context window.
  • 17 Agent & IDE Platforms: gleann install auto-detects and configures OpenCode, Claude Code, Cursor, Codex, Gemini CLI, Windsurf, Cline/Roo, Amp, Kiro, Amazon Q, Continue, Zed, Neovim, JetBrains, OpenClaw, Aider, and GitHub Copilot CLI.
  • Token Gain Tracking: gleann_gain MCP tool reports cumulative token savings across a session, enabling budget-aware agent loops.
  • Academic Vision, Full-Fledged Agent: Builds on LEANN's RAG architecture to deliver an autonomous assistant in which the LLM, vector and graph databases, and plugins run inside one Go application.
  • Zero-Config Extractive Summarization: High-density sentences are extracted algorithmically during build time, bypassing LLMs and enabling zero-latency "Smart Summaries".
  • Flexible Intelligence (Local or Cloud): Run LLMs 100% locally via llama.cpp for total privacy, or connect to any OpenAI-compatible API for high-reasoning tasks.
  • Advanced RAG (Faiss / HNSW & Graph DB): Indexes documents and code semantically (vector) and relationally (graph), not just via simple keyword matching.
  • Smart Chunking (Tree-sitter): Intelligent AST-aware partitioning preserves the structural integrity of your code functions and classes.
  • Graph-Augmented Search: Search results are enriched with callers/callees from the AST graph, giving LLMs structural code context alongside semantic matches.
  • Impact Analysis: Blast radius analysis via BFS traversal — find all direct and transitive callers of any symbol and the files they belong to.
  • Multi-Index Chat: Ask questions across multiple indexes simultaneously with gleann ask docs,code "question". Results are merged by relevance score.
  • Conversations: Persistent conversation history with --continue, --continue-last, --title. Manage via gleann chat --list / --show / --delete.
  • Roles & Format Control: Named system prompt roles (--role code, --role shell) and output format control (--format json, --format markdown). Custom roles in config.
  • Markdown Rendering: Terminal markdown rendering via glamour. Disable with --raw.
  • Word-wrap: Terminal-aware word wrapping with --word-wrap N for streaming output.
  • LLM Title Summarization: Auto-generated conversation titles via LLM when no title is provided.
  • Embedding Cache: Two-tier cache (L1: otter in-memory ≤50k vectors; L2: disk keyed by SHA-256). L2 hits are promoted to L1; unchanged chunks skip recompute entirely during rebuilds.
  • Pipe-Friendly: Full stdin/pipe support (cat file | gleann ask index "review"), auto-raw mode when stdout is piped, --quiet for scripting.
  • No-Cache / No-Limit: --no-cache skips conversation save, --no-limit removes token cap for unlimited output.
  • .gleannignore: Gitignore-style patterns to exclude files during index builds.
  • Config Management: gleann config show/path/edit/validate for easy configuration.
  • Model Context Protocol (MCP) Server: A background service that bridges the gap between your local context and AI tools like Cursor or Claude Desktop.
  • Long-term Memory (BBolt Blocks): Hierarchical short/medium/long-term memory that is automatically injected into every LLM query. Store facts with /remember, browse with /memories.
  • OpenAI-Compatible Proxy: Drop-in replacement for OpenAI API — use any OpenAI SDK with model: "gleann/<index>" for instant RAG.
  • Batch Query (MCP): gleann_batch_ask runs up to 10 questions concurrently against an index in a single round-trip.
  • Rate Limiting & Timeouts: Per-IP token-bucket rate limiting (429) and per-endpoint context deadlines (504) protect the server in production.
  • Retry Logic: Automatic exponential-backoff retry for transient LLM/embedding failures (503, 502, 429, connection refused).
  • Background Maintenance: Scheduler auto-promotes memory blocks between tiers and prunes expired entries.
  • A2A Protocol (Agent-to-Agent): Google's A2A protocol for agent discovery — other AI agents find and communicate with gleann via /.well-known/agent-card.json.
  • Unified Memory API: Single POST /api/memory/ingest + POST /api/memory/recall interface that orchestrates block memory, knowledge graph, and vector search in parallel.
  • Multimodal Detection: Automatically detects and uses multimodal Ollama models (Gemma4, Qwen3-VL, LLaVA) for processing images, audio, and video.
  • Background Task Manager: Monitor long-running operations (indexing, memory consolidation) with progress tracking via GET /api/tasks.
  • Auto-Bootstrap: gleann serve detects Ollama, selects models, and creates an initial config file without manual intervention.
  • Zero-Friction Onboarding (gleann setup --auto): Detects the environment, pulls required models, and builds initial indexes in one automated workflow.
  • Cross-Platform Service Management: gleann service install/start/stop/status manages a background server via systemd (Linux), launchd (macOS), or Task Scheduler (Windows).
  • Auto Model Management: Missing models are automatically retrieved with progress tracking.
  • Tiered Model Strategy: Defaults to lightweight models for fast initialization, with the ability to configure larger models for advanced use cases.
  • Terminal User Interface (TUI): A keyboard-centric interface for interacting with indexed data and executing AI operations directly from the shell.
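The Φ scoring feature at the top of this list names four factors but not how they are combined. The following is a minimal sketch of one possible combination, assuming equal weights and simple saturating normalisations. Only the four factors and the 1-hour half-life come from the list above; the `Signals` fields, the weights, and the `Phi` and `Rerank` functions are hypothetical.

```go
package main

import (
	"fmt"
	"math"
	"sort"
	"time"
)

// Signals holds the four factors named in the feature list.
// Field names and units are assumptions made for this sketch.
type Signals struct {
	Age       time.Duration // time since the chunk was last accessed
	Hits      int           // how often the chunk has been accessed
	GraphHops int           // distance from the query's focus symbol (0 = same node)
	Degree    int           // callers plus callees of the chunk's symbol
}

// Result pairs a chunk identifier with its signals.
type Result struct {
	ID      string
	Signals Signals
}

// RecencyDecay halves every halfLife: 1.0 when fresh, 0.5 after one half-life.
func RecencyDecay(age, halfLife time.Duration) float64 {
	if age < 0 {
		age = 0 // treat clock skew as "just accessed"
	}
	return math.Exp2(-float64(age) / float64(halfLife))
}

// Phi combines the factors into a score in [0, 1] using hypothetical equal weights.
func Phi(s Signals) float64 {
	recency := RecencyDecay(s.Age, time.Hour)
	frequency := 1 - 1/(1+float64(s.Hits)) // saturates towards 1 as hits grow
	proximity := 1 / (1 + float64(s.GraphHops))
	centrality := 1 - 1/(1+float64(s.Degree))
	return 0.25 * (recency + frequency + proximity + centrality)
}

// Rerank sorts results in place, highest Φ first.
func Rerank(results []Result) {
	sort.SliceStable(results, func(i, j int) bool {
		return Phi(results[i].Signals) > Phi(results[j].Signals)
	})
}

func main() {
	results := []Result{
		{ID: "old-but-central", Signals: Signals{Age: 3 * time.Hour, Hits: 2, GraphHops: 1, Degree: 12}},
		{ID: "fresh-neighbour", Signals: Signals{Age: 5 * time.Minute, Hits: 4, GraphHops: 0, Degree: 3}},
	}
	Rerank(results)
	for _, r := range results {
		fmt.Printf("%-16s phi=%.3f\n", r.ID, Phi(r.Signals))
	}
}
```

Keeping each factor in the range 0 to 1 before weighting means no single signal can dominate purely because of its units; the actual weighting used by Gleann may differ.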

Index anything

Gleann indexes any directory of files. The type of content determines which capabilities unlock:

| Source | Command | What you get |
|---|---|---|
| Markdown / TXT | gleann index build docs --docs ./docs | Semantic search, RAG Q&A, summaries |
| Source code | gleann index build code --docs ./src --graph | Semantic search + AST call graph, callers/callees, blast-radius |
| PDF / DOCX / XLSX | gleann index build docs --docs ./papers | Requires gleann-plugin-docs (MarkItDown) |
| Audio / Video | gleann index build media --docs ./recordings | Requires gleann-plugin-sound (whisper.cpp) |
| Multiple indexes | gleann ask docs,code "how does auth work?" | Fan-out query, results merged by score |

Exclude files with .gleannignore (same syntax as .gitignore).
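The "results merged by score" step of a multi-index query can be pictured with the short sketch below. It assumes each index has already returned its own hit list; the `Hit` type and `MergeByScore` function are hypothetical names for illustration, and the concurrent fan-out itself is omitted.

```go
package main

import (
	"fmt"
	"sort"
)

// Hit is one search result. The fields are assumptions for this sketch.
type Hit struct {
	Index string  // name of the index the hit came from
	Text  string  // matched chunk
	Score float64 // relevance score, higher is better
}

// MergeByScore flattens per-index hit lists into one ranking
// and keeps at most topK entries.
func MergeByScore(perIndex map[string][]Hit, topK int) []Hit {
	var merged []Hit
	for _, hits := range perIndex {
		merged = append(merged, hits...)
	}
	sort.Slice(merged, func(i, j int) bool {
		if merged[i].Score != merged[j].Score {
			return merged[i].Score > merged[j].Score
		}
		// Tie-break on index and text so output does not depend on map order.
		if merged[i].Index != merged[j].Index {
			return merged[i].Index < merged[j].Index
		}
		return merged[i].Text < merged[j].Text
	})
	if topK >= 0 && topK < len(merged) {
		merged = merged[:topK]
	}
	return merged
}

func main() {
	perIndex := map[string][]Hit{
		"docs": {{Index: "docs", Text: "Auth overview", Score: 0.82}},
		"code": {{Index: "code", Text: "func authenticate()", Score: 0.91}},
	}
	for _, h := range MergeByScore(perIndex, 5) {
		fmt.Printf("%.2f [%s] %s\n", h.Score, h.Index, h.Text)
	}
}
```

Merging raw scores like this is only meaningful when the indexes were built with the same embedding model, since scores from different models are not directly comparable.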

Installation

Go Install (Recommended)

The easiest way to install Gleann is via go install:

go install github.com/tevfik/gleann/cmd/gleann@latest

One-Liner Install (Linux / macOS)

curl -sSfL https://raw.githubusercontent.com/tevfik/gleann/main/scripts/install.sh | sh

Options:

# Set variables on sh (the process that runs the script), not on curl
curl -sSfL .../install.sh | GLEANN_VERSION=v1.0.0 sh              # specific version
curl -sSfL .../install.sh | GLEANN_FULL=1 sh                      # full build (tree-sitter)
curl -sSfL .../install.sh | GLEANN_INSTALL_DIR=/usr/local/bin sh  # custom location

From Source

git clone https://github.com/tevfik/gleann.git
cd gleann

# Build CLI (includes TUI, REST server, MCP server)
go build -o gleann ./cmd/gleann/

# Run setup wizard
./gleann setup

Requires Go 1.24+.

Docker

# Pure-Go image (~10MB, no tree-sitter/FAISS)
docker build -t gleann .
docker run -p 8080:8080 -v gleann-data:/data/indexes gleann serve

# Full image with tree-sitter AST support (CGo)
docker build -f Dockerfile.full -t gleann-full .
docker run -p 8080:8080 -v gleann-data:/data/indexes gleann-full serve

# docker-compose (gleann + Ollama sidecar)
docker-compose up -d

# Or via Makefile
make docker          # Build pure-Go image
make docker-full     # Build CGo + tree-sitter image
make docker-run      # Run with docker-compose

Install to PATH

The setup wizard (gleann setup / gleann tui → Setup) installs the binary to ~/.local/bin or /usr/local/bin with shell completions (bash, zsh, fish).

You can also install via Makefile:

# Install gleann-full (FAISS + tree-sitter) to ~/.local/bin/gleann (recommended)
make install-user

# Install plain gleann (no FAISS, just tree-sitter) to ~/.local/bin/gleann
make install-user-lite

# Install gleann to /usr/local/bin (system-wide, needs sudo)
sudo make install

Usage

Quick Start

# Zero-friction onboarding: detect Ollama → auto-configure → pull models → index
gleann setup --auto

# Or with specific options
gleann setup --auto --docs ./my-project --name my-project --yes

gleann setup --auto detects your environment, shows the configuration for confirmation, pulls any missing models automatically, indexes your current directory, and prints next steps.

Background Service

# Start gleann server in background
gleann service start

# Auto-start on login (systemd/launchd/schtasks)
gleann service install

# Server status
gleann service status

# View logs
gleann service logs

# Stop server
gleann service stop

CLI

# Interactive setup wizard
gleann setup

# Quick auto-configuration (detects Ollama + models)
gleann setup --bootstrap

# Check system health
gleann doctor

# Build index from documents
gleann index build my-docs --docs ./documents/

# Build with AST code graph
gleann index build my-code --docs ./src --graph

# Search
gleann search my-docs "what is HNSW?"

# Search with reranking
gleann search my-docs "what is HNSW?" --rerank

# Search with graph context (callers/callees enrichment)
gleann search my-code "handleSearch" --graph

# Index management
gleann index list
gleann index info my-docs
gleann index remove my-docs
gleann index rebuild my-code --docs ./src --graph
gleann index watch my-code --docs ./src --graph

# Chat with an index (interactive TUI mode)
gleann chat my-docs

# Ask a question (single-shot)
gleann ask my-docs "Explain the architecture"
gleann ask my-docs "Explain the architecture" --interactive

# Multi-index ask (comma-separated)
gleann ask docs,code "How does auth work?"

# Pipe input
cat main.go | gleann ask my-code "Review this code"

# Continue a conversation
gleann ask my-docs --continue-last "What about error handling?"

# Use a role and output format
gleann ask my-docs "List the API endpoints" --role summarize --format json

# Unlimited output, skip conversation save
gleann ask my-docs "Give me everything" --no-limit --no-cache

# Word-wrap streaming output at 80 columns
gleann ask my-docs "Explain architecture" --word-wrap 80

# Raw output (no markdown rendering, for scripts)
gleann ask my-docs "List endpoints" --raw

# Manage conversations
gleann chat --list
gleann chat --show-last
gleann chat --delete-older-than 30d

# Configuration management
gleann config show
gleann config edit
gleann config validate

# Launch TUI
gleann tui

# Start MCP server (for AI editors)
gleann mcp

# Start REST API server
gleann serve --port 8080

# Open interactive API docs (Swagger UI); `open` is macOS-only, use xdg-open on Linux
open http://localhost:8080/api/docs
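With the REST server running, the OpenAI-compatible proxy listed under Key Features can be exercised from Go using only the standard library. The sketch below builds the request without sending it. The `gleann/<index>` model name comes from the feature list; the `/v1/chat/completions` path and the assumption that `gleann serve` hosts the proxy on port 8080 mirror OpenAI's convention and are not confirmed by this README. `NewAskRequest` and `ModelName` are hypothetical helpers.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// chatMessage and chatRequest follow the OpenAI chat completions request shape.
type chatMessage struct {
	Role    string `json:"role"`
	Content string `json:"content"`
}

type chatRequest struct {
	Model    string        `json:"model"`
	Messages []chatMessage `json:"messages"`
}

// ModelName turns an index name into the "gleann/<index>" model identifier.
func ModelName(index string) string {
	return "gleann/" + index
}

// NewAskRequest builds a chat completion request that targets one index.
func NewAskRequest(baseURL, index, question string) (*http.Request, error) {
	body, err := json.Marshal(chatRequest{
		Model:    ModelName(index),
		Messages: []chatMessage{{Role: "user", Content: question}},
	})
	if err != nil {
		return nil, err
	}
	// Assumption: the proxy mirrors OpenAI's /v1/chat/completions path.
	req, err := http.NewRequest(http.MethodPost, baseURL+"/v1/chat/completions", bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/json")
	return req, nil
}

func main() {
	req, err := NewAskRequest("http://localhost:8080", "my-docs", "Explain the architecture")
	if err != nil {
		fmt.Println("build request:", err)
		return
	}
	fmt.Println(req.Method, req.URL)
	// To send it: resp, err := http.DefaultClient.Do(req)
}
```

Any existing OpenAI SDK should work the same way by pointing its base URL at the Gleann server and passing the `gleann/<index>` model name.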

Generic Plugin Architecture

Gleann supports external plugins that parse complex file formats and are reached through local HTTP APIs. The plugin registry lives at ~/.gleann/plugins.json.

Contributing

See CONTRIBUTING.md for development setup and guidelines.

Security

See SECURITY.md for security policy and reporting vulnerabilities.

License

MIT
