A semantic code search tool for Git repositories that uses vector embeddings to find relevant files based on natural language queries.
**groma** (Qdrant version):

- Uses the Qdrant vector database (requires Docker)
- Uses the OpenAI API for embeddings (requires an API key; costs money)
- Higher-quality embeddings
- Needs an internet connection

**groma-lancedb** (local version):

- Uses LanceDB (embedded, no server needed)
- Uses a local fastembed model (AllMiniLML6V2)
- 100% offline, no API calls
- Completely free
- Your code never leaves your machine
```bash
# Clone the repository
git clone https://github.com/yourusername/groma.git
cd groma

# Build both versions
cargo build --release --features qdrant --bin groma
cargo build --release --features lancedb --bin groma-lancedb

# Install to your PATH
cp target/release/groma ~/.local/bin/
cp target/release/groma-lancedb ~/.local/bin/
```

Both versions use the same command-line interface:
```bash
# Basic usage - pipe your query through stdin
echo "authentication logic" | groma /path/to/repo --cutoff 0.3

# Or use the LanceDB version (no setup needed!)
echo "authentication logic" | groma-lancedb /path/to/repo --cutoff 0.3
```

Options:

- `--cutoff` - Similarity threshold (0.0-1.0, default: 0.7)
- `--suppress-updates` - Skip indexing, query existing data only
- `--debug` - Enable debug logging
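Because groma emits one JSON object per result, the cutoff can also be applied (or tightened) downstream with `jq`. A small illustration on made-up data; the `path`/`score` fields match the output format shown later in this README, but these values are not real groma output:

```bash
# Post-filter sample results, keeping only scores at or above the cutoff
printf '%s\n' \
  '{"path":"src/auth.rs","score":0.82}' \
  '{"path":"src/util.rs","score":0.12}' |
  jq -c 'select(.score >= 0.3)'
# → {"path":"src/auth.rs","score":0.82}
```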
- Start the Qdrant Docker container:

  ```bash
  docker run -p 6334:6334 -v ~/.qdrant_data:/qdrant/storage qdrant/qdrant
  ```

- Set your OpenAI API key:

  ```bash
  export OPENAI_API_KEY='your-api-key-here'
  ```

- Optional: set a custom Qdrant URL:

  ```bash
  export QDRANT_URL='http://your-qdrant-host:6334'
  ```

The LanceDB version needs no setup. Just run it; the first run downloads the embedding model (~80MB) automatically.
groma-lancedb can run as an MCP (Model Context Protocol) server:
```bash
# Run as MCP server
groma-lancedb mcp

# With debug logging (logs to /tmp/groma.log)
groma-lancedb mcp --debug
```

Add to your MCP client config (e.g., Claude Desktop):
```json
{
  "mcpServers": {
    "groma": {
      "command": "/path/to/groma-lancedb",
      "args": ["mcp"]
    }
  }
}
```

The MCP server provides a `query` tool for semantic code search with the following parameters:
- `query`: Search query string
- `folder`: Repository path to search
- `cutoff`: Similarity threshold (0.0-1.0, default 0.3)
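An MCP client would invoke the tool with a JSON-RPC `tools/call` request along these lines (request shape per the MCP specification; the argument values here are illustrative):

```json
{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "tools/call",
  "params": {
    "name": "query",
    "arguments": {
      "query": "authentication logic",
      "folder": "/path/to/repo",
      "cutoff": 0.3
    }
  }
}
```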
- Indexing: On first run, Groma scans your Git repository and creates embeddings for all tracked files
- Incremental Updates: Subsequent runs only process changed files
- Semantic Search: Your query is embedded and compared against the indexed files
- Results: Returns relevant file paths and content snippets in JSON format
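The incremental step can be pictured as a content-hash check: a file is re-embedded only when its hash no longer matches the stored one. A toy sketch in shell (groma's actual implementation is in Rust; the file names and index format here are invented for illustration):

```bash
# Toy model of incremental indexing: re-embed only files whose hash changed.
mkdir -p demo/src
echo 'fn main() {}' > demo/src/main.rs
index=demo/.hashes          # invented index format: "<path> <sha256>" per line
touch "$index"

for f in demo/src/*.rs; do
  new=$(sha256sum "$f" | cut -d' ' -f1)
  old=$(grep "^$f " "$index" | cut -d' ' -f2)
  if [ "$new" != "$old" ]; then
    echo "re-embedding $f"  # a real indexer would embed and upsert here
    grep -v "^$f " "$index" > "$index.tmp"; mv "$index.tmp" "$index"
    echo "$f $new" >> "$index"
  fi
done
```

The first run prints `re-embedding demo/src/main.rs`; a second run over the unchanged file prints nothing, which is why repeat queries are cheap.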
Both versions respect:
- `.gitignore` - Files ignored by Git are not indexed
- `.gromaignore` - Additional patterns to exclude from indexing
- Only Git-tracked files are processed
- Binary files are automatically skipped
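Roughly, the candidate set is "Git-tracked files minus the extra ignore patterns". The effect can be approximated in shell; here `.gromaignore` is treated as plain `grep` patterns, and groma's actual matching rules may differ:

```bash
# Build a throwaway repo and approximate the indexable file set.
mkdir -p demo-repo/src && cd demo-repo
git init -q
echo 'src/generated.rs' > .gromaignore
echo 'fn main() {}' > src/main.rs
echo '// autogenerated' > src/generated.rs
git add -A

# Tracked files minus .gromaignore patterns:
git ls-files | grep -v -f .gromaignore
```

In this toy repo, `src/generated.rs` drops out of the listing while `src/main.rs` (and `.gromaignore` itself, which is tracked) survive.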
Results are returned as JSON for easy integration with other tools:
```json
{
  "path": "src/auth.rs",
  "score": 0.82,
  "content": "impl Authentication {\n    pub fn verify_token..."
}
```

Groma works great with aider for AI-assisted coding:
```bash
# Use with aider's --read flag
aider --read $(echo "authentication" | groma . --cutoff 0.3 | jq -r '.path')

# Or use the helper script
aider --read $(groma-files "authentication logic" .)
```

| Feature | groma (Qdrant) | groma-lancedb (Local) |
|---|---|---|
| Setup Required | Docker + API key | None |
| Internet Required | Yes | No |
| Cost | OpenAI API fees | Free |
| Privacy | API calls | 100% local |
| Embedding Quality | Higher | Good |
| Speed | Fast after indexing | Fast after indexing |
| Storage | External (Qdrant) | Local (`.groma_lancedb`) |
The name comes from the groma, a surveying instrument used in the Roman Empire. Just as the ancient groma helped surveyors find structure in the physical landscape, this tool helps you find relevant files within your codebase.
MIT