A semantic code search tool for Git repositories that uses vector embeddings to find relevant files based on natural language queries.
**groma** (Qdrant version):

- Uses the Qdrant vector database (requires Docker)
- Uses the OpenAI API for embeddings (requires an API key; costs money)
- Higher-quality embeddings
- Needs an internet connection

**groma-lancedb** (local version):

- Uses LanceDB (embedded, no server needed)
- Uses a local fastembed model (AllMiniLML6V2)
- 100% offline, no API calls
- Completely free
- Your code never leaves your machine
```bash
# Clone the repository
git clone https://github.com/yourusername/groma.git
cd groma

# Build both versions
cargo build --release --features qdrant --bin groma
cargo build --release --features lancedb --bin groma-lancedb

# Install to your PATH
cp target/release/groma ~/.local/bin/
cp target/release/groma-lancedb ~/.local/bin/
```

Both versions use the same command-line interface:
```bash
# Basic usage - pipe your query through stdin
echo "authentication logic" | groma /path/to/repo --cutoff 0.3

# Or use the LanceDB version (no setup needed!)
echo "authentication logic" | groma-lancedb /path/to/repo --cutoff 0.3
```

Options:

- `--cutoff` - Similarity threshold (0.0-1.0, default: 0.7)
- `--suppress-updates` - Skip indexing, query existing data only
- `--debug` - Enable debug logging
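Because groma emits one JSON object per result, the cutoff can also be applied (or tightened) downstream with `jq`. A small illustration on made-up data; the `path`/`score` fields match the output format shown later in this README, but these values are not real groma output:

```bash
# Post-filter sample results, keeping only scores at or above the cutoff
printf '%s\n' \
  '{"path":"src/auth.rs","score":0.82}' \
  '{"path":"src/util.rs","score":0.12}' |
  jq -c 'select(.score >= 0.3)'
# → {"path":"src/auth.rs","score":0.82}
```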
- Start the Qdrant Docker container:

  ```bash
  docker run -p 6334:6334 -v ~/.qdrant_data:/qdrant/storage qdrant/qdrant
  ```

- Set your OpenAI API key:

  ```bash
  export OPENAI_API_KEY='your-api-key-here'
  ```

- Optional: set a custom Qdrant URL:

  ```bash
  export QDRANT_URL='http://your-qdrant-host:6334'
  ```

The LanceDB version needs no setup. Just run it; the first run downloads the embedding model (~80MB) automatically.
groma-lancedb can run as an MCP (Model Context Protocol) server:
```bash
# Run as MCP server
groma-lancedb mcp

# With debug logging (logs to /tmp/groma.log)
groma-lancedb mcp --debug
```

Add to your MCP client config (e.g., Claude Desktop):
```json
{
  "mcpServers": {
    "groma": {
      "command": "/path/to/groma-lancedb",
      "args": ["mcp"]
    }
  }
}
```

The MCP server provides a `query` tool for semantic code search with the following parameters:
- `query`: Search query string
- `folder`: Repository path to search
- `cutoff`: Similarity threshold (0.0-1.0, default 0.3)
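An MCP client would invoke the tool with a JSON-RPC `tools/call` request along these lines (request shape per the MCP specification; the argument values here are illustrative):

```json
{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "tools/call",
  "params": {
    "name": "query",
    "arguments": {
      "query": "authentication logic",
      "folder": "/path/to/repo",
      "cutoff": 0.3
    }
  }
}
```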
- Indexing: On first run, Groma scans your Git repository and creates embeddings for all tracked files
- Incremental Updates: Subsequent runs only process changed files
- Semantic Search: Your query is embedded and compared against the indexed files
- Results: Returns relevant file paths and content snippets in JSON format
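The incremental step can be pictured as a content-hash check: a file is re-embedded only when its hash no longer matches the stored one. A toy sketch in shell (groma's actual implementation is in Rust; the file names and index format here are invented for illustration):

```bash
# Toy model of incremental indexing: re-embed only files whose hash changed.
mkdir -p demo/src
echo 'fn main() {}' > demo/src/main.rs
index=demo/.hashes          # invented index format: "<path> <sha256>" per line
touch "$index"

for f in demo/src/*.rs; do
  new=$(sha256sum "$f" | cut -d' ' -f1)
  old=$(grep "^$f " "$index" | cut -d' ' -f2)
  if [ "$new" != "$old" ]; then
    echo "re-embedding $f"  # a real indexer would embed and upsert here
    grep -v "^$f " "$index" > "$index.tmp"; mv "$index.tmp" "$index"
    echo "$f $new" >> "$index"
  fi
done
```

The first run prints `re-embedding demo/src/main.rs`; a second run over the unchanged file prints nothing, which is why repeat queries are cheap.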
Both versions respect:
- `.gitignore` - Files ignored by Git are not indexed
- `.gromaignore` - Additional patterns to exclude from indexing
- Only Git-tracked files are processed
- Binary files are automatically skipped
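Roughly, the candidate set is "Git-tracked files minus the extra ignore patterns". The effect can be approximated in shell; here `.gromaignore` is treated as plain `grep` patterns, and groma's actual matching rules may differ:

```bash
# Build a throwaway repo and approximate the indexable file set.
mkdir -p demo-repo/src && cd demo-repo
git init -q
echo 'src/generated.rs' > .gromaignore
echo 'fn main() {}' > src/main.rs
echo '// autogenerated' > src/generated.rs
git add -A

# Tracked files minus .gromaignore patterns:
git ls-files | grep -v -f .gromaignore
```

In this toy repo, `src/generated.rs` drops out of the listing while `src/main.rs` (and `.gromaignore` itself, which is tracked) survive.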
Results are returned as JSON for easy integration with other tools:
```json
{
  "path": "src/auth.rs",
  "score": 0.82,
  "content": "impl Authentication {\n    pub fn verify_token..."
}
```

Groma works great with aider for AI-assisted coding:
```bash
# Use with aider's --read flag
aider --read $(echo "authentication" | groma . --cutoff 0.3 | jq -r '.path')

# Or use the helper script
aider --read $(groma-files "authentication logic" .)
```

| Feature | groma (Qdrant) | groma-lancedb (Local) |
|---|---|---|
| Setup Required | Docker + API key | None |
| Internet Required | Yes | No |
| Cost | OpenAI API fees | Free |
| Privacy | API calls | 100% local |
| Embedding Quality | Higher | Good |
| Speed | Fast after indexing | Fast after indexing |
| Storage | External (Qdrant) | Local (`.groma_lancedb`) |
The name comes from the groma, a surveying instrument used in the Roman Empire. Just as the ancient groma helped surveyors find structure in the physical landscape, this tool helps you find relevant files within your codebase.
MIT