TrevorS/rhizome

rhizome - Lightweight Cross-Repo Semantic Code Search CLI

Overview

rhizome is a fast, minimal, single-binary command-line interface (CLI) tool designed for local-first semantic code search across multiple code repositories. It leverages natural language queries, Tree-sitter grammars for syntactically aware chunking, and an embedded vector database (SQLite with sqlite-vec) for efficient storage and retrieval. The user experience aims to be familiar to users of tools like ripgrep.

Goals

  • Fast, Local-First Search: All indexing and searching occur on the user's machine.
  • Single Binary: Easy distribution and installation.
  • Semantic Search Focused: Prioritizes natural language queries.
  • Minimal Configuration: Sensible defaults with a clear configuration file for customization.
  • Ripgrep-like UX: Familiar command structure, output formatting, and exit codes.
  • Robust Indexing: Reliable change detection and transactional database updates.

Features

  • Indexes code repositories locally.
  • Performs semantic search using natural language queries.
  • Supports structure-aware chunking for Rust, Python, and TypeScript using Tree-sitter.
  • Provides fallback text-based chunking for other UTF-8 files.
  • Uses ONNX embedding models (defaults to BAAI/bge-small-en-v1.5).
  • Stores the index in a SQLite database (~/.rhizome/index.db) with vector support via sqlite-vec.
  • Respects .gitignore rules during indexing.
  • Provides commands for initialization, indexing, searching, status checking, and deleting index data.
  • Offers configurable options via ~/.rhizome/config.toml.
  • Formats output in a ripgrep-like style, with optional scores and context display.
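
To make the idea of semantic search over stored embeddings concrete, here is a minimal sketch of ranking chunk embeddings against a query embedding. It assumes cosine similarity as the score; the README does not state which distance metric rhizome or sqlite-vec is configured to use, and the function names `cosine_similarity` and `top_k` are hypothetical, not part of rhizome's code.

```rust
/// Cosine similarity between two equal-length vectors.
/// Returns 0.0 if either vector has zero norm.
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len(), "vectors must have the same dimension");
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        0.0
    } else {
        dot / (norm_a * norm_b)
    }
}

/// Scores every chunk embedding against the query and returns up to
/// `limit` pairs of (chunk index, score), best match first.
/// Illustrative only: a brute-force scan, not how a vector index works internally.
fn top_k(query: &[f32], chunks: &[Vec<f32>], limit: usize) -> Vec<(usize, f32)> {
    let mut scored: Vec<(usize, f32)> = chunks
        .iter()
        .enumerate()
        .map(|(i, chunk)| (i, cosine_similarity(query, chunk)))
        .collect();
    // Sort descending by score; total_cmp gives a total order over f32.
    scored.sort_by(|a, b| b.1.total_cmp(&a.1));
    scored.truncate(limit);
    scored
}
```

In this sketch, `limit` plays the role of the search command's --limit flag, and the returned score corresponds to what --show-scores would display.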

Commands

  • rhizome init [--force]: Initializes configuration (~/.rhizome/config.toml), creates necessary directories (~/.rhizome/models/), initializes the database (~/.rhizome/index.db), and downloads the default embedding model. Use --force to overwrite existing configuration.
  • rhizome index [--repo <path>] [--dry-run] [--include-ignored] [--ext <extensions>] [--jobs <N>]: Indexes the specified repository (or the current directory if --repo is omitted). Discovers files, chunks content, generates embeddings, and stores data in the database.
    • --dry-run: Shows what would be processed without modifying the database.
    • --include-ignored: Includes files normally ignored by .gitignore.
    • --ext: Comma-separated list of file extensions to include.
    • --jobs: Number of parallel jobs for processing (defaults to the number of CPUs).
  • rhizome search <query> [--repo <path>] [--limit <N>] [--show-scores] [--show-range] [--context <N>] [--path <glob>] [--lang <lang>]: Performs a semantic search using the provided query.
    • --limit: Maximum number of results.
    • --show-scores: Displays similarity scores.
    • --show-range: Shows the start and end line numbers of the chunk.
    • --context: Number of context lines to display before and after matches.
    • --path: Filter results to files whose paths match the given glob pattern.
    • --lang: Filter results to a specific programming language.
  • rhizome status [--repo <path>]: Shows the indexing status for a repository, including indexed file/chunk counts, languages, and detected changes (new, outdated, deleted files).
  • rhizome delete [--repo <path>] [--all] [--yes]: Removes index data. Can remove data for a specific repository or the entire index (--all). Requires confirmation unless --yes is provided.

Global Flags:

  • --verbose: Enable verbose output.
  • --help: Show help message.
  • --version: Show version information.
  • --no-color: Disable colored output.
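
The status command reports new, outdated, and deleted files. One common way to classify files like this is to compare a content hash recorded at index time with the hash of the file currently on disk. The sketch below assumes that approach; the README does not say how rhizome detects changes, and `Changes` and `detect_changes` are hypothetical names.

```rust
use std::collections::BTreeMap;

/// File paths grouped by how they differ from the stored index.
#[derive(Debug, Default, PartialEq)]
struct Changes {
    new: Vec<String>,
    outdated: Vec<String>,
    deleted: Vec<String>,
}

/// Compares path -> content-hash maps for the index and the working tree.
/// Assumption: a file counts as outdated when its hash differs from the indexed one.
fn detect_changes(
    indexed: &BTreeMap<String, u64>,
    on_disk: &BTreeMap<String, u64>,
) -> Changes {
    let mut changes = Changes::default();
    for (path, hash) in on_disk {
        match indexed.get(path) {
            // Present on disk but never indexed.
            None => changes.new.push(path.clone()),
            // Indexed, but the content has changed since.
            Some(old) if old != hash => changes.outdated.push(path.clone()),
            // Unchanged.
            Some(_) => {}
        }
    }
    // Indexed files that no longer exist on disk.
    for path in indexed.keys() {
        if !on_disk.contains_key(path) {
            changes.deleted.push(path.clone());
        }
    }
    changes
}
```

A BTreeMap keeps paths sorted, so the three lists come out in a stable order from run to run.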

Configuration

Configuration is managed via ~/.rhizome/config.toml. Key options include:

  • [model]:
    • path: Identifier for the Hugging Face model (e.g., "Xenova/bge-small-en-v1.5") or a local path.
    • provider: ONNX runtime provider (e.g., "cpu").
    • dimension: Embedding dimension (e.g., 384).
  • [chunking]:
    • size: Target chunk size in tokens (e.g., 450).
    • overlap: Fraction of each chunk that overlaps with the adjacent chunk (e.g., 0.1).
  • [database]:
    • path: Path to the SQLite database file (e.g., "~/.rhizome/index.db").
  • [paths]:
    • models_dir: Directory to store downloaded models (e.g., "~/.rhizome/models").

The init command creates a default configuration file if one doesn't exist.
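
To show how the [chunking] size and overlap options could interact, here is a minimal sliding-window sketch. It assumes that overlap is a fraction of size and that consecutive windows advance by size minus the shared tokens; how rhizome actually counts tokens, and whether overlap applies to Tree-sitter chunks or only to the text fallback, is not stated in this README. `chunk_windows` is a hypothetical helper.

```rust
/// Splits a sequence of `total` tokens into half-open windows [start, end)
/// of at most `size` tokens, where consecutive windows share roughly
/// `size * overlap` tokens.
fn chunk_windows(total: usize, size: usize, overlap: f64) -> Vec<(usize, usize)> {
    assert!(size > 0, "chunk size must be positive");
    assert!((0.0..1.0).contains(&overlap), "overlap must be in [0, 1)");
    // Number of tokens shared between neighbouring windows.
    let shared = (size as f64 * overlap).round() as usize;
    // Always advance by at least one token so the loop terminates.
    let step = (size - shared).max(1);
    let mut windows = Vec::new();
    let mut start = 0;
    while start < total {
        let end = (start + size).min(total);
        windows.push((start, end));
        if end == total {
            break;
        }
        start += step;
    }
    windows
}
```

With the example values above (size 450, overlap 0.1), neighbouring windows share 45 tokens, so a 1000-token file yields the windows (0, 450), (405, 855), and (810, 1000).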

Development Setup

Install ONNX Runtime and system dependencies locally by running:

./scripts/setup_deps.sh

This script downloads the runtime and copies its libraries into target/debug. Run tests with:

ORT_STRATEGY=system ORT_LIB_LOCATION=target/debug make test

Building and Testing

The project uses a Makefile for common development tasks:

  • make check: Verify formatting and linting.
  • make test: Execute tests.
  • make run-with-args ARGS='...': Run the application with specific arguments.
  • make all: Run checks and tests together.
