alta3/code-crawler

Alta3 Code Crawler: The Code Genome System

"Specifications Are All You Need."

🏗 System Architecture

Alta3 Code Crawler is a Semantic Surgery Engine designed to treat your codebase as a living organism rather than a collection of text files.

🧠 The Intent Bridge

The core of the architecture is an agentic Local Vector Index that maps human language to code reality. When you ask a question in plain English, Code Crawler runs a fast vector similarity lookup against your locally stored embeddings.

  • Zero-Knowledge Discovery:
    • You don't need to know the function names.
    • Searching for "How do we handle expired sessions?" will rank the relevant security logic #1, even if the function is named pkg.ValidateTick().
  • Privacy-First Embeddings:
    • Your semantic map is processed and stored locally in genome.index.json.
    • Your architectural "intent" stays on your machine, not in a cloud database.
  • The Identity Triad:
    • Every piece of code is tracked by its Public ID, Structural Fingerprint, and Logic Hash.
    • If you change a single character, the "Nervous System" detects the drift instantly.
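As a rough illustration of the lookup described above, here is a minimal sketch that ranks index entries by cosine similarity to a query vector. It assumes cosine similarity is the scoring function and uses a hypothetical `Entry` type. The real layout of genome.index.json and the actual scoring are not documented here, and `pkg.RenderHeader` is an invented node name.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// Entry is a hypothetical index record: a node ID plus its embedding vector.
// The real genome.index.json schema may differ.
type Entry struct {
	ID     string
	Vector []float64
}

// Match pairs a node ID with its similarity score against the query.
type Match struct {
	ID    string
	Score float64
}

// cosine returns the cosine similarity of a and b. It returns 0 when the
// dimensions differ, a vector is empty, or a vector has zero magnitude.
func cosine(a, b []float64) float64 {
	if len(a) != len(b) || len(a) == 0 {
		return 0
	}
	var dot, na, nb float64
	for i := range a {
		dot += a[i] * b[i]
		na += a[i] * a[i]
		nb += b[i] * b[i]
	}
	if na == 0 || nb == 0 {
		return 0
	}
	return dot / (math.Sqrt(na) * math.Sqrt(nb))
}

// rank scores every entry against the query vector and sorts best-first.
func rank(query []float64, index []Entry) []Match {
	matches := make([]Match, 0, len(index))
	for _, e := range index {
		matches = append(matches, Match{ID: e.ID, Score: cosine(query, e.Vector)})
	}
	sort.SliceStable(matches, func(i, j int) bool {
		return matches[i].Score > matches[j].Score
	})
	return matches
}

func main() {
	// Toy 3-dimensional vectors; real embeddings are much larger.
	index := []Entry{
		{ID: "pkg.RenderHeader", Vector: []float64{0.0, 0.2, 0.9}},
		{ID: "pkg.ValidateTick", Vector: []float64{0.9, 0.1, 0.0}},
	}
	query := []float64{1.0, 0.0, 0.0} // stands in for the embedded question
	for i, m := range rank(query, index) {
		fmt.Printf("%d. %-18s %.2f\n", i+1, m.ID, m.Score)
	}
}
```

Because the comparison happens between vectors rather than names, a query can match a function whose identifier shares no words with the question.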

🛠 The Workflow Loop

  1. Ingest: init and enrich build the initial "Map of Intent."
  2. Locate: search-intent and trace find the exact "Twigs" that need pruning.
  3. Operate: plan and apply execute multi-file AST splicing; files are written only if the result passes go vet and go build.

🚀 Quick Start

Installation

Clone the repo and build the binary:

git clone https://github.com/your-org/code-crawler
cd code-crawler
go build -o code-crawler main.go

🚀 The Code Crawler Workflow (In Execution Order)

🧬 Initialize the Code Genome

  • cc verify-llm-targets – Validate connectivity for FAST, DEEP, and embedding tiers.
  • cc init – Map all functions, structs, and variables into a genome.json registry.
  • cc enrich – Generate semantic "business purpose" summaries and logic hashes.

🔍 Verify Your Context

  • cc search-intent – Perform semantic natural language searches across the genome.
  • cc trace – Structurally identify call stacks and dependency trees for any target.
  • cc verify – Audit the codebase for "logic drift" against registered genome hashes.
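The drift check that cc verify performs can be pictured with a small sketch. It assumes a logic hash is the SHA-256 of a function's exact source bytes; the hashing scheme Code Crawler actually uses is not specified here, and `logicHash` is a hypothetical helper.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
)

// logicHash parses src, finds the first function or method named name, and
// returns the hex SHA-256 of its exact source bytes (from "func" to the
// closing brace). The hashing scheme is an assumption for illustration.
func logicHash(src, name string) (string, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, "src.go", src, 0)
	if err != nil {
		return "", err
	}
	for _, decl := range file.Decls {
		fn, ok := decl.(*ast.FuncDecl)
		if !ok || fn.Name.Name != name {
			continue
		}
		start := fset.Position(fn.Pos()).Offset
		end := fset.Position(fn.End()).Offset
		sum := sha256.Sum256([]byte(src[start:end]))
		return hex.EncodeToString(sum[:]), nil
	}
	return "", fmt.Errorf("function %q not found", name)
}

func main() {
	registered := "package p\n\nfunc ttl() int { return 30 }\n"
	live := "package p\n\nfunc ttl() int { return 31 }\n"

	want, err := logicHash(registered, "ttl")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	got, err := logicHash(live, "ttl")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	// A single changed character yields a different hash.
	fmt.Println("drift detected:", want != got)
}
```

Hashing only the function's own byte range means edits elsewhere in the file leave its hash unchanged.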

⚑ Execute the Surgery

  • cc draft – Translate natural language requests into structured multi-file plans.
  • cc graph – Analyze the "blast radius" and hydrate source code for deep context.
  • cc plan – Orchestrate LLMs to generate and refine strict code patch sets.
  • cc apply – Execute batch AST splicing to write patches to the local filesystem.
  • cc gen-test – Automatically generate and run Go tests to verify mutated logic.
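To make "AST splicing" concrete, here is a minimal single-file sketch built on the standard go/parser package. It locates a function by name, swaps its declaration for replacement source, and re-parses the result. The `spliceFunc` helper is hypothetical and is not Code Crawler's implementation, which operates on multi-file patch sets.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
)

// spliceFunc replaces the declaration of the first function or method called
// name in src with replacement, leaving the rest of the file untouched.
// It rejects the patch if the resulting source no longer parses.
func spliceFunc(src, name, replacement string) (string, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, "src.go", src, parser.ParseComments)
	if err != nil {
		return "", fmt.Errorf("parse original: %w", err)
	}
	for _, decl := range file.Decls {
		fn, ok := decl.(*ast.FuncDecl)
		if !ok || fn.Name.Name != name {
			continue
		}
		// Pos is the "func" keyword, so a doc comment above it is preserved.
		start := fset.Position(fn.Pos()).Offset
		end := fset.Position(fn.End()).Offset
		patched := src[:start] + replacement + src[end:]

		// Re-parse so a malformed replacement is rejected, not returned.
		if _, err := parser.ParseFile(token.NewFileSet(), "src.go", patched, 0); err != nil {
			return "", fmt.Errorf("patched source does not parse: %w", err)
		}
		return patched, nil
	}
	return "", fmt.Errorf("function %q not found", name)
}

func main() {
	src := "package p\n\n// ttl is the session lifetime in minutes.\nfunc ttl() int { return 30 }\n\nfunc other() {}\n"
	out, err := spliceFunc(src, "ttl", "func ttl() int { return 15 }")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Print(out)
}
```

Using byte offsets taken from the parsed tree, rather than a text search, keeps the edit scoped to exactly one declaration.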

🛠️ Maintenance & Utilities

  • cc help – Provides detailed documentation and flags for any specific command.
  • cc completion – Generates shell autocompletion scripts for a smoother CLI experience.

⚖️ Guarantees

  • Indestructible Identity: Renaming a file does not break the registry.
  • Atomic Commits: Files are only updated if they pass go vet and go build.
  • Format Preservation: All edits are automatically passed through gofmt.
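The all-or-nothing and formatting guarantees can be sketched with the standard go/format package. This is only an illustration: `format.Source` checks syntax and applies gofmt formatting, which is a weaker gate than the go vet and go build checks named above, and `formatAll` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"go/format"
	"sort"
)

// formatAll gofmt-formats every file in patches (path -> source). If any file
// fails to parse, it returns an error and no results, so a caller writes
// either the whole patch set or nothing at all.
func formatAll(patches map[string]string) (map[string]string, error) {
	out := make(map[string]string, len(patches))
	for path, src := range patches {
		formatted, err := format.Source([]byte(src))
		if err != nil {
			return nil, fmt.Errorf("%s: %w", path, err)
		}
		out[path] = string(formatted)
	}
	return out, nil
}

func main() {
	patches := map[string]string{
		"a.go": "package p\nfunc  a( ) int {return 1}\n",
		"b.go": "package p\nfunc b() {}\n",
	}
	ready, err := formatAll(patches)
	if err != nil {
		fmt.Println("rejected, nothing written:", err)
		return
	}
	// Print in a stable order; a real tool would write files at this point.
	paths := make([]string, 0, len(ready))
	for p := range ready {
		paths = append(paths, p)
	}
	sort.Strings(paths)
	for _, p := range paths {
		fmt.Printf("--- %s\n%s", p, ready[p])
	}
}
```

Validating the full set before touching the filesystem is what keeps a partially applied patch from landing on disk.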

Cobra-style help: just type "cc"

./cc 
A deterministic, AST-based mutation and repair engine 
that treats code as a living genome.

Usage:
  cc [command]

Available Commands:
  apply              Apply a generated V5 surgery patch set to the local filesystem using batch AST splicing
  completion         Generate the autocompletion script for the specified shell
  draft              Initialize a code surgery plan based on a natural language intent
  enrich             Uses FAST Cognition to automatically document the business purpose of code
  gen-test           Generates and verifies a Go test for the last mutated function
  graph              Analyze the blast radius of a planned surgery and hydrate context source code
  help               Help about any command
  init               Initialize the Code Genome for the project
  plan               Generate and auto-refine a strict multi-file code patch set from a surgery plan
  search-intent      Search the semantic genome index using a natural language intent
  trace              Structurally grep the codebase to find all callers of a specific target
  verify             Detects logic drift between live code and the genome.json
  verify-llm-targets Verifies end-to-end LLM connectivity for both FAST and DEEP cognitive tiers

1. Initialize genome.json with the IDs of the major nodes

./cc init
🧬 Initializing Code Genome at: .
✅ Success! Indexed 175 nodes into genome.json
💡 Next step: Run './cc enrich' to generate semantic summaries.

2. Now populate genome.json with the business purpose of each node

Code Crawler contacts your configured LLM and sequences your code's genome.

./cc enrich
🧠 Starting Semantic Enrichment Process...
⚙️   cc.printActiveNodeHeader[enrich.go]        cmd/code-crawler/enrich.go       [21939968]
 ├─ 🧬 logic changed
 ├─ 🔄 purpose reset
 ├─ 🔍 analyzing [175/175]
 └─ ✅ purpose updated
    This function serves to display a standardized header for an "active" Code Crawler
    code node, providing immediate visual context about its type, identity,
    file location, and a truncated logic hash. It helps the autonomous agent or
    an observer track and understand which specific code element Code Crawler is
    currently processing or analyzing within the code genome.
📡 Synchronizing Semantic Index...
📊 Enrichment Summary:
  - Updated: 1
  - Skipped (Already Enriched): 174
  - Skipped (Missing/Drifted):  0
  - Failed:  0
🧠 Semantic Index Summary:
  - Created: 0
  - Updated: 1
  - Deleted: 0
  - Skipped: 174

💾 Genome memory successfully updated.
💾 Semantic index successfully updated.

3. Now use Code Crawler to walk the code genome:

The response below is generated in less than a second.
Code Crawler runs inference locally, so lookups are fast.

./cc search-intent "find the code that formats the summary text with indentation"

🔎 Search Intent: find the code that formats the summary text with indentation

Top Matches
1. cc.printWrappedIndented               cmd/code-crawler/search_intent.go     function   0.71
2. cc.wrapText                           cmd/code-crawler/enrich.go            function   0.68
3. cc.printIndexSummary                  cmd/code-crawler/enrich.go            function   0.66
4. cc.printEnrichmentSummary             cmd/code-crawler/enrich.go            function   0.65
5. cc.printSearchResults                 cmd/code-crawler/search_intent.go     function   0.63
