ElshadHu/mark-guard
mark-guard

A CLI tool that keeps your documentation in sync with your Go code.

You change code. You forget to update docs. mark-guard parses the AST of your old code (from git) and your new code (on disk), extracts a semantic diff of exported symbols, and produces a structured summary of what changed in the public API. It then feeds that diff plus your current markdown docs to an LLM and writes the updated docs back to disk.

Text diffs are noisy and miss the point. AST-level diffing tells you exactly what changed in the public API, which is what documentation cares about.

Status

The end-to-end pipeline works. I may rework the prompt into a more detailed, more precise XML-structured prompt. Other parts may change as I test against more codebases.

| Phase | Description | Status |
|-------|-------------|--------|
| 1-2 | Skeleton + Git Integration | Done |
| 3 | Go Symbol Extraction | Done |
| 4 | Symbol Diffing | Done |
| 5 | Doc Scanning | Done |
| 6 | LLM Integration | Done |
| 7 | End-to-End Wiring | Done |

What works today:

  • Detects changed .go files via git
  • Parses old and new Go source, extracts exported symbols
  • Diffs symbol sets: added, removed, modified (down to parameters, fields, methods)
  • Produces compact diff summaries (reduced token usage for LLM)
  • Scans and selects relevant markdown docs via config-based mapping
  • Loads config from .markguard.yaml with sensible defaults
  • Sends diff + docs to an LLM (Gemini/OpenAI compatible) and writes updates back
  • Validates LLM output before writing (content-loss guard)
  • Dry-run by default, --write to apply, --force to bypass safety checks
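
The content-loss guard can be pictured with a minimal sketch like the one below. It assumes loss is measured by comparing trimmed byte lengths and that an empty result is rejected even with --force; the function name validateOutput, the error values, and the measurement itself are hypothetical and not taken from mark-guard's source.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// maxLossRatio mirrors the "block >50% content loss" rule.
// Measuring loss by trimmed byte length is an assumption of this sketch.
const maxLossRatio = 0.5

var (
	errEmptyOutput = errors.New("LLM returned an empty document")
	errContentLoss = errors.New("LLM output lost too much content")
)

// validateOutput rejects empty results and, unless force is set,
// results that are more than 50% shorter than the original document.
func validateOutput(original, updated string, force bool) error {
	trimmed := strings.TrimSpace(updated)
	if trimmed == "" {
		return errEmptyOutput // assumption: never bypassed by force
	}
	if force {
		return nil
	}
	origLen := len(strings.TrimSpace(original))
	if origLen == 0 {
		return nil // nothing to lose
	}
	loss := 1 - float64(len(trimmed))/float64(origLen)
	if loss > maxLossRatio {
		return fmt.Errorf("%w: %.0f%% shorter than the original", errContentLoss, loss*100)
	}
	return nil
}

func main() {
	orig := strings.Repeat("line of docs\n", 10)
	fmt.Println(validateOutput(orig, orig, false))    // unchanged length passes
	fmt.Println(validateOutput(orig, "short", false)) // blocked by the guard
	fmt.Println(validateOutput(orig, "short", true))  // force skips the guard
}
```

A length-based check is deliberately crude: it catches an LLM that truncates or wipes a file, but it says nothing about whether the remaining content is correct, which is why the default stays dry-run.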

How It Works

  1. Detect changed .go files via git diff --name-only + git ls-files --others
  2. Read old version from git show HEAD:<file>, new version from disk
  3. Parse both with go/parser.ParseFile, extract exported symbols (functions, types, structs, interfaces, consts, vars)
  4. Diff the two symbol sets: what was added, removed, or modified (down to individual parameters, fields, methods)
  5. Scan configured doc paths, select relevant markdown files via config-based mapping
  6. Build prompt with diff summary + doc content, send to LLM
  7. Validate LLM output (reject empty results, block >50% content loss)
  8. Write updated docs back to disk
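
Steps 3 and 4 can be sketched roughly as follows. This is a simplified illustration, not mark-guard's actual code: it handles only exported top-level functions and compares their printed signatures, and the names exportedFuncs, diffSymbols, and change are hypothetical.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/printer"
	"go/token"
	"sort"
	"strings"
)

// change is one entry in the symbol diff.
type change struct{ Kind, Name string }

// exportedFuncs parses src and maps each exported top-level function
// name to its printed signature. Methods and unexported names are skipped.
func exportedFuncs(filename, src string) (map[string]string, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, filename, src, parser.SkipObjectResolution)
	if err != nil {
		return nil, err
	}
	out := make(map[string]string)
	for _, decl := range file.Decls {
		fn, ok := decl.(*ast.FuncDecl)
		if !ok || fn.Recv != nil || !fn.Name.IsExported() {
			continue
		}
		var sb strings.Builder
		if err := printer.Fprint(&sb, fset, fn.Type); err != nil {
			return nil, err
		}
		out[fn.Name.Name] = sb.String()
	}
	return out, nil
}

// diffSymbols compares two name->signature maps and reports
// added, removed, and modified symbols, sorted by name.
func diffSymbols(before, after map[string]string) []change {
	var changes []change
	for name, sig := range after {
		oldSig, ok := before[name]
		switch {
		case !ok:
			changes = append(changes, change{"added", name})
		case oldSig != sig:
			changes = append(changes, change{"modified", name})
		}
	}
	for name := range before {
		if _, ok := after[name]; !ok {
			changes = append(changes, change{"removed", name})
		}
	}
	sort.Slice(changes, func(i, j int) bool { return changes[i].Name < changes[j].Name })
	return changes
}

func main() {
	oldSrc := "package demo\nfunc Open(path string) error { return nil }\nfunc Close() {}\nfunc helper() {}\n"
	newSrc := "package demo\nfunc Open(path string, mode int) error { return nil }\nfunc Stat() {}\nfunc helper() {}\n"

	before, err := exportedFuncs("demo.go", oldSrc)
	if err != nil {
		fmt.Println("parse old:", err)
		return
	}
	after, err := exportedFuncs("demo.go", newSrc)
	if err != nil {
		fmt.Println("parse new:", err)
		return
	}
	for _, c := range diffSymbols(before, after) {
		fmt.Printf("%-8s %s\n", c.Kind, c.Name)
	}
}
```

For the two sources in main, the diff reports Close as removed, Open as modified, and Stat as added, while the unexported helper never appears. Keying both sides by symbol name is what makes the comparison insensitive to reordering, whitespace, and comments.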

Usage

format

Updates existing docs based on changed symbols in the current git diff.

# dry run -- see what would change (default)
make run

# apply changes to doc files
make run ARGS="--write"

# see the full diff summary, prompt, and raw LLM response
make run ARGS="--debug"

# bypass content-loss safety checks
make run ARGS="--write --force"

# compare against a specific git ref
make run ARGS="--base HEAD~3"

# use a custom config file
make run ARGS="--config path/to/.markguard.yaml"

# abort if token estimate exceeds a limit
make run ARGS="--max-tokens 30000"

format flags

| Flag | Default | Description |
|------|---------|-------------|
| --base | HEAD | Git ref to compare against |
| --config | .markguard.yaml | Path to config file |
| --debug | false | Print diff summary, prompt, and raw LLM response |
| --force | false | Bypass content-loss safety checks |
| --max-tokens | 50000 | Abort if estimated tokens exceed this limit |
| --write | false | Apply changes to doc files (dry-run by default) |

generate

Bootstraps docs from scratch by parsing all exported Go symbols and sending them to the LLM. Use this when no docs exist yet. Use format for ongoing updates.

# dry run -- preview what would be generated
make generate

# append all packages to README.md
make generate-write ARGS="--output README.md"

# write one file per package into docs/
make generate-write ARGS="--output docs/"

# target a subdirectory of your repo
make generate-write ARGS="./internal/llm --output docs/"

# overwrite existing files in directory mode
make generate-write ARGS="--output docs/ --force"

# preview with full LLM prompt visible
make generate ARGS="--debug"

Output routing:

  • --output README.md (any .md file): all packages are appended to that single file, sorted alphabetically and separated by horizontal rules.
  • --output docs/ (directory): one <pkgname>.md file is created per package.

If --output is not passed, the value from generate.output in .markguard.yaml is used, then docs.paths[0], then docs/ as a final fallback.
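
The fallback order and the file-versus-directory routing described above could look something like this sketch. The function names resolveOutput and targetFile are hypothetical, and treating exactly the ".md" extension as "single file" is an assumption based on the description above.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// resolveOutput applies the documented fallback order:
// --output flag, then generate.output, then docs.paths[0], then "docs/".
func resolveOutput(flagValue, generateOutput string, docPaths []string) string {
	switch {
	case flagValue != "":
		return flagValue
	case generateOutput != "":
		return generateOutput
	case len(docPaths) > 0:
		return docPaths[0]
	default:
		return "docs/"
	}
}

// targetFile decides where one package's docs go: a .md output receives
// every package, anything else is treated as a directory holding <pkgname>.md.
func targetFile(output, pkgName string) string {
	if filepath.Ext(output) == ".md" {
		return output
	}
	return filepath.Join(output, pkgName+".md")
}

func main() {
	out := resolveOutput("", "", []string{"docs/", "README.md"})
	fmt.Println(targetFile(out, "llm"))
	fmt.Println(targetFile("README.md", "llm"))
}
```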

generate flags

| Flag | Default | Description |
|------|---------|-------------|
| --output | from config | Directory or .md file destination |
| --config | .markguard.yaml | Path to config file |
| --max-tokens | 50000 | Abort if estimated tokens exceed this limit |
| --write | false | Apply changes (dry-run by default) |
| --force | false | Overwrite existing files in directory mode |
| --debug | false | Print symbol list, prompt, and raw LLM response |

Docker

Run mark-guard without installing Go:

docker pull ghcr.io/elshadhu/mark-guard:latest

Run it against your repo:

# dry run - see what would change
docker run --rm \
  -v "$(pwd):/repo" \
  -w /repo \
  -e GEMINI_API_KEY="$GEMINI_API_KEY" \
  ghcr.io/elshadhu/mark-guard:latest format

# apply changes
docker run --rm \
  -v "$(pwd):/repo" \
  -w /repo \
  -e GEMINI_API_KEY="$GEMINI_API_KEY" \
  ghcr.io/elshadhu/mark-guard:latest format --write

# check version
docker run --rm ghcr.io/elshadhu/mark-guard:latest version

-v "$(pwd):/repo" mounts your repo so mark-guard can see your code, docs, and .git history.

You can pin to a specific version instead of latest:

docker pull ghcr.io/elshadhu/mark-guard:1.2.3

Key Design Decisions

| Decision | Choice | Why |
|----------|--------|-----|
| Diff strategy | AST-level symbol diff, not text diff | Text diffs include noise (whitespace, imports, comments). AST diff gives semantic changes: "parameter added", "field type changed". That is what docs care about. |
| Parser | go/parser only, no go/types | We parse raw strings from git show. go/types needs the full module graph. We need signatures, not resolved types. |
| Git integration | os/exec shelling out to git | go-git pulls 30+ dependencies. System git is faster for simple operations. |
| Doc-to-code mapping | Config-based mapping + send-all fallback | Small repos: send all docs (zero config). Large repos: user adds mappings for precision. No false-positive symbol scanning. |
| CLI framework | Cobra without Viper | Cobra gives subcommands, flags, help text. Viper pulls 20 transitive deps for reading one YAML file. We use yaml.v3 directly. |
| Config | .markguard.yaml with env var references | API key stored as env var name, not the key itself. Config is optional, defaults work out of the box. |
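
To illustrate the "shell out to git" choice, here is a minimal sketch of reading a file's old version with os/exec. The helper names showArgs and readAtRef, the 10-second timeout, and the go.mod example path are all hypothetical; only the `git show <ref>:<file>` invocation comes from the description above.

```go
package main

import (
	"bytes"
	"context"
	"fmt"
	"os"
	"os/exec"
	"path/filepath"
	"time"
)

// showArgs builds the arguments for `git show <ref>:<path>`.
// Git expects forward slashes and a path relative to the repo root.
func showArgs(ref, path string) []string {
	return []string{"show", ref + ":" + filepath.ToSlash(path)}
}

// readAtRef returns the content of path as it existed at ref,
// running the system git binary inside repoDir.
func readAtRef(ctx context.Context, repoDir, ref, path string) ([]byte, error) {
	cmd := exec.CommandContext(ctx, "git", showArgs(ref, path)...)
	cmd.Dir = repoDir
	var stderr bytes.Buffer
	cmd.Stderr = &stderr
	out, err := cmd.Output()
	if err != nil {
		return nil, fmt.Errorf("git show %s:%s: %w: %s", ref, path, err, bytes.TrimSpace(stderr.Bytes()))
	}
	return out, nil
}

func main() {
	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
	defer cancel()

	src, err := readAtRef(ctx, ".", "HEAD", "go.mod")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		return
	}
	fmt.Printf("read %d bytes from HEAD\n", len(src))
}
```

A file that is new in the working tree has no version at the base ref, so git exits with an error for it; a caller would likely treat that case as "every symbol added" rather than as a failure.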

What It Does Not Do

  • Support languages other than Go. Each language needs its own parser. Go-only for now.
  • Auto-commit. You review the changes first.

Dependencies

github.com/spf13/cobra       # CLI framework
gopkg.in/yaml.v3              # YAML config parsing

Two external deps. Everything else is Go stdlib (go/parser, go/ast, go/token, os/exec, encoding/json).

Config

Create .markguard.yaml at your repo root (optional, defaults work without it):

llm:
  base_url: "https://generativelanguage.googleapis.com/v1beta/openai"
  api_key_env: "GEMINI_API_KEY"
  model: "gemini-2.5-flash"
docs:
  paths:
    - "docs/"
    - "README.md"
  exclude:
    - "docs/roadmap.md"
  mappings:
    - docs: ["docs/api.md"]
      code: ["internal/git/", "internal/config/"]
    - docs: ["README.md"]
      code: ["cmd/", "internal/cli/"]
generate:
  # a .md file appends all packages; a directory creates one file per package
  output: "README.md"
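
One way to read the mappings section: a doc is selected when any changed file falls under one of its code paths, and with no mappings every doc is sent. The sketch below assumes simple prefix matching and returns nothing when mappings exist but none match; both behaviors, along with the names mapping, selectDocs, and touches, are assumptions of this example.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// mapping mirrors one entry under docs.mappings.
type mapping struct {
	Docs []string
	Code []string
}

// touches reports whether any changed file sits under one of the prefixes.
func touches(changed, prefixes []string) bool {
	for _, f := range changed {
		for _, p := range prefixes {
			if strings.HasPrefix(f, p) {
				return true
			}
		}
	}
	return false
}

// selectDocs returns the docs mapped to the changed files,
// or all docs when no mappings are configured (send-all fallback).
func selectDocs(changed []string, mappings []mapping, allDocs []string) []string {
	if len(mappings) == 0 {
		return allDocs
	}
	seen := make(map[string]bool)
	var selected []string
	for _, m := range mappings {
		if !touches(changed, m.Code) {
			continue
		}
		for _, d := range m.Docs {
			if !seen[d] {
				seen[d] = true
				selected = append(selected, d)
			}
		}
	}
	sort.Strings(selected)
	return selected
}

func main() {
	mappings := []mapping{
		{Docs: []string{"docs/api.md"}, Code: []string{"internal/git/", "internal/config/"}},
		{Docs: []string{"README.md"}, Code: []string{"cmd/", "internal/cli/"}},
	}
	changed := []string{"internal/git/diff.go"}
	fmt.Println(selectDocs(changed, mappings, []string{"docs/api.md", "README.md"}))
}
```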

Without .markguard.yaml, defaults are:

  • Provider: Gemini (gemini-2.5-flash)
  • API key env: GEMINI_API_KEY
  • Doc paths: docs/, README.md
  • Mappings: None (sends all docs, fine for small repos)
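
Storing the name of the environment variable rather than the key itself might be implemented along these lines. The struct, its field names, and the default base URL (copied from the example config above) are assumptions of this sketch; the yaml tags are shown only to connect the fields to the config keys, and no YAML library is used here.

```go
package main

import (
	"errors"
	"fmt"
	"os"
)

// llmConfig mirrors the llm section of .markguard.yaml.
type llmConfig struct {
	BaseURL   string `yaml:"base_url"`
	APIKeyEnv string `yaml:"api_key_env"`
	Model     string `yaml:"model"`
}

// defaultLLM returns the documented defaults (Gemini, GEMINI_API_KEY).
// The base URL is taken from the example config and is an assumption.
func defaultLLM() llmConfig {
	return llmConfig{
		BaseURL:   "https://generativelanguage.googleapis.com/v1beta/openai",
		APIKeyEnv: "GEMINI_API_KEY",
		Model:     "gemini-2.5-flash",
	}
}

// apiKey resolves the key at run time from the named environment variable,
// so the secret never has to be written into the config file.
func (c llmConfig) apiKey() (string, error) {
	if c.APIKeyEnv == "" {
		return "", errors.New("llm.api_key_env is not set")
	}
	key, ok := os.LookupEnv(c.APIKeyEnv)
	if !ok || key == "" {
		return "", fmt.Errorf("environment variable %s is empty or unset", c.APIKeyEnv)
	}
	return key, nil
}

func main() {
	cfg := defaultLLM()
	if _, err := cfg.apiKey(); err != nil {
		fmt.Println("cannot call the LLM:", err)
		return
	}
	fmt.Println("using model", cfg.Model)
}
```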

Development

make build     # build binary to bin/mark-guard
make test      # go test ./... -v -race
make lint      # golangci-lint run ./...
make run       # go run ./cmd/mark-guard format
make generate  # dry-run generate (preview only)
make generate-write  # generate and append to configured output

References

| Project | What I used it for |
|---------|--------------------|
| golang.org/x/exp/apidiff | Reference for map-keyed symbol comparison and API change detection between package versions. |
| go/doc | Grouping methods, consts, and vars under parent types. |
| go/parser + go/ast | AST parsing without type-checking (works on raw strings from git show). |
| Cobra (spf13/cobra) | Subcommand routing and flag parsing. |
| golangci-lint | Reference for shelling out to git instead of pulling in a Go git library. |
| Gemini OpenAI compatibility | ai.google.dev/gemini-api/docs/openai |

What's Next

  • Support for other languages (Python, TypeScript, Rust); each needs its own parser
  • Per-edit validation before applying (currently per-file only)
  • Configurable content-loss thresholds via .markguard.yaml
