kage

kage (影, "shadow") clones a website into a self-contained folder you can browse offline, with all the JavaScript stripped out. It renders every page in headless Chrome, snapshots the final rendered DOM, removes every script and event handler, and downloads the CSS, images, and fonts and rewrites them to local paths. The result looks like the live site but runs no code: a plain folder of .html files you can open straight from disk.

kage clone example.com
kage serve kage-out/example.com

Why

Saving a page with "Save As" gives you a copy that still phones home, still runs analytics, and often renders blank because the markup is built by JavaScript at runtime. kage takes the opposite approach:

Render first, save second. Each page goes through real headless Chrome, so a page whose content is assembled by JavaScript is captured the way a human would have seen it, not as an empty shell.
Strip every script. Once the DOM is captured, kage removes all <script> tags, every on* event handler, and any javascript: URL. The saved page is inert: no tracking, no network calls, no surprises.
Keep the layout. Stylesheets, images, fonts, and media are downloaded and rewritten to relative local paths, so the offline copy looks like the original.
Stay browsable. In-scope links are rewritten to point at the other saved pages, so you can click around the mirror exactly as you would the live site.

Install

# Go
go install github.com/tamnd/kage/cmd/kage@latest

# Homebrew (once the tap is published)
brew install tamnd/tap/kage

# Container (Chromium bundled)
docker run -v "$PWD/out:/out" ghcr.io/tamnd/kage clone example.com

Prebuilt archives, .deb/.rpm/.apk packages, and a multi-arch image are attached to each release.

kage drives a real browser, so it needs Chrome or Chromium available. It finds a system install automatically; point it at a specific binary with --chrome or the KAGE_CHROME environment variable. The container image bundles Chromium.

Usage

kage clone <url> [flags]
kage serve [dir] [flags]

Clone

# Clone a whole site into kage-out/<host>/
kage clone https://example.com

# Limit the crawl
kage clone example.com --max-pages 200 --max-depth 3

# Only a section of the site
kage clone example.com --scope-prefix /docs

# Include subdomains, and trigger lazy-loaded images by scrolling
kage clone example.com --subdomains --scroll

# Resume an interrupted run (on by default; Ctrl-C saves state)
kage clone example.com

# Re-render every page in place to pull in changed content
kage clone example.com --refresh

A clone is idempotent: each page is keyed by the file it writes, so the same URL reached over http and https, with or without a trailing slash, is fetched once. Re-running resumes where it left off; --refresh re-renders in place, --force wipes and starts clean.

Common flags:

Flag	Default	Meaning
`-o, --out`	`$HOME/data/kage`	Output root; the mirror lands in `<out>/<host>/`
`-p, --max-pages`	`0`	Stop after N pages (0 = unlimited)
`-d, --max-depth`	`0`	Link-follow depth cap (0 = unlimited)
`--scope-prefix`		Only crawl pages whose path starts with this prefix
`--subdomains`	`false`	Treat subdomains of the seed host as in scope
`--exclude`		Path prefixes to skip (repeatable)
`--scroll`	`false`	Auto-scroll each page to trigger lazy loading
`--workers`	`4`	Concurrent page render workers
`--no-robots`	`false`	Ignore `robots.txt` (be polite)
`-f, --force`	`false`	Delete any existing mirror for the host first
`--chrome`		Path to the Chrome/Chromium binary

Run kage clone --help for the full list.

Serve

kage serve runs a local static file server over a cloned folder so links and assets resolve the way they would on a real host:

kage serve kage-out/example.com
# open http://127.0.0.1:8800

How it works

seed URL ─▶ headless Chrome ─▶ final DOM ─▶ strip JS ─▶ localise assets ─▶ disk
              (render)          (snapshot)   (sanitize)   (rewrite links)

A clone is a polite breadth-first crawl. Pages are rendered by a pool of Chrome tabs; assets are fetched over plain HTTP by a separate worker pool. Every URL maps deterministically to a local path, so links can be rewritten before the asset they point at has even finished downloading. The crawl honours robots.txt and seeds itself from sitemap.xml by default. Output layout:

kage-out/example.com/
├── index.html                 # the home page, scripts stripped
├── about/index.html           # /about
├── _kage/                      # reserved: assets and crawl state
│   ├── example.com/site.css    # localised stylesheet (url() rewritten)
│   ├── example.com/logo.png
│   └── state.json              # visited set, for --resume
└── ...

Building from source

git clone https://github.com/tamnd/kage
cd kage
make build          # -> bin/kage
make test           # full suite, including Chrome-driven end-to-end tests
make test-short     # skip the tests that launch a browser

License

MIT. See LICENSE.

Name		Name	Last commit message	Last commit date
Latest commit History 6 Commits
.github/workflows		.github/workflows
asset		asset
browser		browser
cli		cli
clone		clone
cmd/kage		cmd/kage
docs		docs
robots		robots
sanitize		sanitize
scripts		scripts
urlx		urlx
.gitignore		.gitignore
.gitmodules		.gitmodules
.goreleaser.yaml		.goreleaser.yaml
CHANGELOG.md		CHANGELOG.md
Dockerfile		Dockerfile
LICENSE		LICENSE
Makefile		Makefile
README.md		README.md
go.mod		go.mod
go.sum		go.sum

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

kage

Why

Install

Usage

Clone

Serve

How it works

Building from source

License

About

Uh oh!

Releases 14

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

kage

Why

Install

Usage

Clone

Serve

How it works

Building from source

License

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases 14

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages