Foliant is a Go-native OCR/document intelligence project inspired by Surya and Marker.
The project is experimental. The current implementation has:
- A pure-Go model cache, safetensors reader, and tensor primitives.
- Image text detection and selected recognition parity fixtures.
- Image OCR wiring.
- Fixture-limited layout and table structure paths.
- OCR JSON-to-Markdown rendering.
- Constrained image, scanned-PDF, and embedded-text-PDF conversion.
- PDF metadata inspection.
- A narrow image-only scanned-PDF path for pages backed by one full-page JPEG XObject, a simple 8-bit Flate RGB/gray image XObject with limited predictors, or an uncompressed 8-bit RGB/gray image XObject.
- Common xref streams, including a constrained Flate predictor DecodeParms subset.
- A limited embedded-PDF-text path for simple hand-authored Type1/TrueType text fixtures.
Broad Surya compatibility and production speed are not claimed. Full PDF rasterization/text extraction is still deferred.
- Pure Go runtime.
- No Python runtime.
- No PyTorch runtime.
- No ONNX runtime.
- No GGUF conversion.
- No llama.cpp.
- No Ollama.
- No C/C++ inference runtime.
- No CGo for inference.
- No model format conversion.
- Original Surya artifacts must be downloaded and read directly.
Development-only Python scripts may be added later only for generating or comparing fixtures against upstream Surya. They must not become runtime dependencies.
Prerequisites:
- Go 1.23 or newer.
- A writable Go build cache and model cache.
- Enough RAM for the selected experimental workflow; see
docs/MEMORY.md.
Build the CLI:
make build
./bin/foliant version
Optionally install it somewhere on PATH:
make install
foliant version
make install installs to ~/.local/bin by default. Override the destination
with PREFIX, for example make install PREFIX=/usr/local.
Release builds can inject metadata without changing source:
go build \
-ldflags "-X main.buildVersion=v0.0.0-experimental -X main.buildCommit=$(git rev-parse --short HEAD) -X main.buildDate=$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
-o ./bin/foliant ./cmd/foliant
./bin/foliant version --json
No packaged binary target or OS support promise is declared yet. The project is pure Go and is expected to build on platforms supported by Go 1.23, but current validation is CPU-only local development validation, not a cross-platform release matrix. GPU/accelerator support is not implemented.
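The -X flags in the release build command set package-level string variables in package main at link time. A minimal sketch of how such variables could feed a JSON version payload follows. The variable names come from the build command; the struct, its field names, and the default values are assumptions for illustration, not the actual foliant.version.v1 schema.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Overridden at link time, for example:
//   go build -ldflags "-X main.buildVersion=v0.0.0-experimental" ...
// -X only works on package-level string variables.
// The default values here are assumptions.
var (
	buildVersion = "dev"
	buildCommit  = "unknown"
	buildDate    = "unknown"
)

// versionInfo is a hypothetical payload shape; the real
// foliant.version.v1 fields may differ.
type versionInfo struct {
	Schema  string `json:"schema"`
	Version string `json:"version"`
	Commit  string `json:"commit"`
	Date    string `json:"date"`
}

// versionJSON serializes the injected build metadata.
func versionJSON() (string, error) {
	b, err := json.Marshal(versionInfo{
		Schema:  "foliant.version.v1",
		Version: buildVersion,
		Commit:  buildCommit,
		Date:    buildDate,
	})
	if err != nil {
		return "", fmt.Errorf("encode version: %w", err)
	}
	return string(b), nil
}

func main() {
	s, err := versionJSON()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(s)
}
```

Because the values are injected by the linker, the same source tree can produce differently labeled binaries without any source change.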
The examples below assume the built foliant binary is available on PATH.
Use --model-cache when you want an explicit model cache location; otherwise
Foliant uses the OS user cache under datalab/models.
Fetch the detection and recognition checkpoints before the first OCR run:
foliant models fetch --model text_detection --progress always
foliant models fetch --model recognition --progress always
Run OCR on an image and write JSON:
foliant ocr page.png --out result.json --progress always
Convert an image to Markdown and keep the OCR JSON sidecar:
foliant convert page.png --format markdown --out result.md --json-out result.json --progress always
Convert a supported scanned PDF page range:
foliant convert scanned.pdf --pages 1,3-5 --format markdown --out result.md --progress always
foliant version
foliant version --json
Prints build metadata. --json emits foliant.version.v1.
foliant models fetch --model text_detection
foliant models fetch --model recognition
foliant models validate --model text_detection
foliant inspect-model --model text_detection
models fetch downloads original Surya model artifacts into the local cache.
models validate checks that cached files listed by manifest.json exist.
inspect-model prints checkpoint config and safetensors metadata as JSON.
Common model flags:
--model-cache DIR   override the model cache root
--base-url URL      override the model artifact base URL
--workers N         parallel download workers
--no-download       fail if the required model is not already cached
foliant detect page.png --out detection.json --progress always
detect loads the text detection checkpoint, runs image preprocessing and
detection, then writes foliant.detection.v1 JSON with page bbox, text-line
boxes, polygons, and confidences where available.
Useful flags:
--checkpoint MODEL_OR_DIR  detection checkpoint ref or local model directory
--model-cache DIR          model cache root
--out PATH                 write JSON to a file instead of stdout
--debug-dump DIR           write preprocessing debug files
--max-pixels N             reject images larger than N pixels; -1 disables
--processor-size N         experimental smoke/debug override
--log-level LEVEL          debug, info, warn, error
--progress MODE            auto, always, never
foliant ocr page.png --out result.json --progress always
foliant ocr scanned.pdf --pages 1,3-5 --out result.json --progress always
ocr runs detection, crops detected lines, runs recognition, and writes
foliant.ocr.v1 JSON. PDF input is limited to the supported scanned-PDF subset
described below.
Useful flags:
--detection-model MODEL_OR_DIR
--recognition-model MODEL_OR_DIR
--model-cache DIR
--out PATH
--pages LIST         PDF pages, for example 1,3-5
--pdf-max-bytes N    reject PDFs larger than N bytes; -1 disables
--max-pixels N       reject decoded images larger than N pixels
--max-tokens N       recognition token limit per line
--crop-padding N     extra pixels around detected line crops
--no-download
--log-level LEVEL
--progress MODE
foliant convert page.png --format markdown --out result.md --json-out result.json --progress always
foliant convert page.png --format json --out result.json --progress always
foliant convert scanned.pdf --pages 1,3-5 --format markdown --out result.md --progress always
convert is the main document-conversion command. It runs OCR for images or the
supported scanned-PDF subset and renders Markdown or JSON. Markdown currently
uses OCR line order by default.
Optional image-only experimental paths:
foliant convert page.png --layout --layout-max-tokens 1 --format markdown --out result.md
foliant convert table-page.png --tables --table-max-boxes 32 --format markdown --out result.md
--layout can attach layout blocks and use layout-aware Markdown when assignment
is usable. --tables can attach fixture-limited table structure and render
conservative Markdown pipe tables. These paths are disabled for PDFs.
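To make the pipe-table output concrete, here is a small sketch of rendering a grid of cell strings as a Markdown pipe table. It assumes the first row is the header and that ragged rows are padded with empty cells; pipeTable is a hypothetical name, and the real renderer's rules may be stricter.

```go
package main

import (
	"fmt"
	"strings"
)

// pipeTable renders rows as a Markdown pipe table. The first row is
// treated as the header (an assumption); short rows are padded with
// empty cells. It returns "" when there is nothing to render.
func pipeTable(rows [][]string) string {
	cols := 0
	for _, r := range rows {
		if len(r) > cols {
			cols = len(r)
		}
	}
	if cols == 0 {
		return ""
	}
	var b strings.Builder
	writeRow := func(cells []string) {
		b.WriteString("|")
		for i := 0; i < cols; i++ {
			cell := ""
			if i < len(cells) {
				cell = cells[i]
			}
			// Escape pipes and flatten newlines so a cell cannot
			// break the table structure.
			cell = strings.ReplaceAll(cell, "|", `\|`)
			cell = strings.ReplaceAll(cell, "\n", " ")
			b.WriteString(" " + strings.TrimSpace(cell) + " |")
		}
		b.WriteString("\n")
	}
	writeRow(rows[0])
	b.WriteString("|" + strings.Repeat(" --- |", cols) + "\n")
	for _, r := range rows[1:] {
		writeRow(r)
	}
	return b.String()
}

func main() {
	fmt.Print(pipeTable([][]string{
		{"Item", "Qty"},
		{"Widget", "2"},
		{"A|B"},
	}))
}
```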
foliant render result.json --format markdown --out result.md
Renders existing foliant.ocr.v1 JSON to Markdown. This is useful when OCR has
already been run and only the output format needs to change.
foliant layout page.png --out layout.json --max-tokens 1
foliant table table-crop.png --out table.json --max-boxes 32
layout emits experimental foliant.layout.v1 JSON for an image. table
emits experimental foliant.table.v1 JSON for one full image treated as a table
crop. Both commands are fixture-limited and should not be treated as broad
Surya/Marker parity.
foliant pdf inspect document.pdf --out pdf.json
foliant pdf extract-images scanned.pdf --out-dir pages/ --pages 1,3-5
pdf inspect reports basic PDF metadata, page boxes, page count, rotation, and
parser feature flags. pdf extract-images extracts page PNGs from the supported
image-only scanned-PDF subset for debugging and fixture work.
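The --pages value used above (1,3-5) reads as a comma-separated list of 1-based pages and inclusive ranges. The sketch below parses that form under that assumption; the exact grammar Foliant accepts (whitespace, open ranges, duplicates) is not documented here, and parsePages is a hypothetical name.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parsePages expands a list such as "1,3-5" into [1 3 4 5].
// Assumed grammar: comma-separated items, each a 1-based page number
// or an inclusive "first-last" range with first <= last.
func parsePages(list string) ([]int, error) {
	var pages []int
	for _, part := range strings.Split(list, ",") {
		part = strings.TrimSpace(part)
		lo, hi, isRange := strings.Cut(part, "-")
		first, err := strconv.Atoi(lo)
		if err != nil || first < 1 {
			return nil, fmt.Errorf("invalid page %q", part)
		}
		last := first
		if isRange {
			last, err = strconv.Atoi(hi)
			if err != nil || last < first {
				return nil, fmt.Errorf("invalid page range %q", part)
			}
		}
		for p := first; p <= last; p++ {
			pages = append(pages, p)
		}
	}
	return pages, nil
}

func main() {
	pages, err := parsePages("1,3-5")
	fmt.Println(pages, err) // prints [1 3 4 5] <nil>
}
```

A real implementation would also check each page against the document's page count before expanding large ranges.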
detect, ocr, layout, table, render, convert, pdf inspect, and
pdf extract-images are experimental. Detection and recognition have selected
opt-in fixture validation against upstream Surya, but broad full-size Surya
parity is not claimed. Production speed is not claimed.
Image OCR supports PNG/JPEG inputs. PDF OCR/conversion supports only a
constrained scanned-PDF subset where each page is represented by one full-page
JPEG image XObject, a simple 8-bit FlateDecode DeviceRGB/DeviceGray image
XObject with supported predictors, or an uncompressed 8-bit DeviceRGB/DeviceGray
image XObject. Limited embedded PDF text extraction exists for simple fixtures.
Arbitrary PDFs, vector pages, full PDF rasterization, complex font/text extraction, forms, transparency, broad layout/table parity, PDF layout/table recognition, and broad Marker compatibility are not supported yet.
detect rejects very large images before full decode by default. Use --max-pixels to tune the limit for trusted inputs.
Use --log-level error to suppress non-error warnings in scripted detection runs.
Use --progress auto|always|never on models fetch, models validate, detect, ocr, and convert to control dependency-free stage progress on stderr. auto is quiet for non-terminal writers so JSON/Markdown stdout remains machine-readable in tests and scripts; always is useful for long local CPU runs.
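The three progress modes can be resolved with the standard library alone. The sketch below shows one dependency-free way to do it, assuming that auto means "enabled only when the writer looks like a terminal"; isTerminal and progressEnabled are hypothetical names, and Foliant's own detection may differ.

```go
package main

import (
	"fmt"
	"os"
)

// isTerminal is a stdlib-only heuristic: terminals are character
// devices. It also reports true for other character devices such as
// /dev/null, so treat it as an approximation.
func isTerminal(f *os.File) bool {
	info, err := f.Stat()
	if err != nil {
		return false
	}
	return info.Mode()&os.ModeCharDevice != 0
}

// progressEnabled maps a --progress mode to a decision, given whether
// the progress writer is a terminal.
func progressEnabled(mode string, tty bool) (bool, error) {
	switch mode {
	case "always":
		return true, nil
	case "never":
		return false, nil
	case "auto":
		return tty, nil
	default:
		return false, fmt.Errorf("unknown progress mode %q", mode)
	}
}

func main() {
	on, err := progressEnabled("auto", isTerminal(os.Stderr))
	fmt.Println(on, err)
}
```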
For local smoke/debug runs on the naive CPU backend, detect also has an experimental --processor-size override. Leaving it unset preserves the checkpoint's Surya processor size.
By default Foliant uses the operating system user cache directory:
<user-cache-dir>/datalab/models/<model-name>/<version>
Example:
~/.cache/datalab/models/text_detection/2025_05_07
Use --model-cache to override the cache root.
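The cache layout above can be resolved with os.UserCacheDir. This sketch assumes that the --model-cache value replaces the whole datalab/models root rather than only the OS cache directory; cacheRoot and modelDir are hypothetical helper names.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// cacheRoot returns the model cache root: the override when given,
// otherwise <user-cache-dir>/datalab/models.
// (Assumption: the override replaces the full root.)
func cacheRoot(override string) (string, error) {
	if override != "" {
		return override, nil
	}
	base, err := os.UserCacheDir()
	if err != nil {
		return "", fmt.Errorf("locate user cache dir: %w", err)
	}
	return filepath.Join(base, "datalab", "models"), nil
}

// modelDir returns <root>/<model-name>/<version>.
func modelDir(root, name, version string) string {
	return filepath.Join(root, name, version)
}

func main() {
	root, err := cacheRoot("")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(modelDir(root, "text_detection", "2025_05_07"))
}
```

On Linux, os.UserCacheDir returns $XDG_CACHE_HOME when it is set and $HOME/.cache otherwise, which matches the ~/.cache example path.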
Foliant does not vendor model weights. Commands that need models either use the
existing cache or download original upstream artifacts unless --no-download is
available and set for that command. Release notes must point users to
NOTICE.md before any model-backed workflow is described as usable outside local
experiments.
The current CPU backend is correctness-oriented and expensive. Local opt-in
fixtures have required multiple minutes and several GiB of RAM: the combined
OCR/layout convert fixture reached about 5.4 GiB RSS, and the table CLI lossless
fixture reached about 0.9 GiB RSS. Full-page OCR can take several minutes on a
desktop CPU. Treat these numbers as development measurements, not production
SLOs. Keep --max-pixels guards enabled for untrusted images and review
docs/MEMORY.md plus docs/PERFORMANCE.md before publishing binaries.
Required checks:
gofmt -w .
go test ./...
go vet ./...
golangci-lint run ./...
go test -race ./...
go build -o /tmp/foliant-build-check ./cmd/foliant
go list -m all
git diff --check
Large-model integration tests must be opt-in. Normal unit tests must not download Surya models. In sandboxed environments, use a writable Go cache such as GOCACHE=/tmp/foliant-gocache; use GOLANGCI_LINT_CACHE=/tmp/foliant-golangci-lint-cache for golangci-lint if the default cache is not writable.
Use this shape for any experimental release notes:
- Version/build: include foliant version --json, commit, build date, target OS/arch, and Go version.
- Supported scope: summarize only features marked implemented or partially validated in docs/COMPATIBILITY.md.
- Fixture status: list opt-in fixtures that passed, with dates and resource notes when relevant.
- Known gaps: broad Surya/Marker compatibility, production speed, arbitrary PDFs, full PDF rasterization/text extraction, broad layout/table parity, and model-license review.
- Licensing/dependencies: link NOTICE.md and docs/DEPENDENCIES.md; state that model weights are not bundled.
- Validation: include the exact go test, go vet, golangci-lint, go test -race, go build, go list -m all, and git diff --check commands used.
Project code license is pending. Model artifact license risks are documented in NOTICE.md.