GitHub - blackwell-systems/gcf: GCF: Token-optimized wire format for LLMs. 100% comprehension on every frontier model. 25.5% fewer tokens than TOON, 53% fewer than JSON. 1B+ lossless round-trips. Spec v2.0 Stable.

Drop-in JSON replacement for all AI pipelines, with superpowers for graph-shaped data.

Important

Graph Compact Format (GCF): A Token-Efficient Wire Format for LLM Tool Interactions Dayna Blackwell, 2026. DOI: 10.5281/zenodo.20579817

100% comprehension on every frontier model tested. 25.5% fewer tokens than TOON, 53% fewer than JSON across 15 datasets. 90.7% on structurally complex code graphs (vs TOON 68.5%, JSON 53.6%). Proven lossless: decode(encode(value)) == value for every JSON value, verified across 1,000,000,000+ round-trips. Zero training required.

Encode any JSON payload as GCF before sending it to an LLM. Arrays, nested objects, key-value pairs, mixed types. The model reads it natively with zero format instructions. decode() converts back to JSON when a human needs to see it. Your existing JSON schemas and validators work on the decoded output unchanged.

pip install gcf-python                    # Python
npm install @blackwell-systems/gcf        # TypeScript
go get github.com/blackwell-systems/gcf-go  # Go
cargo add gcf                             # Rust

Or wrap any existing MCP server with zero code changes:

pip install gcf-proxy

Benchmarks

1,700+ LLM evaluations across 10 models, 3 providers, and 51 independent test runs.

	GCF	TOON	JSON
Comprehension (23 runs, 10 models)	90.7%	68.5%	53.6%
Generation (28 runs, 9 models)	5/5	1.0/5	5.0/5
Input tokens (500 symbols)	11,090	16,378	53,341
Output tokens (100 symbols)	5,976	8,937	16,121

GCF wins 13/15 datasets on the expanded token efficiency benchmark. Full results: gcformat.com/guide/benchmarks

Encode any structured data (generic profile)

from gcf import encode_generic

output = encode_generic({
    "employees": [
        {"id": 1, "name": "Alice", "department": "Engineering", "salary": 95000},
        {"id": 2, "name": "Bob", "department": "Sales", "salary": 72000},
        {"id": 3, "name": "Carol", "department": "Marketing", "salary": 85000},
    ],
})

GCF profile=generic
## employees [3]{id,name,department,salary}
1|Alice|Engineering|95000
2|Bob|Sales|72000
3|Carol|Marketing|85000

One header declares field names. Rows are positional values only. No field names repeated per record. Lossless: decode(encode(value)) == value for every JSON value, proven across 1,000,000,000+ random round-trips and 7.9M fuzz executions.

Graph profile (code intelligence, knowledge graphs, MCP tools)

For data with nodes, edges, and distance groups:

from gcf import encode, Payload, Symbol, Edge

output = encode(Payload(
    tool="context_for_task", token_budget=5000, tokens_used=1847,
    symbols=[
        Symbol(qualified_name="pkg.Auth", kind="function", score=0.78, provenance="lsp", distance=0),
        Symbol(qualified_name="pkg.Server", kind="function", score=0.54, provenance="lsp", distance=1),
    ],
    edges=[Edge(source="pkg.Server", target="pkg.Auth", edge_type="calls")],
))

GCF profile=graph tool=context_for_task budget=5000 tokens=1847 symbols=2 edges=1
## targets
@0 fn pkg.Auth 0.78 lsp
## related
@1 fn pkg.Server 0.54 lsp
## edges [1]
@0<@1 calls

Local IDs (@0, @1) replace full names in edges. 233 tokens instead of 965 for JSON.

Try it live in the playground with real-time three-way comparison (JSON vs TOON vs GCF).

How it works

Generic profile

Lossless JSON encoding. Arrays, nested objects, mixed types, primitives, root scalars.

Arrays of objects. ## name [count]{field1,field2} declares field names once. Rows are pipe-separated values. Absent fields use ~, null uses -.
Nested objects. ## key becomes a section. In tabular rows, nested values use ^ cell marker with .field {} attachment.
Primitive arrays. Inlined: tags[2]: admin,user. Strings containing commas are quoted.
Scalars. key=value at the top level. Strings that collide with typed literals ("true", "123", "-") are quoted automatically.
Root values. Objects, arrays, and scalars at the document root. Every JSON value has a GCF representation.

Graph profile

Positional fields. One header declares field names. Rows are values only.
Local IDs. @0, @1. Edges reference by ID, not by repeating full identifiers.
Hierarchical grouping. Section headers (## targets, ## related) replace per-record metadata.

Both profiles share the same grammar (common scalar grammar, key grammar, header format). The savings are structural and grow with payload size.

It gets cheaper over time

Session deduplication: Symbols sent in prior responses become bare references. By the 5th tool call: 92.7% savings vs JSON.

Delta encoding: When the context changes slightly between queries, send only the diff. 81.2% additional savings on re-queries.

No other format has these. They compound across multi-turn agent interactions.

Implementations

Language	Package	Repository
Go	`go get github.com/blackwell-systems/gcf-go`	gcf-go
TypeScript	`npm install @blackwell-systems/gcf`	gcf-typescript
Python	`pip install gcf-python`	gcf-python
Rust	`cargo add gcf`	gcf-rust
Swift	Swift Package Manager	gcf-swift
Kotlin	JitPack	gcf-kotlin
MCP Proxy	`pip install gcf-proxy`	gcf-proxy (bidirectional, session dedup, HTTP frontend)
Claude Code Plugin	`/plugin install`	gcf-claude-plugin (one-command install, session stats hook)
Codex Plugin	`codex plugin add`	gcf-codex-plugin (one-command install, session stats hook)
Tree-sitter	`npm install tree-sitter-gcf`	tree-sitter-gcf

Zero runtime dependencies. MIT licensed. All implementations support both generic profile (encodeGeneric) and graph profile (encode). CLI included in all 6 languages. Syntax highlighting via tree-sitter (Neovim, Helix, Zed).

Specification: SPEC v3.0 Stable with 156 conformance fixtures, 1,000,000,000+ lossless round-trips verified across 6 languages. Six implementations at v2.0.0. Cross-language 6x6 matrix verified.

Documentation

gcformat.com

Use cases

MCP tool responses. Any MCP server returning structured data. 53-71% fewer tokens with 100% comprehension accuracy.
Agent-to-agent communication. 63% fewer tokens per handoff. 5/5 generation validity on every frontier model.
LLM structured output. LLMs produce valid GCF with a 3-line primer. No training required.
Code intelligence. Graph profile with local IDs, edges, and distance grouping.

License

MIT - Dayna Blackwell

Name		Name	Last commit message	Last commit date
Latest commit History 538 Commits
.github		.github
assets		assets
docs		docs
eval		eval
scripts		scripts
tests		tests
.gitattributes		.gitattributes
.gitignore		.gitignore
.zenodo.json		.zenodo.json
CHANGELOG.md		CHANGELOG.md
CLAUDE.md		CLAUDE.md
CONTRIBUTING.md		CONTRIBUTING.md
LICENSE		LICENSE
README.md		README.md
ROADMAP.md		ROADMAP.md
SESSION-DELTA-INTEROPERABILITY.md		SESSION-DELTA-INTEROPERABILITY.md
SPEC.md		SPEC.md
gcf-whitepaper.pdf		gcf-whitepaper.pdf

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Drop-in JSON replacement for all AI pipelines, with superpowers for graph-shaped data.

Benchmarks

Encode any structured data (generic profile)

Graph profile (code intelligence, knowledge graphs, MCP tools)

How it works

Generic profile

Graph profile

It gets cheaper over time

Implementations

Documentation

Use cases

License

About

Uh oh!

Releases 5

Packages

Uh oh!

Contributors

Uh oh!

Folders and files

Latest commit

History

Repository files navigation

Drop-in JSON replacement for all AI pipelines, with superpowers for graph-shaped data.

Benchmarks

Encode any structured data (generic profile)

Graph profile (code intelligence, knowledge graphs, MCP tools)

How it works

Generic profile

Graph profile

It gets cheaper over time

Implementations

Documentation

Use cases

License

About

Topics

Resources

License

Contributing

Uh oh!

Stars

Watchers

Forks

Releases 5

Packages 0

Uh oh!

Contributors

Uh oh!

Packages