Important
Graph Compact Format (GCF): A Token-Efficient Wire Format for LLM Tool Interactions. Dayna Blackwell, 2026. DOI: 10.5281/zenodo.20579817
100% comprehension on every frontier model tested. 25.5% fewer tokens than TOON, 53% fewer than JSON across 15 datasets. 90.7% on structurally complex code graphs (vs TOON 68.5%, JSON 53.6%). Proven lossless: decode(encode(value)) == value for every JSON value, verified across 1,000,000,000+ round-trips. Zero training required.
Encode any JSON payload as GCF before sending it to an LLM. Arrays, nested objects, key-value pairs, mixed types. The model reads it natively with zero format instructions. decode() converts back to JSON when a human needs to see it. Your existing JSON schemas and validators work on the decoded output unchanged.
pip install gcf-python # Python
npm install @blackwell-systems/gcf # TypeScript
go get github.com/blackwell-systems/gcf-go # Go
cargo add gcf # Rust

Or wrap any existing MCP server with zero code changes:

pip install gcf-proxy

1,700+ LLM evaluations across 10 models, 3 providers, and 51 independent test runs.
| | GCF | TOON | JSON |
|---|---|---|---|
| Comprehension (23 runs, 10 models) | 90.7% | 68.5% | 53.6% |
| Generation (28 runs, 9 models) | 5/5 | 1.0/5 | 5.0/5 |
| Input tokens (500 symbols) | 11,090 | 16,378 | 53,341 |
| Output tokens (100 symbols) | 5,976 | 8,937 | 16,121 |
GCF wins 13/15 datasets on the expanded token efficiency benchmark. Full results: gcformat.com/guide/benchmarks
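The relative savings implied by the two token rows in the table can be worked out directly. The sketch below only does arithmetic on the numbers quoted above; the variable and function names are illustrative. Note that these are single-scenario figures, so they need not match the 25.5% and 53% headline numbers, which are reported across 15 datasets.

```python
# Token counts copied from the table above
# (input measured at 500 symbols, output at 100 symbols).
TOKENS = {
    "input_500_symbols": {"GCF": 11_090, "TOON": 16_378, "JSON": 53_341},
    "output_100_symbols": {"GCF": 5_976, "TOON": 8_937, "JSON": 16_121},
}


def percent_fewer(gcf: int, other: int) -> float:
    """Percentage reduction of the GCF count relative to another format."""
    return (1 - gcf / other) * 100


savings = {
    row: {fmt: percent_fewer(counts["GCF"], counts[fmt]) for fmt in ("TOON", "JSON")}
    for row, counts in TOKENS.items()
}

for row, by_format in savings.items():
    for fmt, pct in by_format.items():
        print(f"{row}: {pct:.1f}% fewer tokens than {fmt}")
```

For these rows the input savings come to roughly 32.3% against TOON and 79.2% against JSON, and the output savings to roughly 33.1% and 62.9%.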
from gcf import encode_generic

output = encode_generic({
    "employees": [
        {"id": 1, "name": "Alice", "department": "Engineering", "salary": 95000},
        {"id": 2, "name": "Bob", "department": "Sales", "salary": 72000},
        {"id": 3, "name": "Carol", "department": "Marketing", "salary": 85000},
    ],
})

GCF profile=generic
## employees [3]{id,name,department,salary}
1|Alice|Engineering|95000
2|Bob|Sales|72000
3|Carol|Marketing|85000
One header declares field names. Rows are positional values only. No field names repeated per record. Lossless: decode(encode(value)) == value for every JSON value, proven across 1,000,000,000+ random round-trips and 7.9M fuzz executions.
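To make the "one header, positional rows" idea concrete, here is a minimal sketch of a toy encoder for the tabular case. It is not the gcf-python implementation: `encode_table` is a hypothetical helper that assumes every record has the same keys and that no value needs quoting or escaping, and it omits the `GCF profile=generic` preamble.

```python
def encode_table(name: str, records: list[dict]) -> str:
    """Toy encoder: one header line, then positional pipe-separated rows.

    Simplified on purpose: assumes uniform keys and values that never
    need quoting. The real format handles far more cases.
    """
    fields = list(records[0])
    header = f"## {name} [{len(records)}]{{{','.join(fields)}}}"
    rows = ["|".join(str(record[field]) for field in fields) for record in records]
    return "\n".join([header, *rows])


employees = [
    {"id": 1, "name": "Alice", "department": "Engineering", "salary": 95000},
    {"id": 2, "name": "Bob", "department": "Sales", "salary": 72000},
    {"id": 3, "name": "Carol", "department": "Marketing", "salary": 85000},
]

table = encode_table("employees", employees)
print(table)
```

The field names appear once in the header, so each extra record costs only its values plus separators.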
For data with nodes, edges, and distance groups:
from gcf import encode, Payload, Symbol, Edge

output = encode(Payload(
    tool="context_for_task", token_budget=5000, tokens_used=1847,
    symbols=[
        Symbol(qualified_name="pkg.Auth", kind="function", score=0.78, provenance="lsp", distance=0),
        Symbol(qualified_name="pkg.Server", kind="function", score=0.54, provenance="lsp", distance=1),
    ],
    edges=[Edge(source="pkg.Server", target="pkg.Auth", edge_type="calls")],
))

GCF profile=graph tool=context_for_task budget=5000 tokens=1847 symbols=2 edges=1
## targets
@0 fn pkg.Auth 0.78 lsp
## related
@1 fn pkg.Server 0.54 lsp
## edges [1]
@0<@1 calls
Local IDs (@0, @1) replace full names in edges. 233 tokens instead of 965 for JSON.
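The local-ID substitution can be sketched in a few lines. This is an illustration rather than the library's code: `assign_local_ids` and `render_edges` are hypothetical names, and the `target<source` orientation is inferred from the single example above, where `pkg.Server` (`@1`) calls `pkg.Auth` (`@0`).

```python
def assign_local_ids(names: list[str]) -> dict[str, str]:
    """Map each qualified name to a short local ID in order of appearance."""
    return {name: f"@{index}" for index, name in enumerate(names)}


def render_edges(edges: list[tuple[str, str, str]], ids: dict[str, str]) -> str:
    """Render (source, target, edge_type) triples using local IDs only.

    Assumption: each edge is written as target<source, as inferred from
    the example output in this README.
    """
    lines = [f"## edges [{len(edges)}]"]
    for source, target, edge_type in edges:
        lines.append(f"{ids[target]}<{ids[source]} {edge_type}")
    return "\n".join(lines)


ids = assign_local_ids(["pkg.Auth", "pkg.Server"])
edge_section = render_edges([("pkg.Server", "pkg.Auth", "calls")], ids)
print(edge_section)
```

Each qualified name is spelled out once in its symbol row; every edge after that pays only for two short IDs and the edge type.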
Try it live in the playground with real-time three-way comparison (JSON vs TOON vs GCF).
Lossless JSON encoding. Arrays, nested objects, mixed types, primitives, root scalars.
- Arrays of objects. `## name [count]{field1,field2}` declares field names once. Rows are pipe-separated values. Absent fields use `~`, null uses `-`.
- Nested objects. `## key` becomes a section. In tabular rows, nested values use `^` cell marker with `.field {}` attachment.
- Primitive arrays. Inlined: `tags[2]: admin,user`. Strings containing commas are quoted.
- Scalars. `key=value` at the top level. Strings that collide with typed literals (`"true"`, `"123"`, `"-"`) are quoted automatically.
- Root values. Objects, arrays, and scalars at the document root. Every JSON value has a GCF representation.
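The distinction between absent and null fields, and the quoting of strings that would otherwise read as typed literals, can be sketched as follows. This is a simplified illustration, not the specification: `encode_cell` and `looks_like_literal` are hypothetical helpers, the use of JSON-style double quotes is an assumption, and escaping of `|` or empty strings is ignored.

```python
import json

ABSENT = "~"  # the field is missing from the record
NULL = "-"    # the field is present and its value is null


def looks_like_literal(text: str) -> bool:
    """True if a bare string would be misread as a marker, boolean, or number."""
    if text in {ABSENT, NULL, "true", "false", "null"}:
        return True
    try:
        float(text)
        return True
    except ValueError:
        return False


def encode_cell(record: dict, field: str) -> str:
    """Encode one positional cell (toy version; quoting style is assumed)."""
    if field not in record:
        return ABSENT
    value = record[field]
    if value is None:
        return NULL
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, str):
        return json.dumps(value) if looks_like_literal(value) else value
    return str(value)


fields = ["id", "nickname", "active"]
records = [
    {"id": 1, "nickname": "123", "active": True},   # string that looks numeric
    {"id": 2, "nickname": None, "active": False},   # explicit null
    {"id": 3, "active": True},                      # nickname absent
]
rows = ["|".join(encode_cell(record, field) for field in fields) for record in records]
print("\n".join(rows))
```

Keeping absent and null as separate markers is what lets a decoder tell `{"id": 3}` apart from `{"id": 3, "nickname": null}`, which a lossless round-trip requires.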
- Positional fields. One header declares field names. Rows are values only.
- Local IDs. `@0`, `@1`. Edges reference by ID, not by repeating full identifiers.
- Hierarchical grouping. Section headers (`## targets`, `## related`) replace per-record metadata.
Both profiles share the same grammar (common scalar grammar, key grammar, header format). The savings are structural and grow with payload size.
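Why the savings grow with payload size can be seen with a rough character-count comparison: the header is paid once, while JSON repeats every key in every record. The sketch below uses a toy tabular encoder (a hypothetical `encode_table`, not the library) and counts characters rather than tokens, which is only a crude proxy since real token counts depend on the tokenizer.

```python
import json


def encode_table(name: str, records: list[dict]) -> str:
    """Toy tabular encoder: one header, then pipe-separated positional rows."""
    fields = list(records[0])
    header = f"## {name} [{len(records)}]{{{','.join(fields)}}}"
    rows = ["|".join(str(record[field]) for field in fields) for record in records]
    return "\n".join([header, *rows])


def char_savings(n: int) -> float:
    """Fraction of characters saved versus compact JSON for n synthetic records."""
    records = [
        {"id": i, "name": f"user{i}", "department": "Engineering", "salary": 50000 + i}
        for i in range(n)
    ]
    as_json = json.dumps({"employees": records}, separators=(",", ":"))
    as_table = encode_table("employees", records)
    return 1 - len(as_table) / len(as_json)


for n in (1, 10, 100):
    print(f"{n:>3} records: {char_savings(n):.0%} fewer characters than compact JSON")
```

With a single record the header overhead eats most of the gain; as the record count rises, the per-record saving dominates and the ratio levels off.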
Session deduplication: Symbols sent in prior responses become bare references. By the 5th tool call: 92.7% savings vs JSON.
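A minimal sketch of the session deduplication idea, assuming a per-session table of symbols already sent. `SessionDedup` is a hypothetical class, and the exact wire syntax of a bare reference is an assumption here, since this README does not show it.

```python
class SessionDedup:
    """Toy tracker: a symbol is sent in full once, then as a bare reference."""

    def __init__(self) -> None:
        self._ids: dict[str, str] = {}

    def render(self, qualified_name: str, kind: str, score: float) -> str:
        """Return a full row the first time, and only the local ID afterwards."""
        if qualified_name in self._ids:
            return self._ids[qualified_name]  # bare reference (assumed syntax)
        local_id = f"@{len(self._ids)}"
        self._ids[qualified_name] = local_id
        return f"{local_id} {kind} {qualified_name} {score}"


session = SessionDedup()
first_call = session.render("pkg.Auth", "fn", 0.78)
second_call = session.render("pkg.Auth", "fn", 0.78)
print(first_call)
print(second_call)
```

The more a multi-turn session revisits the same symbols, the larger the share of rows that collapse to a bare ID.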
Delta encoding: When the context changes slightly between queries, send only the diff. 81.2% additional savings on re-queries.
No other format has these. They compound across multi-turn agent interactions.
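The delta encoding idea above can be sketched as a diff between two query results. This is a conceptual illustration only: `symbol_delta` and `apply_delta` are hypothetical helpers operating on plain name-to-score dictionaries, and they say nothing about how GCF serializes a delta on the wire.

```python
def symbol_delta(previous: dict[str, float], current: dict[str, float]) -> dict:
    """Describe how `current` differs from `previous` (toy model of a re-query)."""
    return {
        "added": {name: score for name, score in current.items() if name not in previous},
        "removed": sorted(name for name in previous if name not in current),
        "changed": {
            name: score
            for name, score in current.items()
            if name in previous and previous[name] != score
        },
    }


def apply_delta(previous: dict[str, float], delta: dict) -> dict[str, float]:
    """Rebuild the current result from the previous one plus the delta."""
    result = {name: score for name, score in previous.items() if name not in delta["removed"]}
    result.update(delta["added"])
    result.update(delta["changed"])
    return result


before = {"pkg.Auth": 0.78, "pkg.Server": 0.54, "pkg.Router": 0.31}
after = {"pkg.Auth": 0.81, "pkg.Server": 0.54, "pkg.Session": 0.40}

delta = symbol_delta(before, after)
print(delta)
```

Unchanged symbols such as `pkg.Server` do not appear in the delta at all, which is where the saving on re-queries comes from.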
| Language | Package | Repository |
|---|---|---|
| Go | `go get github.com/blackwell-systems/gcf-go` | gcf-go |
| TypeScript | `npm install @blackwell-systems/gcf` | gcf-typescript |
| Python | `pip install gcf-python` | gcf-python |
| Rust | `cargo add gcf` | gcf-rust |
| Swift | Swift Package Manager | gcf-swift |
| Kotlin | JitPack | gcf-kotlin |
| MCP Proxy | `pip install gcf-proxy` | gcf-proxy (bidirectional, session dedup, HTTP frontend) |
| Claude Code Plugin | `/plugin install` | gcf-claude-plugin (one-command install, session stats hook) |
| Codex Plugin | `codex plugin add` | gcf-codex-plugin (one-command install, session stats hook) |
| Tree-sitter | `npm install tree-sitter-gcf` | tree-sitter-gcf |
Zero runtime dependencies. MIT licensed. All implementations support both generic profile (encodeGeneric) and graph profile (encode). CLI included in all 6 languages. Syntax highlighting via tree-sitter (Neovim, Helix, Zed).
Specification: SPEC v3.0 Stable with 156 conformance fixtures, 1,000,000,000+ lossless round-trips verified across 6 languages. Six implementations at v2.0.0. Cross-language 6x6 matrix verified.
- Getting Started
- Benchmarks
- Benchmarks (Full Data)
- GCF vs TOON
- Schema Validation
- FAQ
- Independent AI Reviews
- Playground
- Specification
- MCP tool responses. Any MCP server returning structured data. 53-71% fewer tokens with 100% comprehension accuracy.
- Agent-to-agent communication. 63% fewer tokens per handoff. 5/5 generation validity on every frontier model.
- LLM structured output. LLMs produce valid GCF with a 3-line primer. No training required.
- Code intelligence. Graph profile with local IDs, edges, and distance grouping.
MIT - Dayna Blackwell