Important
Graph Compact Format (GCF): A Token-Efficient Wire Format for LLM Tool Interactions. Dayna Blackwell, 2026. DOI: 10.5281/zenodo.20579817
100% comprehension on every frontier model tested. 25.5% fewer tokens than TOON, 53% fewer than JSON across 15 datasets. 90.7% on structurally complex code graphs (vs TOON 68.5%, JSON 53.6%). Proven lossless: decode(encode(value)) == value for every structured value, verified across 33,000,000,000+ round-trips in 5 formats (JSON, YAML, TOML, CSV, MessagePack). Zero training required.
Encode any structured data as GCF before sending it to an LLM. JSON, YAML, TOML, CSV, MessagePack: GCF encodes them all. The model reads it natively with zero format instructions. decode() converts back to any format when a human needs to see it. Your existing schemas and validators work on the decoded output unchanged.
pip install gcf-python # Python
npm install @blackwell-systems/gcf # TypeScript
go get github.com/blackwell-systems/gcf-go # Go
cargo add gcf # Rust

Or wrap any existing MCP server with zero code changes:

pip install gcf-proxy

1,700+ LLM evaluations across 10 models, 3 providers, and 51 independent test runs.
| | GCF | TOON | JSON |
|---|---|---|---|
| Comprehension (23 runs, 10 models) | 90.7% | 68.5% | 53.6% |
| Generation (28 runs, 9 models) | 5/5 | 1.0/5 | 5.0/5 |
| Input tokens (500 symbols) | 11,090 | 16,378 | 53,341 |
| Output tokens (100 symbols) | 5,976 | 8,937 | 16,121 |
GCF wins 13/15 datasets on the expanded token efficiency benchmark. Full results: gcformat.com/guide/benchmarks
from gcf import encode_generic

output = encode_generic({
    "employees": [
        {"id": 1, "name": "Alice", "department": "Engineering", "salary": 95000},
        {"id": 2, "name": "Bob", "department": "Sales", "salary": 72000},
        {"id": 3, "name": "Carol", "department": "Marketing", "salary": 85000},
    ],
})

GCF profile=generic
## employees [3]{id,name,department,salary}
1|Alice|Engineering|95000
2|Bob|Sales|72000
3|Carol|Marketing|85000
One header declares field names. Rows are positional values only. No field names repeated per record. Lossless: decode(encode(value)) == value for every structured value, proven across 33,000,000,000+ random round-trips in 5 formats and 6 languages.
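The header-plus-rows layout can be illustrated with a few lines of plain Python. This is a minimal sketch, not the `gcf` library: `toy_tabular_encode` is a hypothetical helper that only handles flat records with identical keys, and it skips quoting, nesting, and the absent/null markers entirely.

```python
def toy_tabular_encode(name, records):
    """Hypothetical sketch: one header line, then positional pipe-separated rows."""
    fields = list(records[0].keys())  # field order is taken from the first record
    # The header states the section name, the row count, and the field names once.
    header = "## " + name + " [" + str(len(records)) + "]{" + ",".join(fields) + "}"
    # Each row carries values only, in header order.
    rows = ["|".join(str(rec[f]) for f in fields) for rec in records]
    return "\n".join([header] + rows)


employees = [
    {"id": 1, "name": "Alice", "department": "Engineering", "salary": 95000},
    {"id": 2, "name": "Bob", "department": "Sales", "salary": 72000},
    {"id": 3, "name": "Carol", "department": "Marketing", "salary": 85000},
]
text = toy_tabular_encode("employees", employees)
print(text)
```

Because the field names appear once rather than once per record, the per-row cost is just the values and separators, which is why the savings grow with the number of rows.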
For data with nodes, edges, and distance groups:
from gcf import encode, Payload, Symbol, Edge

output = encode(Payload(
    tool="context_for_task", token_budget=5000, tokens_used=1847,
    symbols=[
        Symbol(qualified_name="pkg.Auth", kind="function", score=0.78, provenance="lsp", distance=0),
        Symbol(qualified_name="pkg.Server", kind="function", score=0.54, provenance="lsp", distance=1),
    ],
    edges=[Edge(source="pkg.Server", target="pkg.Auth", edge_type="calls")],
))

GCF profile=graph tool=context_for_task budget=5000 tokens=1847 symbols=2 edges=1
## targets
@0 fn pkg.Auth 0.78 lsp
## related
@1 fn pkg.Server 0.54 lsp
## edges [1]
@0<@1 calls
Local IDs (@0, @1) replace full names in edges. 233 tokens instead of 965 for JSON.
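The local-ID substitution can be sketched in plain Python. This is an illustration under assumptions, not the library's implementation: `toy_edge_lines` is a hypothetical helper, IDs are assumed to follow symbol order, and the edge orientation (target ID, `<`, source ID) is inferred from the single sample above rather than taken from the specification.

```python
def toy_edge_lines(symbol_names, edges):
    """Hypothetical sketch: number symbols in order, then write edges using those numbers."""
    # Assumption: local IDs are assigned in the order symbols are listed.
    local_id = {name: i for i, name in enumerate(symbol_names)}
    lines = ["## edges [" + str(len(edges)) + "]"]
    for source, target, edge_type in edges:
        # Orientation copied from the sample output: target id, '<', source id.
        lines.append(f"@{local_id[target]}<@{local_id[source]} {edge_type}")
    return lines


lines = toy_edge_lines(
    ["pkg.Auth", "pkg.Server"],
    [("pkg.Server", "pkg.Auth", "calls")],
)
print("\n".join(lines))
```

Each qualified name is written once in its symbol line; every edge afterwards costs only two short IDs and a type.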
Try it live in the playground with real-time multi-format comparison. Paste JSON, YAML, or TOML. Encode from and decode to JSON, YAML, TOML, CSV, and MessagePack.
Lossless structured data encoding. Arrays, nested objects, mixed types, primitives, root scalars. Works on any data that deserializes to objects and arrays, regardless of source format.
- Arrays of objects. `## name [count]{field1,field2}` declares field names once. Rows are pipe-separated values. Absent fields use `~`, null uses `-`.
- Nested objects. `## key` becomes a section. In tabular rows, nested values use the `^` cell marker with a `.field {}` attachment.
- Primitive arrays. Inlined: `tags[2]: admin,user`. Strings containing commas are quoted.
- Scalars. `key=value` at the top level. Strings that collide with typed literals (`"true"`, `"123"`, `"-"`) are quoted automatically.
- Root values. Objects, arrays, and scalars at the document root. Every JSON value has a GCF representation.
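The absent, null, and quoting rules for tabular cells can be sketched as a single function. This is a simplified illustration, not the library's encoder: `ABSENT` and `toy_cell` are hypothetical names, the pattern for "typed literal" is an assumption covering booleans, plain numbers, and the `-` marker, and escaping inside quoted strings is not handled.

```python
import re

ABSENT = object()  # hypothetical sentinel: the record has no such field

# Assumption: strings that would read back as a boolean, a number, or the null marker.
_LOOKS_TYPED = re.compile(r"^(true|false|-|-?\d+(\.\d+)?)$")


def toy_cell(value):
    """Render one cell of a tabular row."""
    if value is ABSENT:
        return "~"                      # field missing from this record
    if value is None:
        return "-"                      # field present, value null
    if isinstance(value, bool):         # checked before int: bool is an int subclass
        return "true" if value else "false"
    if isinstance(value, (int, float)):
        return str(value)
    if _LOOKS_TYPED.match(value):
        return '"' + value + '"'        # quote so the string is not read as a literal
    return value


def toy_row(record, fields):
    """Join cells in header order, marking fields the record lacks as absent."""
    return "|".join(toy_cell(record.get(f, ABSENT)) for f in fields)


fields = ["id", "name", "nick"]
row_a = toy_row({"id": 1, "name": "true", "nick": None}, fields)
row_b = toy_row({"id": 2, "name": "Bob"}, fields)
print(row_a)
print(row_b)
```

Keeping absent and null distinct is what lets a decoder tell a record without the key apart from a record whose key is explicitly null, which a lossless round-trip requires.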
- Positional fields. One header declares field names. Rows are values only.
- Local IDs. `@0`, `@1`. Edges reference by ID, not by repeating full identifiers.
- Hierarchical grouping. Section headers (`## targets`, `## related`) replace per-record metadata.
Both profiles share the same grammar (common scalar grammar, key grammar, header format). The savings are structural and grow with payload size.
Session deduplication: Symbols sent in prior responses become bare references. By the 5th tool call: 92.7% savings vs JSON.
Delta encoding: When the context changes slightly between queries, send only the diff. 81.2% additional savings on re-queries.
No other format has these. They compound across multi-turn agent interactions.
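Both ideas can be sketched in a few lines of Python. This is a toy model under assumptions, not the proxy's or library's implementation: `ToySession` and `toy_delta` are hypothetical names, and the exact wire syntax for a bare reference or a diff is not taken from the specification.

```python
class ToySession:
    """Hypothetical sketch of session deduplication: full line once, bare reference after."""

    def __init__(self):
        self._ids = {}  # qualified name -> local ID assigned on first send

    def render(self, name, kind, score):
        if name in self._ids:
            return f"@{self._ids[name]}"  # already sent this session: reference only
        self._ids[name] = len(self._ids)
        return f"@{self._ids[name]} {kind} {name} {score}"


def toy_delta(previous, current):
    """Hypothetical diff: names added and names removed between two result sets."""
    added = sorted(set(current) - set(previous))
    removed = sorted(set(previous) - set(current))
    return added, removed


session = ToySession()
first = [session.render("pkg.Auth", "fn", 0.78), session.render("pkg.Server", "fn", 0.54)]
second = [session.render("pkg.Auth", "fn", 0.78), session.render("pkg.Router", "fn", 0.41)]
print(first)
print(second)
print(toy_delta(["pkg.Auth", "pkg.Server"], ["pkg.Auth", "pkg.Router"]))
```

In the second call, `pkg.Auth` shrinks to a bare ID because the session has already seen it, and the delta lists only what changed between the two queries.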
| Language | Package | Repository |
|---|---|---|
| Go | go get github.com/blackwell-systems/gcf-go | gcf-go |
| TypeScript | npm install @blackwell-systems/gcf | gcf-typescript |
| Python | pip install gcf-python | gcf-python |
| Rust | cargo add gcf | gcf-rust |
| Swift | Swift Package Manager | gcf-swift |
| Kotlin | JitPack | gcf-kotlin |
| MCP Proxy | pip install gcf-proxy | gcf-proxy (bidirectional, session dedup, HTTP frontend) |
| Claude Code Plugin | /plugin install | gcf-claude-plugin (one-command install, session stats hook) |
| Codex Plugin | codex plugin add | gcf-codex-plugin (one-command install, session stats hook) |
| VS Code | ext install blackwell-systems.gcf-vscode | gcf-vscode (syntax highlighting) |
| n8n | npm install n8n-nodes-gcf | gcf-n8n-nodes (workflow encode/decode) |
| JetBrains | Search "GCF" in Plugins | gcf-jetbrains (IntelliJ, PyCharm, WebStorm, GoLand) |
| Tree-sitter | npm install tree-sitter-gcf | tree-sitter-gcf |
Zero runtime dependencies. MIT licensed. All implementations support both the generic profile (`encodeGeneric`, or `encode_generic` in Python) and the graph profile (`encode`). A CLI is included in all 6 languages. Syntax highlighting is available via tree-sitter (Neovim, Helix, Zed).
Specification: SPEC v3.1 Stable with 157 conformance fixtures, 33,000,000,000+ lossless round-trips verified across 5 formats and 6 languages. All implementations at v2.1.0+ (Go v1.2.0). Cross-language 6x6 matrix verified.
- Getting Started
- Benchmarks
- Benchmarks (Full Data)
- GCF vs TOON
- Schema Validation
- FAQ
- Independent AI Reviews
- Playground
- Specification
- MCP tool responses. Any MCP server returning structured data. 53-71% fewer tokens with 100% comprehension accuracy.
- Agent-to-agent communication. 63% fewer tokens per handoff. 5/5 generation validity on every frontier model.
- LLM structured output. LLMs produce valid GCF with a 3-line primer. No training required.
- Code intelligence. Graph profile with local IDs, edges, and distance grouping.
MIT - Dayna Blackwell