Token-Oriented Object Notation for Linked Data: a lossless knowledge graph compression format for LLM context windows.
TOON-LD reduces token usage by 40-60% compared to JSON-LD, so roughly twice as much structured data fits into your prompts for RAG (Retrieval-Augmented Generation) applications.
It works by extending standard TOON syntax with Linked Data semantics, meaning every valid TOON-LD document is also a valid TOON document. Base TOON parsers can process it natively, while TOON-LD processors unlock the full semantic graph.
The Problem: Knowledge Graphs (JSON-LD) are incredibly verbose. Using them in RAG pipelines burns through token budgets and hits context limits fast.
The Solution: TOON-LD acts as a compression layer. It combines the semantic expressiveness of RDF with radical token efficiency through tabular arrays. By eliminating repetitive keys and using CSV-like rows for uniform data, TOON-LD fits significantly more information into LLM context windows without losing structure.
- Pure TOON Extension: Every TOON-LD document is valid TOON (like JSON-LD extends JSON)
- Tabular Arrays: Serialize arrays of objects as CSV-like rows with shared headers
- 40-60% Token Reduction: Fewer tokens means lower costs and more data in context
- Full JSON-LD Compatibility: Round-trip conversion without data loss
- All JSON-LD 1.1 Keywords: Complete support for `@context`, `@graph`, `@id`, `@type`, value nodes, etc.
- Cross-Platform: Rust, WebAssembly (npm), and Python (PyPI) implementations
- High Performance: Optimized serialization with automatic tabular array detection
Real-world token savings across different dataset sizes:
| Records | JSON-LD Size | TOON-LD Size | Size Saved | Tokens Saved |
|---|---|---|---|---|
| 10 | 862 B | 518 B | 39.9% | 54.2% |
| 100 | 8,782 B | 5,109 B | 41.8% | 56.3% |
| 1,000 | 90,682 B | 53,710 B | 40.8% | 56.5% |
| 10,000 | 936,682 B | 566,711 B | 39.5% | 53.4% |
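Key takeaway: Token savings hold steady at 53-57% from 10 to 10,000 records and consistently exceed the roughly 40% byte savings, which matters most when data has to fit into an LLM context window.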
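The Size Saved column can be rechecked from the byte counts in the table. The snippet below is only that arithmetic; the Tokens Saved column cannot be recomputed this way because the table does not list raw token counts.

```python
# Byte counts copied from the benchmark table: (records, JSON-LD bytes, TOON-LD bytes).
SIZES = [
    (10, 862, 518),
    (100, 8_782, 5_109),
    (1_000, 90_682, 53_710),
    (10_000, 936_682, 566_711),
]


def percent_saved(before: int, after: int) -> float:
    """Relative reduction from `before` to `after`, in percent, rounded to one decimal."""
    return round(100 * (before - after) / before, 1)


size_saved = {records: percent_saved(jsonld, toonld) for records, jsonld, toonld in SIZES}

for records, saved in size_saved.items():
    print(f"{records:>6} records: {saved}% smaller")
```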
TOON-LD's efficiency depends on data sparsity. Shape-based partitioning (enabled by default) ensures TOON-LD remains efficient even for highly heterogeneous data.
- Low Sparsity (0-30%): Both Union and Partition approaches save ~40-50% tokens.
- High Sparsity (60%+): Partitioning significantly outperforms the Union schema, maintaining efficiency where standard tabular formats fail.
Union Schema: High cost when null_count is large (sparse data).
Partitioned Schema: Low cost when partitions have dense, non-overlapping fields.
Break-even point: at roughly 30% sparsity the two approaches cost about the same.
Partitioning excels with:
- High field diversity (heterogeneous graphs)
- Large datasets
- Mixed entity types
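To make the Union versus Partition trade-off concrete, here is a minimal sketch. It assumes "sparsity" means the share of null cells when all records are padded to one union header, and that a "shape" is a record's exact set of keys. Both function names and the sample records are hypothetical; this is not the library's actual partitioning code.

```python
from collections import defaultdict


def union_cost(records):
    """Return (total cells, null cells) when every record is padded to the union of all keys."""
    fields = {key for record in records for key in record}
    cells = len(fields) * len(records)
    filled = sum(len(record) for record in records)
    return cells, cells - filled


def partition_by_shape(records):
    """Group records by their exact key set, so each group forms a dense table."""
    groups = defaultdict(list)
    for record in records:
        groups[tuple(sorted(record))].append(record)
    return dict(groups)


# Made-up mixed data: two people and one book.
records = [
    {"@id": "ex:1", "foaf:name": "Alice", "foaf:age": 30},
    {"@id": "ex:2", "foaf:name": "Bob", "foaf:age": 25},
    {"@id": "ex:3", "dc:title": "The Hobbit", "dc:language": "en"},
]

cells, nulls = union_cost(records)
sparsity = nulls / cells
partitions = partition_by_shape(records)
partition_cells = sum(len(shape) * len(rows) for shape, rows in partitions.items())

print(f"union: {cells} cells, {nulls} null ({sparsity:.0%} sparse)")
print(f"partitioned: {len(partitions)} tables, {partition_cells} cells, 0 null")
```

In this toy case the union table spends 6 of its 15 cells on nulls, while the two partitions need only 9 cells in total.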
JSON-LD:

```json
{
  "@context": {
    "foaf": "http://xmlns.com/foaf/0.1/"
  },
  "@graph": [
    {"@id": "ex:1", "@type": "foaf:Person", "foaf:name": "Alice", "foaf:age": 30},
    {"@id": "ex:2", "@type": "foaf:Person", "foaf:name": "Bob", "foaf:age": 25}
  ]
}
```

TOON-LD:

```
@context:
  foaf: http://xmlns.com/foaf/0.1/
@graph[2]{@id,@type,foaf:age,foaf:name}:
  ex:1, foaf:Person, 30, Alice
  ex:2, foaf:Person, 25, Bob
```

Notice how object keys appear once in the header instead of repeating for each object.
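The header-once idea can be sketched in a few lines of Python. This is an illustration only: `to_tabular` is a hypothetical helper, it assumes all objects share the same keys, and it skips the quoting and escaping a real TOON serializer has to handle.

```python
def to_tabular(key, objects):
    """Serialize a list of same-shaped dicts as a TOON-style tabular array."""
    fields = sorted(objects[0])
    if any(sorted(obj) != fields for obj in objects):
        raise ValueError("all objects must share the same keys")
    # The keys are written once, in the header, instead of once per object.
    header = key + "[" + str(len(objects)) + "]{" + ",".join(fields) + "}:"
    rows = ["  " + ", ".join(str(obj[field]) for field in fields) for obj in objects]
    return "\n".join([header, *rows])


graph = [
    {"@id": "ex:1", "@type": "foaf:Person", "foaf:name": "Alice", "foaf:age": 30},
    {"@id": "ex:2", "@type": "foaf:Person", "foaf:name": "Bob", "foaf:age": 25},
]

toon = to_tabular("@graph", graph)
print(toon)
```

Sorting the keys is a choice made for this sketch; it happens to reproduce the field order of the `@graph` header in the example.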
Just as JSON-LD extends JSON by adding semantic meaning to certain key names (those starting with @), TOON-LD extends TOON the same way:
- No new syntax: TOON-LD uses only standard TOON syntax (objects, arrays, tabular format)
- Semantic interpretation: Keys like `@context`, `@id`, `@type` have special JSON-LD meaning
- Full compatibility: Any TOON parser can parse TOON-LD documents
- Value nodes: Language tags and datatypes use tabular format for efficiency
Example value node with language tag:

```
title[2]{@value,@language}:
  The Hobbit,en
  Der Hobbit,de
```

This is standard TOON tabular syntax that base TOON parsers handle natively, while TOON-LD processors interpret it as JSON-LD value nodes.

```toml
[dependencies]
toon-ld = "0.2"
```

```bash
cargo install toon-cli
```

```bash
pip install toon-ld
```

```bash
npm install toon-ld
```

```bash
# Convert JSON-LD to TOON-LD
toon-ld convert -i data.jsonld -o data.toon
# Convert back to JSON-LD
toon-ld convert -i data.toon -o data.jsonld
# Run benchmark
toon-ld benchmark --max-records 10000
```

```rust
use toon_ld::{jsonld_to_toonld, toonld_to_jsonld};

let json_ld = r#"{"@context": {"foaf": "http://xmlns.com/foaf/0.1/"}, "foaf:name": "Alice"}"#;
let toon = jsonld_to_toonld(json_ld)?;
let back = toonld_to_jsonld(&toon)?;
```

```python
import toon_ld

json_ld = '{"@context": {"foaf": "http://xmlns.com/foaf/0.1/"}, "foaf:name": "Alice"}'
toon_str = toon_ld.convert_jsonld_to_toonld(json_ld)
json_str = toon_ld.convert_toonld_to_jsonld(toon_str)
```

```javascript
import { convert_jsonld_to_toonld, convert_toonld_to_jsonld } from 'toon-ld';

const jsonLd = '{"@context": {"foaf": "http://xmlns.com/foaf/0.1/"}, "foaf:name": "Alice"}';
const toon = convert_jsonld_to_toonld(jsonLd);
const json = convert_toonld_to_jsonld(toon);
```

Arrays of objects share a header with field names, followed by CSV-like rows:

```
@context:
  foaf: http://xmlns.com/foaf/0.1/
  vcard: http://www.w3.org/2006/vcard/ns#
foaf:knows[3]{foaf:name,foaf:age,vcard:locality}:
  Alice, 30, null
  Bob, null, Portland
  Carol, 28, Seattle
```

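Reading such a block back into objects is mostly a matter of splitting the header and the rows. The sketch below is a simplified reader written for this example, not the parser shipped in `toon-ld`: `parse_tabular` is a hypothetical name, and quoted strings, nested values, and non-integer numbers are deliberately ignored.

```python
import re

# Matches headers of the form  key[count]{field1,field2,...}:
HEADER = re.compile(r"^(?P<key>.+?)\[(?P<count>\d+)\]\{(?P<fields>[^}]*)\}:$")


def parse_scalar(text):
    """Map `null` to None and plain integers to int; leave everything else as a string."""
    if text == "null":
        return None
    if re.fullmatch(r"-?\d+", text):
        return int(text)
    return text


def parse_tabular(block):
    """Turn one tabular array block into (key, list of dicts)."""
    lines = [line.strip() for line in block.strip().splitlines() if line.strip()]
    match = HEADER.match(lines[0])
    if match is None:
        raise ValueError("not a tabular array header: " + lines[0])
    fields = match["fields"].split(",")
    rows = lines[1:]
    if len(rows) != int(match["count"]):
        raise ValueError("row count does not match the declared length")
    objects = [
        dict(zip(fields, (parse_scalar(cell.strip()) for cell in row.split(","))))
        for row in rows
    ]
    return match["key"], objects


block = """
foaf:knows[3]{foaf:name,foaf:age,vcard:locality}:
  Alice, 30, null
  Bob, null, Portland
  Carol, 28, Seattle
"""

key, people = parse_tabular(block)
print(key, people)
```

The declared length in the header doubles as a cheap integrity check: a truncated block fails loudly instead of silently dropping rows.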
Language tags and datatypes use standard TOON object or tabular syntax:

```
@context:
  dc: http://purl.org/dc/terms/
  schema: http://schema.org/
  xsd: http://www.w3.org/2001/XMLSchema#
dc:title:
  @value: Bonjour
  @language: fr
schema:datePublished:
  @value: "2024-01-15"
  @type: xsd:date
```

Or using tabular format for multiple values:

```
dc:titles[2]{@value,@language}:
  Bonjour,fr
  Hello,en
```

Automatic URI compaction using @context:

```
@context:
  foaf: http://xmlns.com/foaf/0.1/
foaf:name: Alice
```

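Prefix-based compaction can be illustrated with a small sketch. It covers only simple prefix mappings in `@context`, not the full JSON-LD compaction algorithm, and `compact_iri` and `expand_iri` are hypothetical helpers rather than functions exported by `toon-ld`.

```python
def compact_iri(iri, context):
    """Replace the longest matching namespace in `iri` with its prefix from `context`."""
    best = None
    for prefix, namespace in context.items():
        if iri.startswith(namespace) and (best is None or len(namespace) > len(best[1])):
            best = (prefix, namespace)
    if best is None:
        return iri  # no matching namespace: keep the full IRI
    prefix, namespace = best
    return prefix + ":" + iri[len(namespace):]


def expand_iri(term, context):
    """Inverse of compact_iri for terms of the form prefix:local."""
    prefix, sep, local = term.partition(":")
    if sep and prefix in context:
        return context[prefix] + local
    return term  # unknown prefix (for example `http`): leave untouched


context = {"foaf": "http://xmlns.com/foaf/0.1/"}

compact = compact_iri("http://xmlns.com/foaf/0.1/name", context)
expanded = expand_iri(compact, context)
print(compact, "<->", expanded)
```

Because expansion exactly reverses compaction for known prefixes, shortening IRIs this way loses no information.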
- `toon-core/` - Core Rust implementation
- `toon-cli/` - Command-line tool
- `toon-wasm/` - WebAssembly bindings (npm)
- `toon-py/` - Python bindings (PyPI)

```bash
# Build all workspace members
cargo build --release
# Run tests
cargo test --workspace
# Build WASM package (subshell, so the working directory is unchanged afterwards)
(cd toon-wasm && wasm-pack build --target web)
# Build Python wheel
(cd toon-py && maturin build --release)
```

MIT License - See LICENSE for details.