
argahsuknesib/toon-ld


TOON-LD


Token-Oriented Object Notation for Linked Data: a lossless knowledge graph compression format for LLM context windows.

TOON-LD reduces token usage by 40-60% compared to JSON-LD, so roughly twice as much structured data fits into the same prompt budget for RAG (Retrieval-Augmented Generation) applications.

It works by extending standard TOON syntax with Linked Data semantics, meaning every valid TOON-LD document is also a valid TOON document. Base TOON parsers can process it natively, while TOON-LD processors unlock the full semantic graph.

Why TOON-LD?

The Problem: Knowledge graphs serialized as JSON-LD are verbose, because every object repeats its property names. Using them in RAG pipelines burns through token budgets and hits context limits fast.

The Solution: TOON-LD acts as a compression layer. It combines the semantic expressiveness of RDF with the token efficiency of tabular arrays: by writing repeated keys once in a shared header and using CSV-like rows for uniform data, TOON-LD fits significantly more information into LLM context windows without losing structure.

Features

  • Pure TOON Extension: Every TOON-LD document is valid TOON (like JSON-LD extends JSON)
  • Tabular Arrays: Serialize arrays of objects as CSV-like rows with shared headers
  • 40-60% Token Reduction: Fewer tokens means lower costs and more data in context
  • Full JSON-LD Compatibility: Round-trip conversion without data loss
  • All JSON-LD 1.1 Keywords: Complete support for @context, @graph, @id, @type, value nodes, etc.
  • Cross-Platform: Rust, WebAssembly (npm), and Python (PyPI) implementations
  • High Performance: Optimized serialization with automatic tabular array detection

Benchmarks

Real-world token savings across different dataset sizes:

Records   JSON-LD size   TOON-LD size   Size saved   Tokens saved
10        862 B          518 B          39.9%        54.2%
100       8,782 B        5,109 B        41.8%        56.3%
1,000     90,682 B       53,710 B       40.8%        56.5%
10,000    936,682 B      566,711 B      39.5%        53.4%

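Key takeaway: savings hold steady as datasets grow. Tokens saved stay between 53% and 57% from 10 to 10,000 records, and byte size drops by roughly 40% throughout, which matters most when filling LLM context windows.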
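The Size saved column is simply 1 − (TOON-LD size / JSON-LD size). The sketch below recomputes it from the byte counts in the table and, as a rough illustration, turns the reported token savings into a capacity multiplier 1 / (1 − tokens saved), i.e. how many times more records fit in a fixed token budget. The multiplier is a back-of-the-envelope framing of our own, not something the benchmark tool reports.

```python
# Byte counts copied from the benchmark table: records -> (JSON-LD bytes, TOON-LD bytes)
SIZES = {
    10: (862, 518),
    100: (8_782, 5_109),
    1_000: (90_682, 53_710),
    10_000: (936_682, 566_711),
}

# Reported token savings (as fractions) from the same table
TOKENS_SAVED = {10: 0.542, 100: 0.563, 1_000: 0.565, 10_000: 0.534}


def size_saved(json_ld_bytes: int, toon_ld_bytes: int) -> float:
    """Fraction of bytes removed by TOON-LD relative to JSON-LD."""
    return 1 - toon_ld_bytes / json_ld_bytes


def capacity_multiplier(tokens_saved: float) -> float:
    """How many times more data fits into a fixed token budget."""
    return 1 / (1 - tokens_saved)


for records, (j, t) in SIZES.items():
    print(f"{records:>6}: size saved {size_saved(j, t):.1%}, "
          f"~{capacity_multiplier(TOKENS_SAVED[records]):.2f}x capacity")
```

With 53-57% token savings the multiplier lands between about 2.1x and 2.3x, consistent with the "roughly twice as much data" claim.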

Sparsity Analysis

TOON-LD's efficiency depends on data sparsity. Shape-based partitioning (enabled by default) ensures TOON-LD remains efficient even for highly heterogeneous data.

[Figures: Token Efficiency Graph; Savings Percentage Graph]

  • Low Sparsity (0-30%): Both Union and Partition approaches save ~40-50% tokens.
  • High Sparsity (60%+): Partitioning significantly outperforms the Union schema, maintaining efficiency where standard tabular formats fail.
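One way to make "sparsity" concrete (an assumption on our part; the benchmark's exact definition may differ) is the fraction of null cells when every object is forced into a single union header. The sketch computes that for a small heterogeneous list of plain dicts; union_header and sparsity are hypothetical helper names.

```python
from typing import Any


def union_header(objects: list[dict[str, Any]]) -> list[str]:
    """Sorted union of all keys: the header a single tabular array would need."""
    return sorted({key for obj in objects for key in obj})


def sparsity(objects: list[dict[str, Any]]) -> float:
    """Fraction of cells that would be null under the union schema."""
    header = union_header(objects)
    total = len(objects) * len(header)
    nulls = sum(1 for obj in objects for key in header if key not in obj)
    return nulls / total if total else 0.0


people = [
    {"@id": "ex:1", "foaf:name": "Alice", "foaf:age": 30},
    {"@id": "ex:2", "foaf:name": "Bob", "vcard:locality": "Portland"},
    {"@id": "ex:3", "foaf:name": "Carol", "foaf:age": 28, "vcard:locality": "Seattle"},
]
print(f"{sparsity(people):.0%}")  # 2 of 12 cells are null -> 17%
```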

Token Cost Analysis

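  • Union schema: a single header lists every field that appears in any object, so each missing value costs a null cell. Cost is high when the null count is large (sparse data).
  • Partitioned schema: objects are grouped by shape and each group gets its own header. Cost stays low when the partitions have dense, non-overlapping fields.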

Break-even point: at roughly 30% sparsity the two approaches cost about the same.

Partitioning excels when:

  • High field diversity (heterogeneous graphs)
  • Large datasets
  • Mixed entity types
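The trade-off can be approximated by counting cells. This is a simplified, hypothetical cost model: it ignores header tokens and value lengths, so it slightly flatters partitioning, which pays one extra header per group. The union schema emits len(objects) × len(union header) cells, while shape-based partitioning groups objects by their exact key set and emits one dense table per group.

```python
from collections import defaultdict
from typing import Any


def union_cells(objects: list[dict[str, Any]]) -> int:
    """Cells emitted when every object shares one union header (nulls included)."""
    header = {key for obj in objects for key in obj}
    return len(objects) * len(header)


def partition_by_shape(objects: list[dict[str, Any]]) -> dict[frozenset, list[dict[str, Any]]]:
    """Group objects by their exact set of keys (their 'shape')."""
    groups: dict[frozenset, list[dict[str, Any]]] = defaultdict(list)
    for obj in objects:
        groups[frozenset(obj)].append(obj)
    return dict(groups)


def partitioned_cells(objects: list[dict[str, Any]]) -> int:
    """Cells emitted with one dense table per shape (no nulls)."""
    return sum(len(shape) * len(rows) for shape, rows in partition_by_shape(objects).items())


# Two entity types with disjoint properties: a worst case for the union schema
graph = [{"@id": f"ex:p{i}", "foaf:name": "n", "foaf:age": i} for i in range(4)] + [
    {"@id": f"ex:b{i}", "dc:title": "t", "dc:creator": "c", "dc:date": "d"} for i in range(4)
]
print(union_cells(graph), partitioned_cells(graph))  # 48 28
```

For homogeneous data (a single shape) both counts are identical, which is why the two approaches only diverge as field diversity grows.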

Quick Example

JSON-LD:

{
  "@context": {
    "foaf": "http://xmlns.com/foaf/0.1/"
  },
  "@graph": [
    {"@id": "ex:1", "@type": "foaf:Person", "foaf:name": "Alice", "foaf:age": 30},
    {"@id": "ex:2", "@type": "foaf:Person", "foaf:name": "Bob", "foaf:age": 25}
  ]
}

TOON-LD:

@context:
  foaf: http://xmlns.com/foaf/0.1/
@graph[2]{@id,@type,foaf:age,foaf:name}:
  ex:1, foaf:Person, 30, Alice
  ex:2, foaf:Person, 25, Bob

Notice how object keys appear once in the header instead of repeating for each object.
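A minimal sketch of how such a header and rows can be produced from uniform objects. It assumes every object has the same keys, sorts them as in the TOON-LD example, joins cells with ", " to match that example, and skips the quoting and escaping rules a real TOON serializer must apply. to_tabular is a hypothetical name, not part of the toon-ld API.

```python
from typing import Any


def to_tabular(key: str, objects: list[dict[str, Any]], indent: str = "  ") -> str:
    """Render uniform objects as a TOON-style tabular array (no quoting/escaping)."""
    if not objects:
        raise ValueError("this sketch needs at least one object")
    fields = sorted(objects[0])
    if any(sorted(obj) != fields for obj in objects):
        raise ValueError("all objects must share the same keys")
    # Header: key[count]{field1,field2,...}:
    lines = [f"{key}[{len(objects)}]{{{','.join(fields)}}}:"]
    for obj in objects:
        # One row per object, values in header order
        lines.append(indent + ", ".join(str(obj[f]) for f in fields))
    return "\n".join(lines)


graph = [
    {"@id": "ex:1", "@type": "foaf:Person", "foaf:name": "Alice", "foaf:age": 30},
    {"@id": "ex:2", "@type": "foaf:Person", "foaf:name": "Bob", "foaf:age": 25},
]
print(to_tabular("@graph", graph))
```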

How TOON-LD Extends TOON

Just as JSON-LD extends JSON by adding semantic meaning to certain key names (those starting with @), TOON-LD extends TOON the same way:

  • No new syntax: TOON-LD uses only standard TOON syntax (objects, arrays, tabular format)
  • Semantic interpretation: Keys like @context, @id, @type have special JSON-LD meaning
  • Full compatibility: Any TOON parser can parse TOON-LD documents
  • Value nodes: Language tags and datatypes use tabular format for efficiency

Example value node with language tag:

title[2]{@value,@language}:
  The Hobbit,en
  Der Hobbit,de

This is standard TOON tabular syntax that base TOON parsers handle natively, while TOON-LD processors interpret it as JSON-LD value nodes.
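Once a base TOON parser has produced plain objects, a TOON-LD processor only needs to recognize the @-keys. As a sketch (the function and the tuple type are hypothetical, not the toon-ld API), here is how a value node maps to an RDF literal under the JSON-LD/RDF 1.1 rules: a language-tagged string gets the datatype rdf:langString, an explicit @type is used as given (compact IRIs such as xsd:date are not expanded here), and a plain string defaults to xsd:string.

```python
from typing import Any, NamedTuple, Optional

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
XSD = "http://www.w3.org/2001/XMLSchema#"


class RdfLiteral(NamedTuple):
    lexical: str
    datatype: str
    language: Optional[str] = None


def to_literal(node: dict[str, Any]) -> RdfLiteral:
    """Interpret a JSON-LD value node (string values only in this sketch)."""
    value = node["@value"]
    if not isinstance(value, str):
        raise TypeError("only string @value is handled in this sketch")
    if "@language" in node:
        # A JSON-LD value object may not combine @language with @type
        return RdfLiteral(value, RDF + "langString", node["@language"])
    return RdfLiteral(value, node.get("@type", XSD + "string"))


# Rows of the title[2]{@value,@language} table, as a base TOON parser would return them
titles = [{"@value": "The Hobbit", "@language": "en"},
          {"@value": "Der Hobbit", "@language": "de"}]
for t in titles:
    print(to_literal(t))
```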

Installation

Rust

[dependencies]
toon-ld = "0.2"

CLI

cargo install toon-cli

Python

pip install toon-ld

JavaScript/TypeScript

npm install toon-ld

Quick Start

CLI

# Convert JSON-LD to TOON-LD
toon-ld convert -i data.jsonld -o data.toon

# Convert back to JSON-LD
toon-ld convert -i data.toon -o data.jsonld

# Run benchmark
toon-ld benchmark --max-records 10000

Rust

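use toon_ld::{jsonld_to_toonld, toonld_to_jsonld};

// Assumes both functions return Result<String, E> where E implements std::error::Error
fn main() -> Result<(), Box<dyn std::error::Error>> {
    let json_ld = r#"{"@context": {"foaf": "http://xmlns.com/foaf/0.1/"}, "foaf:name": "Alice"}"#;
    let toon = jsonld_to_toonld(json_ld)?; // JSON-LD -> TOON-LD
    let back = toonld_to_jsonld(&toon)?; // TOON-LD -> JSON-LD
    println!("{toon}\n{back}");
    Ok(())
}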

Python

import toon_ld

json_ld = '{"@context": {"foaf": "http://xmlns.com/foaf/0.1/"}, "foaf:name": "Alice"}'
toon_str = toon_ld.convert_jsonld_to_toonld(json_ld)
json_str = toon_ld.convert_toonld_to_jsonld(toon_str)

JavaScript

import { convert_jsonld_to_toonld, convert_toonld_to_jsonld } from 'toon-ld';

const jsonLd = '{"@context": {"foaf": "http://xmlns.com/foaf/0.1/"}, "foaf:name": "Alice"}';
const toon = convert_jsonld_to_toonld(jsonLd);
const json = convert_toonld_to_jsonld(toon);

Key Concepts

Tabular Arrays

Arrays of objects share a header with field names, followed by CSV-like rows:

@context:
  foaf: http://xmlns.com/foaf/0.1/
  vcard: http://www.w3.org/2006/vcard/ns#
foaf:knows[3]{foaf:name,foaf:age,vcard:locality}:
  Alice, 30, null
  Bob, null, Portland
  Carol, 28, Seattle
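Reading such a table back is the reverse operation: split the header into field names, split each row on commas, and rebuild one object per row. In the sketch below a null cell is treated as an absent property and digit-only cells become integers. Both are simplifying assumptions (TOON's real scalar and quoting rules are richer, and this naive split breaks on values containing commas), and parse_tabular is a hypothetical helper.

```python
import re
from typing import Any

HEADER = re.compile(r"^(?P<key>[^\[]+)\[(?P<count>\d+)\]\{(?P<fields>[^}]*)\}:$")


def parse_cell(cell: str) -> Any:
    """Naive scalar parsing: 'null' -> None, digits -> int, anything else -> str."""
    cell = cell.strip()
    if cell == "null":
        return None
    if cell.isdigit():
        return int(cell)
    return cell


def parse_tabular(text: str) -> tuple[str, list[dict[str, Any]]]:
    lines = [line for line in text.splitlines() if line.strip()]
    match = HEADER.match(lines[0].strip())
    if match is None:
        raise ValueError(f"not a tabular header: {lines[0]!r}")
    fields = [f.strip() for f in match["fields"].split(",")]
    rows = lines[1:]
    if len(rows) != int(match["count"]):
        raise ValueError("row count does not match the declared length")
    objects = []
    for row in rows:
        cells = [parse_cell(c) for c in row.split(",")]
        if len(cells) != len(fields):
            raise ValueError(f"row has {len(cells)} cells, expected {len(fields)}")
        # Drop null cells so the property is simply absent from that object
        objects.append({f: v for f, v in zip(fields, cells) if v is not None})
    return match["key"], objects


table = """foaf:knows[3]{foaf:name,foaf:age,vcard:locality}:
  Alice, 30, null
  Bob, null, Portland
  Carol, 28, Seattle"""
key, people = parse_tabular(table)
print(key, people)
```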

Value Nodes

Language tags and datatypes use standard TOON object or tabular syntax:

@context:
  dc: http://purl.org/dc/terms/
  schema: http://schema.org/
  xsd: http://www.w3.org/2001/XMLSchema#
dc:title:
  @value: Bonjour
  @language: fr
schema:datePublished:
  @value: "2024-01-15"
  @type: xsd:date

Or using tabular format for multiple values:

dc:title[2]{@value,@language}:
  Bonjour,fr
  Hello,en

Context Support

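Automatic URI compaction using @context: a prefix declared in @context stands in for its full IRI, so foaf:name below abbreviates http://xmlns.com/foaf/0.1/name.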

@context:
  foaf: http://xmlns.com/foaf/0.1/
foaf:name: Alice
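Compact IRIs work by prefix substitution: a term prefix:suffix whose prefix is defined in @context expands to the prefix IRI followed by the suffix, and compaction picks the longest prefix IRI that starts the full IRI. The sketch covers only this prefix case (real JSON-LD context processing also handles term definitions, @vocab, @base, and more); expand_iri and compact_iri are hypothetical helper names.

```python
def expand_iri(term: str, context: dict[str, str]) -> str:
    """Expand 'prefix:suffix' using the prefixes in context; leave other terms unchanged."""
    prefix, sep, suffix = term.partition(":")
    # A suffix starting with '//' means the term is already an absolute IRI (e.g. http://...)
    if sep and prefix in context and not suffix.startswith("//"):
        return context[prefix] + suffix
    return term


def compact_iri(iri: str, context: dict[str, str]) -> str:
    """Compact a full IRI with the longest matching prefix IRI, if any."""
    best = None
    for prefix, base in context.items():
        if iri.startswith(base) and len(iri) > len(base):
            if best is None or len(base) > len(context[best]):
                best = prefix
    return f"{best}:{iri[len(context[best]):]}" if best is not None else iri


context = {"foaf": "http://xmlns.com/foaf/0.1/"}
print(expand_iri("foaf:name", context))                        # http://xmlns.com/foaf/0.1/name
print(compact_iri("http://xmlns.com/foaf/0.1/name", context))  # foaf:name
```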

Project Structure

  • toon-core/ - Core Rust implementation
  • toon-cli/ - Command-line tool
  • toon-wasm/ - WebAssembly bindings (npm)
  • toon-py/ - Python bindings (PyPI)

Building from Source

# Build all workspace members
cargo build --release

# Run tests
cargo test --workspace

# Build WASM package
(cd toon-wasm && wasm-pack build --target web)

# Build Python wheel
(cd toon-py && maturin build --release)

License

MIT License - See LICENSE for details.