5 unstable releases
| Version | Released |
|---|---|
| 0.4.0 | Feb 3, 2026 |
| 0.3.0 | Jan 8, 2026 |
| 0.2.3 | Dec 24, 2025 |
| 0.2.2 | Dec 22, 2025 |
| 0.2.0 | Dec 22, 2025 |
CML - Content Markup Language (Rust Implementation)
CML (Content Markup Language) is an XML-based markup language with profile-based extensibility for representing structured knowledge. This is the reference Rust implementation of CML v0.1, providing parsing, generation, and embedding storage for structured documents.
Overview
CML v0.1 provides:
- ✅ Standardized structure - <cml>/<header>/<body>/<footer> for all documents
- 📋 Profile system - Domain-specific vocabularies (code, legal, wiki, etc.)
- 🗜️ Byte Punch compression - 40-70% size reduction with profile-aware dictionaries
- 🔍 Semantic search - Vector keywords and full-text indexing
- 📝 XSD schemas - Strict validation for all profiles
- 🔄 Round-trip fidelity - Parse → Generate → Parse yields identical results
Quick Start
use cml::{Body, CmlDocument, CmlGenerator, CmlParser, CodeBody, Footer, Header, Profile};
// Parse a CML document
let xml = std::fs::read_to_string("example.cml")?;
let doc = CmlParser::parse_cml(&xml)?;
// Generate CML
let generator = CmlGenerator;
let output = generator.generate_cml(&doc)?;
// Create a new document
let doc = CmlDocument {
version: "0.1".to_string(),
encoding: "utf-8".to_string(),
profile: Profile::Code,
header: Header {
title: "My API Docs".to_string(),
// ...
},
body: Body::Code(CodeBody { /* ... */ }),
footer: Footer::default(),
};
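The round-trip guarantee listed in the overview (Parse → Generate → Parse yields identical results) can be checked with a small generic helper. The sketch below is not part of the crate: `roundtrips`, `ToyDoc`, `toy_parse`, and `toy_generate` are hypothetical names, and the toy parser only stands in for `CmlParser::parse_cml` and `CmlGenerator::generate_cml` so the example runs on its own.

```rust
/// Returns Ok(true) when parse -> generate -> parse yields an equal document.
fn roundtrips<D, E>(
    input: &str,
    parse: impl Fn(&str) -> Result<D, E>,
    generate: impl Fn(&D) -> Result<String, E>,
) -> Result<bool, E>
where
    D: PartialEq,
{
    let first = parse(input)?;
    let regenerated = generate(&first)?;
    let second = parse(&regenerated)?;
    Ok(first == second)
}

/// Toy stand-in document (hypothetical): a list of trimmed, non-empty lines.
#[derive(Debug, PartialEq)]
struct ToyDoc(Vec<String>);

/// Toy stand-in parser (hypothetical): trims each line and drops blank ones.
fn toy_parse(s: &str) -> Result<ToyDoc, String> {
    let lines = s
        .lines()
        .map(|line| line.trim().to_string())
        .filter(|line| !line.is_empty())
        .collect();
    Ok(ToyDoc(lines))
}

/// Toy stand-in generator (hypothetical): joins the lines with newlines.
fn toy_generate(doc: &ToyDoc) -> Result<String, String> {
    Ok(doc.0.join("\n"))
}

fn main() {
    let stable = roundtrips("  a \n\n b ", toy_parse, toy_generate).unwrap();
    println!("round-trip stable: {}", stable);
}
```

Comparing the two parsed documents, rather than the two XML strings, keeps the check insensitive to whitespace or attribute-order differences in the generated output.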
Profiles
code:api (v1.0 - Ratified)
For API documentation with semantic search.
Namespace: https://schemas.continuity.org/profiles/code/1.0
Elements:
- <module> - Code modules/packages
- <struct> - Data structures
- <enum> - Enumerations
- <trait> - Traits/interfaces
- <function> - Free functions
- <method> - Methods on types
- <field> - Struct/enum fields
Example:
<cml version="0.1" encoding="utf-8" profile="code:api"
xmlns="https://schemas.continuity.org/cml/0.1"
xmlns:code="https://schemas.continuity.org/profiles/code/1.0">
<header>
<title>Rust Standard Library: Vec&lt;T&gt;</title>
<identifier scheme="continuity">std.collections.vec</identifier>
</header>
<body>
<code:struct id="std.vec.Vec" name="Vec" generic="T">
<code:description vector="vector array dynamic">
A contiguous growable array type.
</code:description>
<code:method id="std.vec.Vec.push" name="push">
<code:signature>pub fn push(&amp;mut self, value: T)</code:signature>
<code:description vector="append add">
Appends an element to the back.
</code:description>
<code:complexity>amortized O(1)</code:complexity>
</code:method>
</code:struct>
</body>
</cml>
See examples/cml/code-api-example.cml for full example.
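Rust signatures and generic type names contain characters that are reserved in XML (`&`, `<`, `>`), so they need escaping before being placed in element text or attribute values. The helper below is a minimal standard-library sketch; `escape_xml` is a hypothetical name, not a function this crate is known to export, and the crate's generator may already handle escaping.

```rust
/// Escapes the five characters that XML reserves in text and attribute values.
fn escape_xml(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    for ch in input.chars() {
        match ch {
            '&' => out.push_str("&amp;"),
            '<' => out.push_str("&lt;"),
            '>' => out.push_str("&gt;"),
            '"' => out.push_str("&quot;"),
            '\'' => out.push_str("&apos;"),
            _ => out.push(ch),
        }
    }
    out
}

fn main() {
    let signature = "pub fn push(&mut self, value: T)";
    println!("<code:signature>{}</code:signature>", escape_xml(signature));
    println!("<title>{}</title>", escape_xml("Rust Standard Library: Vec<T>"));
}
```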
legal:constitution (v1.0 - Ratified)
For constitutional and statutory documents.
Namespace: https://schemas.continuity.org/profiles/legal/1.0
Elements:
- <preamble> - Document preamble
- <article> - Top-level articles
- <section> - Sections within articles
- <clause> - Individual clauses
- <paragraph> - Subdivisions
- <amendment> - Amendments to the document
Example:
<cml version="0.1" encoding="utf-8" profile="legal:constitution"
xmlns="https://schemas.continuity.org/cml/0.1"
xmlns:legal="https://schemas.continuity.org/profiles/legal/1.0">
<header>
<title>Constitution of the United States</title>
<identifier scheme="continuity">us.federal.constitution</identifier>
</header>
<body>
<legal:preamble>
We the People of the United States...
</legal:preamble>
<legal:article num="I" title="Legislative Branch" id="article-1">
<legal:section num="1" id="article-1-section-1">
<legal:clause num="1" id="article-1-section-1-clause-1">
All legislative Powers herein granted...
</legal:clause>
</legal:section>
</legal:article>
</body>
</cml>
See examples/cml/legal-constitution-example.cml for full example.
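In the example above, the `id` attributes follow a hierarchical pattern (`article-1-section-1-clause-1`) while the article's `num` attribute is a Roman numeral. The sketch below shows one way such ids could be derived. The pattern is inferred from this single example rather than from a documented rule, and `roman_to_u32` and `clause_id` are hypothetical helpers.

```rust
/// Converts an uppercase Roman numeral to an integer.
/// Lenient: it does not reject non-canonical forms such as "IIII".
fn roman_to_u32(s: &str) -> Option<u32> {
    if s.is_empty() {
        return None;
    }
    let mut total = 0u32;
    let mut prev = 0u32;
    // Walk right to left; a symbol smaller than the largest one seen so far
    // is subtractive (the I in IV), otherwise it is additive.
    for ch in s.chars().rev() {
        let value = match ch {
            'I' => 1,
            'V' => 5,
            'X' => 10,
            'L' => 50,
            'C' => 100,
            'D' => 500,
            'M' => 1000,
            _ => return None,
        };
        if value < prev {
            total -= value;
        } else {
            total += value;
            prev = value;
        }
    }
    Some(total)
}

/// Builds an id in the shape used by the example document (assumed convention).
fn clause_id(article_num: &str, section: u32, clause: u32) -> Option<String> {
    let article = roman_to_u32(article_num)?;
    Some(format!("article-{}-section-{}-clause-{}", article, section, clause))
}

fn main() {
    println!("{:?}", clause_id("I", 1, 1));
    println!("{:?}", clause_id("XIV", 2, 3));
}
```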
bookstack:wiki (v0.1 - Local Namespace)
For knowledge base / wiki content.
Namespace: https://local.namespace/continuity/bookstack/0.1 (pending ratification)
Elements:
- <book> - Top-level book
- <chapter> - Chapters within books
- <page> - Individual pages
- <shelf> - Collections of books
- <content> - Page content (markdown/html/plain)
- <tags> - Metadata tags
Example:
<cml version="0.1" encoding="utf-8" profile="bookstack:wiki"
xmlns="https://schemas.continuity.org/cml/0.1"
xmlns:bookstack="https://local.namespace/continuity/bookstack/0.1">
<header>
<title>Engineering Documentation</title>
<identifier scheme="continuity">company.engineering.rust-guide</identifier>
</header>
<body>
<bookstack:book id="book-1" title="Rust Development Guide">
<bookstack:chapter id="ch-1" title="Getting Started" num="1">
<bookstack:page id="page-1" title="Setup">
<bookstack:content format="markdown"><![CDATA[
# Development Environment Setup
...
]]></bookstack:content>
<bookstack:tags>
<tag name="rust"/>
<tag name="setup"/>
</bookstack:tags>
</bookstack:page>
</bookstack:chapter>
</bookstack:book>
</body>
</cml>
See examples/cml/bookstack-wiki-example.cml for full example.
CML Structure
Root Element
All CML documents start with:
<cml version="0.1" encoding="utf-8" profile="namespace:profile">
Attributes:
- version - CML version (currently "0.1")
- encoding - Character encoding (always "utf-8")
- profile - Profile identifier (e.g., "code:api", "legal:constitution")
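The `profile` attribute has the shape `namespace:profile`. As a rough illustration of how a consumer might split and sanity-check that value before dispatching on it, here is a standard-library sketch. `split_profile` is a hypothetical helper, and the rule that each part is non-empty with exactly one colon is an assumption drawn from the examples in this README.

```rust
/// Splits "namespace:profile" into its two parts.
/// Returns None if either part is empty or there is more than one colon.
fn split_profile(id: &str) -> Option<(&str, &str)> {
    let (namespace, name) = id.split_once(':')?;
    if namespace.is_empty() || name.is_empty() || name.contains(':') {
        return None;
    }
    Some((namespace, name))
}

fn main() {
    for id in ["code:api", "legal:constitution", "bookstack:wiki", "code"] {
        println!("{:?} -> {:?}", id, split_profile(id));
    }
}
```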
Header Section
Required metadata about the document:
<header>
<title>Document Title</title>
<author role="author">Name</author>
<date type="created" when="2025-11-07"/>
<identifier scheme="continuity">unique.document.id</identifier>
<description>Optional summary</description>
<meta name="key" value="value"/>
<link rel="related" href="https://example.com"/>
</header>
Body Section
Profile-specific content. Structure depends on the profile.
Footer Section (Optional)
Signatures, provenance, and annotations:
<footer>
<signatures>
<signature>
<signer>Alice</signer>
<timestamp>2025-11-07T10:00:00Z</timestamp>
<algorithm>ed25519</algorithm>
<value>base64-encoded-sig</value>
</signature>
</signatures>
<provenance>
<change>
<timestamp>2025-11-07T10:00:00Z</timestamp>
<author>Bob</author>
<description>Initial creation</description>
<commit>abc123</commit>
</change>
</provenance>
<annotations>
<annotation author="Carol" target="element-id">
Note about this element
</annotation>
</annotations>
</footer>
Inline Semantic Elements
Available in all profiles:
- <em> - Emphasis
- <strong> - Strong importance
- <ref target="id" type="cross"> - Cross-reference
- <term> - Defined term
- <abbr> - Abbreviation
- <date when="2025-11-07"> - Date/time reference
- <currency code="USD" value="100.00"> - Currency amount
- <snip reason="redacted"> - Elided content
Validation
XSD schemas are provided for strict validation:
use sam_cml::validate_document;
let doc = CmlParser::parse_cml(&xml)?;
validate_document(&doc)?; // Validates against schema
Schemas:
- schemas/cml-core-0.1.xsd - Core CML structure
- schemas/profiles/code-api-1.0.xsd - Code profile
- schemas/profiles/legal-constitution-1.0.xsd - Legal profile
- schemas/profiles/bookstack-wiki-0.1.xsd - Bookstack profile
Byte Punch Compression
CML integrates with Byte Punch for profile-aware compression:
use byte_punch::{Compressor, Dictionary};
// Load profile dictionary
let dict = Dictionary::from_file("dictionaries/code-api.json")?;
let compressor = Compressor::new(dict);
// Compress
let compressed = compressor.compress(&cml_xml)?;
// Decompress
let decompressed = compressor.decompress(&compressed)?;
assert_eq!(cml_xml, decompressed); // 100% fidelity
Compression Results:
- Legal documents: ~65% size reduction
- Code documentation: ~50-60% size reduction
- Wiki content: ~55% size reduction
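To reproduce figures like these on your own documents, the percentage can be computed from the byte lengths before and after compression. This is plain arithmetic, not a Byte Punch API; `size_reduction_percent` is a hypothetical helper, and the byte counts in `main` are made-up inputs rather than measured results.

```rust
/// Percentage by which the compressed form is smaller than the original.
/// Returns None for an empty original, where the ratio is undefined.
fn size_reduction_percent(original_len: usize, compressed_len: usize) -> Option<f64> {
    if original_len == 0 {
        return None;
    }
    let ratio = compressed_len as f64 / original_len as f64;
    Some((1.0 - ratio) * 100.0)
}

fn main() {
    // Illustrative byte counts only.
    if let Some(pct) = size_reduction_percent(1000, 350) {
        println!("size reduction: {:.1}%", pct);
    }
}
```

A negative result means the output grew, which can happen on very small inputs with any dictionary-based scheme.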
Testing
# Run all tests
cargo test -p sam-cml
# Run with output
cargo test -p sam-cml -- --nocapture
# Run specific test
cargo test -p sam-cml test_code_profile_roundtrip
Test Coverage:
- 42/42 tests passing ✅
- Unit tests for parser, generator, schema
- Integration tests for round-trip fidelity
- Profile-specific tests for each supported profile
Development
Project Structure
crates/sam-cml/
├── src/
│ ├── lib.rs # Public API
│ ├── types.rs # CML document types
│ ├── parser.rs # XML → Rust parsing
│ ├── generator.rs # Rust → XML generation
│ └── schema.rs # Validation logic
├── tests/
│ ├── integration_test.rs # Integration tests
│ ├── v01_tests.rs # CML v0.1 tests
│ └── v01_roundtrip_tests.rs # Round-trip tests
└── Cargo.toml
Adding a New Profile
- Define the profile in types.rs:
pub enum Profile {
Code,
Legal,
Bookstack,
MyProfile, // Add here
}
pub enum Body {
Code(CodeBody),
Legal(LegalBody),
Bookstack(BookstackBody),
MyProfile(MyProfileBody), // Add here
}
pub struct MyProfileBody {
// Your profile structure
}
- Create XSD schema:
Create schemas/profiles/my-profile-1.0.xsd following the pattern of existing schemas.
- Add parser support:
Update parser.rs to handle your profile's elements.
- Add generator support:
Update generator.rs to output your profile's XML.
- Add tests:
Create tests in tests/ directory.
- Create dictionary:
Add crates/byte-punch/dictionaries/my-profile.json for compression.
- Create example:
Add examples/cml/my-profile-example.cml.
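The steps above do not show how the `profile` attribute string maps to a `Profile` variant. One common Rust pattern is a single string table shared by `Display` and `FromStr`, sketched below on a local stand-in enum. This is an assumption about a possible design, not the crate's actual code, and `"my:profile"` is a hypothetical identifier.

```rust
use std::fmt;
use std::str::FromStr;

/// Local stand-in for the crate's Profile enum (illustration only).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Profile {
    Code,
    Legal,
    Bookstack,
    MyProfile,
}

impl Profile {
    const ALL: [Profile; 4] = [
        Profile::Code,
        Profile::Legal,
        Profile::Bookstack,
        Profile::MyProfile,
    ];

    /// The exhaustive match makes the compiler flag this function
    /// whenever a new variant is added without an identifier.
    fn as_str(self) -> &'static str {
        match self {
            Profile::Code => "code:api",
            Profile::Legal => "legal:constitution",
            Profile::Bookstack => "bookstack:wiki",
            Profile::MyProfile => "my:profile", // hypothetical identifier
        }
    }
}

impl fmt::Display for Profile {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str(self.as_str())
    }
}

impl FromStr for Profile {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        Profile::ALL
            .iter()
            .copied()
            .find(|profile| profile.as_str() == s)
            .ok_or_else(|| format!("unknown profile: {}", s))
    }
}

fn main() {
    let parsed: Result<Profile, String> = "legal:constitution".parse();
    println!("{:?}", parsed);
    println!("{}", Profile::MyProfile);
}
```

Note that the `ALL` array is not checked by the compiler, so a new variant has to be added there by hand.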
Migration from Legacy Format
The legacy <document> format is deprecated but still supported:
<!-- OLD (deprecated) -->
<document id="..." version="1.0">
<metadata>
<title>...</title>
</metadata>
<section>...</section>
</document>
<!-- NEW (CML v0.1) -->
<cml version="0.1" encoding="utf-8" profile="code:api">
<header>
<title>...</title>
<identifier scheme="continuity">...</identifier>
</header>
<body>
<!-- Profile-specific content -->
</body>
</cml>
The parser auto-detects the legacy format and upgrades it to CML v0.1 internally.
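How the parser performs this detection is not described here. One plausible approach is to look at the name of the root element, as in the sketch below. It uses only the standard library, skips a leading XML declaration and comments, and does not handle DOCTYPE declarations or a byte-order mark. `root_element_name`, `detect_format`, and `Format` are hypothetical names, not the crate's API.

```rust
#[derive(Debug, PartialEq)]
enum Format {
    Legacy, // root element is <document>
    Cml,    // root element is <cml>
}

/// Returns the name of the first element, skipping the XML declaration,
/// processing instructions, and comments that may precede it.
fn root_element_name(xml: &str) -> Option<&str> {
    let mut rest = xml.trim_start();
    loop {
        if let Some(after) = rest.strip_prefix("<?") {
            let end = after.find("?>")?;
            rest = after[end + 2..].trim_start();
        } else if let Some(after) = rest.strip_prefix("<!--") {
            let end = after.find("-->")?;
            rest = after[end + 3..].trim_start();
        } else {
            break;
        }
    }
    let after = rest.strip_prefix('<')?;
    // The name ends at the first whitespace, '>' or '/' character.
    let end = after.find(|c: char| c.is_whitespace() || c == '>' || c == '/')?;
    let name = &after[..end];
    if name.is_empty() {
        None
    } else {
        Some(name)
    }
}

/// Classifies a document by its root element name.
fn detect_format(xml: &str) -> Option<Format> {
    match root_element_name(xml)? {
        "document" => Some(Format::Legacy),
        "cml" => Some(Format::Cml),
        _ => None,
    }
}

fn main() {
    let old = "<!-- OLD -->\n<document id=\"x\" version=\"1.0\"></document>";
    let new = "<?xml version=\"1.0\"?>\n<cml version=\"0.1\" profile=\"code:api\"></cml>";
    println!("{:?}", detect_format(old));
    println!("{:?}", detect_format(new));
}
```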
Related Projects
- byte-punch - Profile-aware compression (sister crate)
- sam-engram - Engram packaging (coming soon)
- rustdoc-to-cml - Generate CML from Rust docs
Documentation
- MASTER_PLAN.md - Complete implementation plan
- STATUS.md - Current project status
- XSD Schemas - Validation schemas
- Examples - Example documents
License
MIT OR Apache-2.0