#xpath #xml #streaming-parser #xsd #parser

bin+lib fastxml

A fast, memory-efficient XML library with XPath and XSD validation support

15 releases (7 breaking)

Uses new Rust 2024

0.8.0 Feb 5, 2026
0.7.1 Feb 5, 2026
0.6.6 Feb 4, 2026
0.5.0 Feb 3, 2026
0.1.0 Feb 1, 2026

#378 in Parser implementations

MIT/Apache

1.5MB
31K SLoC

fastxml


A fast, memory-efficient XML library for Rust with XPath and schema validation support. Designed for processing large XML documents like CityGML files used in PLATEAU.

Features

  • 🦀 Pure Rust: No C dependencies, no unsafe code
  • 🔄 libxml Compatible: Parsing and XPath results consistent with libxml
  • 💾 Memory Efficient: Stream-parse gigabyte-scale XML in ~1 MB of memory (~25 MB with schema validation)
  • 🔍 Full XPath 1.0: Complete XPath 1.0 support with namespace handling
  • 📋 XSD Support: Schema parsing with import resolution, built-in GML types
  • Async Support: Async schema fetching and resolution with tokio

⚠️ Early Development (v0.x): API may change. Limited production experience. Not recommended for business-critical systems. Use at your own risk.

Performance

Benchmark on a PLATEAU DEM GML file (907 MB, 31M nodes):

Parse only:

| Mode | Time | Throughput | Memory |
|---|---|---|---|
| libxml DOM | 7.11s | 128 MB/s | 4.19 GB |
| fastxml DOM | 8.0s | 114 MB/s | 805 MB |
| fastxml Streaming | 4.75s | 191 MB/s | ~1 MB |

Parse + Schema Validation:

| Mode | Time | Throughput | Memory |
|---|---|---|---|
| libxml DOM + validate | 11.10s | 82 MB/s | 3.64 GB |
| fastxml DOM + validate | 38.2s | 24 MB/s | 1.96 GB |
| fastxml Streaming + validate | 15.9s | 57 MB/s | ~25 MB |

  • DOM parse only: 5.2x less memory than libxml (805 MB vs 4.19 GB)
  • Streaming parse + validate: 57 MB/s throughput with ~25 MB memory regardless of file size
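
The throughput and memory figures follow directly from the file size and the measured values. A minimal sketch that recomputes them, assuming the 907 MB file size from the tables and 1 GB = 1000 MB (the helper names are illustrative and not part of fastxml):

```rust
/// Throughput in MB/s for a file of `size_mb` megabytes processed in `seconds`.
pub fn throughput_mb_s(size_mb: f64, seconds: f64) -> f64 {
    size_mb / seconds
}

/// How many times larger `baseline_mb` is than `ours_mb`.
pub fn memory_ratio(baseline_mb: f64, ours_mb: f64) -> f64 {
    baseline_mb / ours_mb
}
```

For example, `throughput_mb_s(907.0, 4.75)` is about 191 MB/s (streaming parse), and `memory_ratio(4190.0, 805.0)` is about 5.2 (libxml DOM vs fastxml DOM, parse only).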

Installation

[dependencies]
fastxml = "0.8"

Cargo Features

| Feature | Description |
|---|---|
| ureq | Sync HTTP client for schema fetching (recommended) |
| tokio | Async HTTP client for schema fetching (reqwest + tokio) |
| async-trait | Async trait support for custom implementations |
| compare-libxml | Enable libxml2 comparison tests |

# Recommended: sync schema fetching
fastxml = { version = "0.8", features = ["ureq"] }

# Async schema fetching
fastxml = { version = "0.8", features = ["tokio"] }

Schema Fetchers

| Fetcher | Description |
|---|---|
| FileFetcher | Local filesystem |
| UreqFetcher | Sync HTTP (requires ureq) |
| ReqwestFetcher | Async HTTP (requires tokio) |
| DefaultFetcher | File + sync HTTP combined with built-in caching (requires ureq for HTTP) |
| AsyncDefaultFetcher | File + async HTTP combined with built-in caching (requires tokio) |
| CachingFetcher | Wraps any sync fetcher with in-memory caching |
| AsyncCachingFetcher | Wraps any async fetcher with in-memory caching (requires tokio) |
| FileCachingFetcher | Wraps any sync fetcher with file-based caching (temp directory) |
| AsyncFileCachingFetcher | Wraps any async fetcher with file-based caching (requires tokio) |

Traits:

| Trait | Description |
|---|---|
| SchemaFetcher | Sync fetcher trait |
| AsyncSchemaFetcher | Async fetcher trait (requires tokio) |

use fastxml::schema::{DefaultFetcher, SchemaFetcher};

let fetcher = DefaultFetcher::with_base_dir("/path/to/schemas");
let result = fetcher.fetch("schema.xsd")?;
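
The caching fetchers above are described as wrappers around another fetcher. The general shape of such a wrapper can be sketched with a stand-in trait. `Fetch`, `Caching`, and `CountingFetcher` below are hypothetical names for illustration only; they are not fastxml's types, and the real `SchemaFetcher` signature may differ:

```rust
use std::cell::{Cell, RefCell};
use std::collections::HashMap;

/// Hypothetical stand-in for a sync fetcher trait (not fastxml's `SchemaFetcher`).
pub trait Fetch {
    fn fetch(&self, url: &str) -> Result<Vec<u8>, String>;
}

/// Wraps any `Fetch` and remembers successful responses by URL.
pub struct Caching<F: Fetch> {
    pub inner: F,
    cache: RefCell<HashMap<String, Vec<u8>>>,
}

impl<F: Fetch> Caching<F> {
    pub fn new(inner: F) -> Self {
        Self { inner, cache: RefCell::new(HashMap::new()) }
    }
}

impl<F: Fetch> Fetch for Caching<F> {
    fn fetch(&self, url: &str) -> Result<Vec<u8>, String> {
        if let Some(hit) = self.cache.borrow().get(url) {
            return Ok(hit.clone()); // served from memory; inner fetcher is not called
        }
        let bytes = self.inner.fetch(url)?;
        self.cache.borrow_mut().insert(url.to_string(), bytes.clone());
        Ok(bytes)
    }
}

/// Demo inner fetcher that counts how often it is actually called.
pub struct CountingFetcher {
    pub calls: Cell<usize>,
}

impl Fetch for CountingFetcher {
    fn fetch(&self, _url: &str) -> Result<Vec<u8>, String> {
        self.calls.set(self.calls.get() + 1);
        Ok(b"<schema/>".to_vec())
    }
}
```

Because the wrapper implements the same trait as the thing it wraps, it can be stacked on a file fetcher, an HTTP fetcher, or another wrapper without the caller noticing.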

Quick Start

DOM Parsing

use fastxml::{parse, evaluate};

let xml = r#"<root><item id="1">Hello</item><item id="2">World</item></root>"#;

let doc = parse(xml.as_bytes())?;
let result = evaluate(&doc, "//item")?;
for node in result.into_nodes() {
    println!("{}: {}", node.get_attribute("id").unwrap(), node.get_content().unwrap());
}

Streaming Parser

Process large files with minimal memory:

use fastxml::event::{StreamingParser, XmlEvent, XmlEventHandler};
use std::io::BufReader;
use std::fs::File;

struct Counter { count: usize }

impl XmlEventHandler for Counter {
    fn handle(&mut self, event: &XmlEvent) -> fastxml::error::Result<()> {
        if let XmlEvent::StartElement { .. } = event {
            self.count += 1;
        }
        Ok(())
    }
}

let file = File::open("large_file.xml")?;
let mut parser = StreamingParser::new(BufReader::new(file));
parser.add_handler(Box::new(Counter { count: 0 }));
parser.parse()?;
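
To see why an event-style pass needs so little memory, here is a deliberately naive, std-only sketch that counts start tags while holding only the reader's internal buffer. It is not how fastxml tokenizes XML: it ignores comments, CDATA sections, and attribute values that contain `<`, and `count_start_tags` is a hypothetical helper.

```rust
use std::io::{BufRead, Result};

/// Counts start tags in one pass over `reader`.
/// Only the reader's buffer and two small variables are kept in memory.
pub fn count_start_tags<R: BufRead>(mut reader: R) -> Result<usize> {
    let mut count = 0;
    let mut prev_lt = false; // true when the previous byte was '<'
    loop {
        let buf = reader.fill_buf()?;
        if buf.is_empty() {
            break; // end of input
        }
        for &b in buf {
            // '<' followed by '/', '?' or '!' is an end tag, a processing
            // instruction, or a comment/doctype, so it is not counted.
            if prev_lt && !matches!(b, b'/' | b'?' | b'!') {
                count += 1;
            }
            prev_lt = b == b'<';
        }
        let n = buf.len();
        reader.consume(n);
    }
    Ok(count)
}
```

Memory use depends on the buffer size, not on the document size, which is the property the streaming rows in the benchmark tables reflect.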

Stream Transform

Transform XML with XPath-based element selection:

use fastxml::transform::StreamTransformer;

let xml = r#"<root><item id="1">A</item><item id="2">B</item></root>"#;

// Modify elements (supports multiple handlers)
let result = StreamTransformer::new(xml)
    .on("//item[@id='2']", |node| node.set_attribute("modified", "true"))
    .run()?
    .to_string()?;

// Extract data (single XPath)
let ids: Vec<String> = StreamTransformer::new(xml)
    .collect("//item", |node| node.get_attribute("id").unwrap_or_default())?;

// Extract data from multiple XPaths in a single pass
let (ids, contents): (Vec<String>, Vec<String>) = StreamTransformer::new(xml)
    .collect_multi((
        ("//item", |node| node.get_attribute("id").unwrap_or_default()),
        ("//item", |node| node.get_content().unwrap_or_default()),
    ))?;

// Iterate for side effects (no output transformation)
let mut ids = Vec::new();
StreamTransformer::new(xml)
    .on("//item", |node| {
        ids.push(node.get_attribute("id").unwrap_or_default());
    })
    .for_each()?;

Reader-based Transform (Large Files)

For large XML files, use StreamTransformerReader to avoid loading the entire file into memory. It reads from any BufRead source and writes results incrementally:

use fastxml::transform::StreamTransformerReader;
use std::io::{BufReader, BufWriter};
use std::fs::File;

let reader = BufReader::new(File::open("large_file.xml")?);
let mut output = BufWriter::new(File::create("output.xml")?);

// Transform and write to output
let count = StreamTransformerReader::new(reader)
    .on("//item[@id='2']", |node| node.set_attribute("modified", "true"))
    .run_to_writer(&mut output)?;

println!("Transformed {} elements", count);

// Or iterate for side effects only (no output)
let reader = BufReader::new(File::open("large_file.xml")?);
let mut ids = Vec::new();
StreamTransformerReader::new(reader)
    .on("//item", |node| {
        ids.push(node.get_attribute("id").unwrap_or_default());
    })
    .for_each()?;

Auto-detect Namespaces

Extract namespace declarations from the root element without DOM parsing:

let xml = r#"<root xmlns:gml="http://www.opengis.net/gml"><gml:point/></root>"#;

StreamTransformer::new(xml)
    .with_root_namespaces()?  // Auto-registers namespaces from root element
    .on("//gml:point", |node| node.set_attribute("found", "true"))
    .run()?;
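
Reading the root element's namespace declarations only requires looking at the first start tag. A rough std-only sketch of that idea follows; `root_namespaces` is a hypothetical helper, not fastxml's implementation, and it assumes attribute values contain no whitespace or `>` and skips the default `xmlns="..."` declaration:

```rust
/// Returns (prefix, URI) pairs declared with `xmlns:prefix="uri"` on the root element.
pub fn root_namespaces(xml: &str) -> Vec<(String, String)> {
    let mut rest = xml;
    // Find the first start tag, skipping the XML declaration, PIs, and comments.
    let tag = loop {
        let Some(start) = rest.find('<') else { return Vec::new() };
        let after = &rest[start + 1..];
        let Some(end) = after.find('>') else { return Vec::new() };
        if after.starts_with('?') || after.starts_with('!') {
            rest = &after[end + 1..];
            continue;
        }
        break &after[..end];
    };
    tag.split_whitespace()
        .skip(1) // the element name itself
        .filter_map(|attr| {
            let (name, value) = attr.split_once('=')?;
            let prefix = name.strip_prefix("xmlns:")?;
            let uri = value
                .trim_end_matches('/') // trailing '/' of a self-closing tag
                .trim_matches(|c: char| c == '"' || c == '\'');
            Some((prefix.to_string(), uri.to_string()))
        })
        .collect()
}
```

Nothing after the first `>` of the root tag is inspected, so the cost is independent of document size.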

Namespace URI Matching

Match elements by namespace URI instead of prefix (useful when different prefixes map to the same URI):

// Matches both gml:feature and g:feature if they have the same namespace URI
StreamTransformer::new(xml)
    .namespace("gml", "http://www.opengis.net/gml")
    .on("//*[namespace-uri()='http://www.opengis.net/gml'][local-name()='feature']", |node| {
        // Matches any prefix that maps to this URI
    })
    .run()?;
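
The reason URI matching works across prefixes is that an element's identity is its (namespace URI, local name) pair; the prefix is only a document-local alias. A small sketch of that resolution step, using a hypothetical `expand` helper and plain `HashMap` bindings rather than any fastxml type:

```rust
use std::collections::HashMap;

/// Resolves a prefixed name such as "gml:feature" to (namespace URI, local name)
/// using the prefix bindings in scope. Unprefixed names and unbound prefixes
/// return `None` in this sketch.
pub fn expand(qname: &str, bindings: &HashMap<&str, &str>) -> Option<(String, String)> {
    let (prefix, local) = qname.split_once(':')?;
    let uri = bindings.get(prefix)?;
    Some((uri.to_string(), local.to_string()))
}
```

With `gml` bound to `http://www.opengis.net/gml` in one document and `g` bound to the same URI in another, `gml:feature` and `g:feature` expand to the same pair, which is what a `namespace-uri()` / `local-name()` predicate compares.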

Parent Context Access

Access ancestor elements' information during streaming transformation:

StreamTransformer::new(xml)
    .on_with_context("//item", |node, ctx| {
        // Get parent element info
        if let Some(parent) = ctx.parent() {
            node.set_attribute("parent_name", &parent.name);
        }

        // Get path-based ID (e.g., "root/items/item[2]")
        let path = ctx.path_id();
        node.set_attribute("path", &format!("{}/item[{}]", path, ctx.position()));
    })
    .run()?;
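
Ancestor information can be kept during a single pass by maintaining a stack of open elements. The sketch below shows one way to do that with std only; `PathTracker` is a hypothetical type, and the path format it produces is an assumption that need not match what `ctx.path_id()` returns:

```rust
use std::collections::HashMap;

/// Tracks open elements during a single pass. Memory grows with nesting
/// depth, not with document size.
#[derive(Default)]
pub struct PathTracker {
    // One frame per open element: its path step (e.g. "item[2]") and how many
    // children of each name it has seen so far.
    stack: Vec<(String, HashMap<String, usize>)>,
}

impl PathTracker {
    /// Call on each start tag; returns the path of the element just opened.
    pub fn start(&mut self, name: &str) -> String {
        let position = match self.stack.last_mut() {
            Some((_, seen)) => {
                let n = seen.entry(name.to_string()).or_insert(0);
                *n += 1;
                *n
            }
            None => 1, // root element
        };
        self.stack.push((format!("{name}[{position}]"), HashMap::new()));
        self.path()
    }

    /// Call on each end tag.
    pub fn end(&mut self) {
        self.stack.pop();
    }

    /// Path step of the parent of the innermost open element, if any.
    pub fn parent(&self) -> Option<&str> {
        let depth = self.stack.len();
        if depth < 2 {
            return None;
        }
        Some(self.stack[depth - 2].0.as_str())
    }

    /// Slash-separated path of all open elements.
    pub fn path(&self) -> String {
        self.stack.iter().map(|(step, _)| step.as_str()).collect::<Vec<_>>().join("/")
    }
}
```

Opening `root`, `items`, `item`, closing `item`, and opening a second `item` yields the path `root[1]/items[1]/item[2]`.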

XPath Streamability Check

Check if an XPath can be processed in a single streaming pass:

use fastxml::transform::{is_streamable, analyze_xpath_str, XPathAnalysis};

// Quick check
if is_streamable("//item[@id='1']") {
    println!("Single-pass streaming OK");
}

// Detailed analysis
match analyze_xpath_str("//item[last()]")? {
    XPathAnalysis::Streamable(_) => println!("Streamable"),
    XPathAnalysis::NotStreamable(reason) => {
        println!("Not streamable: {}", reason);
        // Output: "Not streamable: uses last() function which requires knowing total count"
    }
}

Fallback Control

By default, non-streamable XPath expressions return an error. Enable fallback for two-pass processing:

// Default: error on non-streamable XPath
let result = StreamTransformer::new(xml)
    .on("//item[last()]", |_| {})
    .run();
// => Err(NotStreamable { ... })

// Enable fallback (loads entire document into memory)
let result = StreamTransformer::new(xml)
    .allow_fallback()
    .on("//item[last()]", |_| {})
    .run()?;

Async Schema Resolution

Parse XSD schemas with async import/include resolution (requires tokio feature):

use fastxml::schema::{
    AsyncDefaultFetcher,
    parse_xsd_with_imports_async,
};

#[tokio::main]
async fn main() -> fastxml::error::Result<()> {
    let xsd_content = std::fs::read("schema.xsd")?;

    // Create async fetcher
    let fetcher = AsyncDefaultFetcher::new()?;

    // Parse schema with async import resolution
    let schema = parse_xsd_with_imports_async(
        &xsd_content,
        "http://example.com/schema.xsd",
        &fetcher,
    ).await?;

    println!("Parsed {} types", schema.types.len());
    Ok(())
}

The async resolver:

  • Fetches imported schemas asynchronously via HTTP
  • Resolves nested imports (A → B → C)
  • Detects circular dependencies

See examples/async_schema_resolution.rs for more examples.
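
Nested import resolution with cycle detection is essentially a depth-first walk over the import graph. A minimal synchronous sketch follows, with a `HashMap` standing in for fetched schemas. `resolve_order` is hypothetical, and returning the chain as an error is an assumption of this sketch; how fastxml reports or tolerates cycles is not specified here.

```rust
use std::collections::{HashMap, HashSet};

/// Returns schema names in dependency-first order, or the import chain that
/// forms a cycle. `imports` maps a schema to the schemas it imports.
pub fn resolve_order(
    root: &str,
    imports: &HashMap<&str, Vec<&str>>,
) -> Result<Vec<String>, Vec<String>> {
    fn visit(
        name: &str,
        imports: &HashMap<&str, Vec<&str>>,
        in_progress: &mut Vec<String>,
        done: &mut HashSet<String>,
        order: &mut Vec<String>,
    ) -> Result<(), Vec<String>> {
        if done.contains(name) {
            return Ok(()); // already resolved via another import path
        }
        if in_progress.iter().any(|n| n == name) {
            // `name` is still being resolved further up the chain: a cycle.
            let mut chain = in_progress.clone();
            chain.push(name.to_string());
            return Err(chain);
        }
        in_progress.push(name.to_string());
        for dep in imports.get(name).into_iter().flatten() {
            visit(dep, imports, in_progress, done, order)?;
        }
        in_progress.pop();
        done.insert(name.to_string());
        order.push(name.to_string());
        Ok(())
    }

    let (mut in_progress, mut done, mut order) = (Vec::new(), HashSet::new(), Vec::new());
    visit(root, imports, &mut in_progress, &mut done, &mut order)?;
    Ok(order)
}
```

For the chain A imports B imports C, the result is `["C", "B", "A"]`; if B imports A back, the function returns the chain `["A", "B", "A"]` as an error.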

Schema Validation

DOM Validation

use fastxml::{parse, validate_document_by_schema};

let doc = parse(std::fs::read("document.xml")?.as_slice())?;
let errors = validate_document_by_schema(&doc, "schema.xsd".to_string())?;

if errors.is_empty() {
    println!("Valid!");
}

Streaming Validation

Validate during parsing with minimal memory:

use fastxml::schema::StreamValidator;
use std::sync::Arc;

let schema = Arc::new(fastxml::schema::parse_xsd(&std::fs::read("schema.xsd")?)?);
let file = std::fs::File::open("document.xml")?;
let reader = std::io::BufReader::new(file);

let errors = StreamValidator::new(schema)
    .with_max_errors(100)
    .validate(reader)?;

Auto-detect Schema

Fetch schemas from xsi:schemaLocation automatically (requires ureq feature):

use fastxml::{parse, validate_with_schema_location};

let doc = parse(xml_bytes)?;
let errors = validate_with_schema_location(&doc)?;

For streaming:

use fastxml::streaming_validate_with_schema_location;

let errors = streaming_validate_with_schema_location(reader)?;
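
An `xsi:schemaLocation` attribute value is a whitespace-separated list of namespace URI and schema URL pairs. The sketch below splits such a value with std only; `parse_schema_location` is a hypothetical helper shown for illustration, not a fastxml function, and the URLs in the usage note are placeholders:

```rust
/// Splits an `xsi:schemaLocation` value into (namespace URI, schema URL) pairs.
/// A trailing unpaired token is ignored in this sketch.
pub fn parse_schema_location(value: &str) -> Vec<(&str, &str)> {
    let tokens: Vec<&str> = value.split_whitespace().collect();
    tokens.chunks_exact(2).map(|pair| (pair[0], pair[1])).collect()
}
```

For instance, `"http://example.com/ns http://example.com/ns.xsd"` yields a single pair whose second element is the URL a fetcher would be asked to retrieve.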

Async Validation

Validate with async schema fetching (requires tokio feature):

use fastxml::{parse, validate_with_schema_location_async};

#[tokio::main]
async fn main() -> fastxml::error::Result<()> {
    let doc = parse(xml_bytes)?;
    let errors = validate_with_schema_location_async(&doc).await?;
    Ok(())
}

Or get the compiled schema for reuse:

use fastxml::get_schema_from_schema_location_async;

let schema = get_schema_from_schema_location_async(&xml_bytes).await?;

Validation Errors

use fastxml::ErrorLevel;

for error in &errors {
    match error.level {
        ErrorLevel::Warning => print!("[WARN] "),
        ErrorLevel::Error => print!("[ERROR] "),
        ErrorLevel::Fatal => print!("[FATAL] "),
    }
    if let Some(line) = error.line {
        print!("line {}: ", line);
    }
    println!("{}", error.message);
}

XPath

Basic Usage

use fastxml::{parse, evaluate};

let doc = parse(xml)?;
let result = evaluate(&doc, "//item[@id='1']/text()")?;

With Namespaces

let xml = r#"
<core:CityModel xmlns:core="http://www.opengis.net/citygml/2.0"
                xmlns:bldg="http://www.opengis.net/citygml/building/2.0"
                xmlns:gml="http://www.opengis.net/gml">
    <bldg:Building gml:id="bldg_001">
        <bldg:measuredHeight>25.5</bldg:measuredHeight>
    </bldg:Building>
</core:CityModel>"#;

let doc = parse(xml.as_bytes())?;
let buildings = evaluate(&doc, "//bldg:Building")?;

Supported Specifications

XPath 1.0

Feature Examples
Paths /root/child, //element, //*
Predicates [@id='1'], [position()=1], [name()='foo']
Axes ancestor::, following-sibling::, namespace::
Operators and, or, not(), =, !=, <, >, +, -, *, div, mod
Functions count(), contains(), string(), number(), sum(), etc.
Namespaces //ns:element, namespace::*
Variables $var
Union //a | //b

XSD Schema

Feature Support
Element/attribute definitions
Complex types (sequence/choice/all)
Simple types (restriction/list/union)
Type inheritance
Facets
Attribute/model groups
import/include/redefine
Built-in XSD and GML types
Identity constraints (unique/key/keyref)
Substitution groups

Not Supported

  • XQuery, XSLT, XInclude
  • DTD validation
  • XML Signature/Encryption
  • Catalog support
  • Full entity expansion

Development

cargo test                              # Run tests
cargo test --features tokio             # With async tests
cargo test --features compare-libxml    # With libxml comparison
cargo bench                             # Benchmarks

Examples

# Async schema resolution
cargo run --example async_schema_resolution --features tokio

# Schema validation
cargo run --example schema_validation --features ureq

# Benchmark CLI
cargo run --release --example bench -- ./file.xml
cargo run --release --features ureq --example bench -- ./file.xml --validate

License

MIT OR Apache-2.0

Dependencies

~14–35MB
~440K SLoC