15 releases (7 breaking)
Uses new Rust 2024
| Version | Date |
|---|---|
| 0.8.0 | Feb 5, 2026 |
| 0.7.1 | Feb 5, 2026 |
| 0.6.6 | Feb 4, 2026 |
| 0.5.0 | Feb 3, 2026 |
| 0.1.0 | Feb 1, 2026 |
#378 in Parser implementations · 1.5MB · 31K SLoC
fastxml
A fast, memory-efficient XML library for Rust with XPath and schema validation support. Designed for processing large XML documents like CityGML files used in PLATEAU.
Features
- 🦀 Pure Rust: no C dependencies, no unsafe code
- 🔄 libxml Compatible: parsing and XPath results consistent with libxml
- 💾 Memory Efficient: stream-parse gigabyte-scale XML with a ~1 MB memory footprint (~25 MB with schema validation)
- 🔍 Full XPath 1.0: complete XPath 1.0 support with namespace handling
- 📋 XSD Support: schema parsing with import resolution and built-in GML types
- ⚡ Async Support: async schema fetching and resolution with tokio
⚠️ Early Development (v0.x): API may change. Limited production experience. Not recommended for business-critical systems. Use at your own risk.
Performance
Benchmark on a PLATEAU DEM GML file (907 MB, 31M nodes):
Parse only:
| Mode | Time | Throughput | Memory |
|---|---|---|---|
| libxml DOM | 7.11s | 128 MB/s | 4.19 GB |
| fastxml DOM | 8.0s | 114 MB/s | 805 MB |
| fastxml Streaming | 4.75s | 191 MB/s | ~1 MB |
Parse + Schema Validation:
| Mode | Time | Throughput | Memory |
|---|---|---|---|
| libxml DOM + validate | 11.10s | 82 MB/s | 3.64 GB |
| fastxml DOM + validate | 38.2s | 24 MB/s | 1.96 GB |
| fastxml Streaming + validate | 15.9s | 57 MB/s | ~25 MB |
- DOM parse: 5.2x less memory than libxml (805 MB vs 4.19 GB); with validation the gap narrows to about 1.9x (1.96 GB vs 3.64 GB)
- Streaming parse + validate: 57 MB/s throughput with ~25 MB memory regardless of file size
Installation
[dependencies]
fastxml = "0.8"
Cargo Features
| Feature | Description |
|---|---|
| ureq | Sync HTTP client for schema fetching (recommended) |
| tokio | Async HTTP client for schema fetching (reqwest + tokio) |
| async-trait | Async trait support for custom implementations |
| compare-libxml | Enable libxml2 comparison tests |
# Recommended: sync schema fetching
fastxml = { version = "0.8", features = ["ureq"] }
# Async schema fetching
fastxml = { version = "0.8", features = ["tokio"] }
Schema Fetchers
| Fetcher | Description |
|---|---|
| FileFetcher | Local filesystem |
| UreqFetcher | Sync HTTP (requires ureq) |
| ReqwestFetcher | Async HTTP (requires tokio) |
| DefaultFetcher | File + sync HTTP combined with built-in caching (requires ureq for HTTP) |
| AsyncDefaultFetcher | File + async HTTP combined with built-in caching (requires tokio) |
| CachingFetcher | Wraps any sync fetcher with in-memory caching |
| AsyncCachingFetcher | Wraps any async fetcher with in-memory caching (requires tokio) |
| FileCachingFetcher | Wraps any sync fetcher with file-based caching (temp directory) |
| AsyncFileCachingFetcher | Wraps any async fetcher with file-based caching (requires tokio) |
Traits:
| Trait | Description |
|---|---|
| SchemaFetcher | Sync fetcher trait |
| AsyncSchemaFetcher | Async fetcher trait (requires tokio) |
use fastxml::schema::{DefaultFetcher, SchemaFetcher};
let fetcher = DefaultFetcher::with_base_dir("/path/to/schemas");
let result = fetcher.fetch("schema.xsd")?;
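The caching fetchers in the table wrap another fetcher and memoize what it returns, keyed by schema location. The std-only sketch below illustrates that wrapper pattern; `Fetch`, `CountingFetcher`, and `MemoryCache` are hypothetical stand-ins, not fastxml's `SchemaFetcher` or `CachingFetcher`, whose real signatures may differ.

```rust
use std::cell::RefCell;
use std::collections::HashMap;

/// Hypothetical stand-in for a sync schema fetcher (not fastxml's real trait).
pub trait Fetch {
    fn fetch(&self, location: &str) -> Result<Vec<u8>, String>;
}

/// Hypothetical inner fetcher that counts how often it is actually called.
pub struct CountingFetcher {
    pub calls: RefCell<usize>,
}

impl Fetch for CountingFetcher {
    fn fetch(&self, location: &str) -> Result<Vec<u8>, String> {
        *self.calls.borrow_mut() += 1;
        Ok(format!("<xs:schema/><!-- {location} -->").into_bytes())
    }
}

/// Wraps any `Fetch` implementation and memoizes results by location.
pub struct MemoryCache<F: Fetch> {
    inner: F,
    cache: RefCell<HashMap<String, Vec<u8>>>,
}

impl<F: Fetch> MemoryCache<F> {
    pub fn new(inner: F) -> Self {
        Self { inner, cache: RefCell::new(HashMap::new()) }
    }

    pub fn inner(&self) -> &F {
        &self.inner
    }
}

impl<F: Fetch> Fetch for MemoryCache<F> {
    fn fetch(&self, location: &str) -> Result<Vec<u8>, String> {
        if let Some(bytes) = self.cache.borrow().get(location) {
            return Ok(bytes.clone()); // cache hit: the inner fetcher is not called
        }
        let bytes = self.inner.fetch(location)?; // cache miss: delegate, then store
        self.cache.borrow_mut().insert(location.to_string(), bytes.clone());
        Ok(bytes)
    }
}
```

Because the wrapper depends only on the trait, the same cache can sit in front of a file fetcher or an HTTP fetcher; a repeated request for the same location never reaches the inner fetcher.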
Quick Start
DOM Parsing
use fastxml::{parse, evaluate};
let xml = r#"<root><item id="1">Hello</item><item id="2">World</item></root>"#;
let doc = parse(xml.as_bytes())?;
let result = evaluate(&doc, "//item")?;
for node in result.into_nodes() {
    println!("{}: {}", node.get_attribute("id").unwrap(), node.get_content().unwrap());
}
Streaming Parser
Process large files with minimal memory:
use fastxml::event::{StreamingParser, XmlEvent, XmlEventHandler};
use std::io::BufReader;
use std::fs::File;
struct Counter { count: usize }
impl XmlEventHandler for Counter {
    fn handle(&mut self, event: &XmlEvent) -> fastxml::error::Result<()> {
        if let XmlEvent::StartElement { .. } = event {
            self.count += 1;
        }
        Ok(())
    }
}
let file = File::open("large_file.xml")?;
let mut parser = StreamingParser::new(BufReader::new(file));
parser.add_handler(Box::new(Counter { count: 0 }));
parser.parse()?;
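The streaming parser keeps memory flat because each event is passed to the registered handlers and then dropped, so no tree is ever built. A std-only sketch of that dispatch loop follows; `Event`, `Handler`, `Stats`, and `dispatch` are hypothetical simplifications, not fastxml's `XmlEvent`, `XmlEventHandler`, or `StreamingParser`.

```rust
/// Hypothetical, simplified event type (fastxml's `XmlEvent` carries more data).
#[derive(Debug)]
pub enum Event<'a> {
    Start(&'a str),
    Text(&'a str),
    End(&'a str),
}

/// Hypothetical handler trait with the same shape as a per-event callback.
pub trait Handler {
    fn handle(&mut self, event: &Event) -> Result<(), String>;
}

/// Counts start tags and tracks the deepest nesting seen so far.
#[derive(Default)]
pub struct Stats {
    pub elements: usize,
    depth: usize,
    pub max_depth: usize,
}

impl Handler for Stats {
    fn handle(&mut self, event: &Event) -> Result<(), String> {
        match event {
            Event::Start(_) => {
                self.elements += 1;
                self.depth += 1;
                self.max_depth = self.max_depth.max(self.depth);
            }
            Event::End(_) => self.depth = self.depth.saturating_sub(1),
            Event::Text(_) => {}
        }
        Ok(())
    }
}

/// Feeds each event to every handler in order; events are not retained.
pub fn dispatch<'a, I>(events: I, handlers: &mut [&mut dyn Handler]) -> Result<(), String>
where
    I: IntoIterator<Item = Event<'a>>,
{
    for event in events {
        for handler in handlers.iter_mut() {
            handler.handle(&event)?;
        }
    }
    Ok(())
}
```

Any context a handler needs beyond the current event (here, the nesting depth) has to be kept as handler state, so the memory a streaming pass uses is driven mostly by that state.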
Stream Transform
Transform XML with XPath-based element selection:
use fastxml::transform::StreamTransformer;
let xml = r#"<root><item id="1">A</item><item id="2">B</item></root>"#;
// Modify elements (supports multiple handlers)
let result = StreamTransformer::new(xml)
    .on("//item[@id='2']", |node| node.set_attribute("modified", "true"))
    .run()?
    .to_string()?;
// Extract data (single XPath)
let ids: Vec<String> = StreamTransformer::new(xml)
    .collect("//item", |node| node.get_attribute("id").unwrap_or_default())?;
// Extract data from multiple XPaths in a single pass
let (ids, contents): (Vec<String>, Vec<String>) = StreamTransformer::new(xml)
    .collect_multi((
        ("//item", |node| node.get_attribute("id").unwrap_or_default()),
        ("//item", |node| node.get_content().unwrap_or_default()),
    ))?;
// Iterate for side effects (no output transformation)
let mut ids = Vec::new();
StreamTransformer::new(xml)
    .on("//item", |node| {
        ids.push(node.get_attribute("id").unwrap_or_default());
    })
    .for_each()?;
Reader-based Transform (Large Files)
For large XML files, use StreamTransformerReader to avoid loading the entire file into memory. It reads from any BufRead source and writes results incrementally:
use fastxml::transform::StreamTransformerReader;
use std::io::{BufReader, BufWriter};
use std::fs::File;
let reader = BufReader::new(File::open("large_file.xml")?);
let mut output = BufWriter::new(File::create("output.xml")?);
// Transform and write to output
let count = StreamTransformerReader::new(reader)
    .on("//item[@id='2']", |node| node.set_attribute("modified", "true"))
    .run_to_writer(&mut output)?;
println!("Transformed {} elements", count);
// Or iterate for side effects only (no output)
let reader = BufReader::new(File::open("large_file.xml")?);
let mut ids = Vec::new();
StreamTransformerReader::new(reader)
    .on("//item", |node| {
        ids.push(node.get_attribute("id").unwrap_or_default());
    })
    .for_each()?;
Auto-detect Namespaces
Extract namespace declarations from the root element without DOM parsing:
let xml = r#"<root xmlns:gml="http://www.opengis.net/gml"><gml:point/></root>"#;
StreamTransformer::new(xml)
    .with_root_namespaces()? // Auto-registers namespaces from root element
    .on("//gml:point", |node| node.set_attribute("found", "true"))
    .run()?;
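Conceptually, detecting root namespaces only requires reading the first start tag and collecting its `xmlns:prefix="uri"` attributes. The hypothetical `root_namespaces` function below sketches that with std only; it assumes double-quoted attribute values, ignores the default namespace (`xmlns="..."`), and skips the XML declaration and comments only in their simplest form, so it illustrates the idea rather than how fastxml implements `with_root_namespaces()`.

```rust
use std::collections::BTreeMap;

/// Returns prefix -> URI for `xmlns:prefix="uri"` attributes on the root element.
/// Hypothetical helper: assumes double-quoted values and no `>` inside values.
pub fn root_namespaces(xml: &str) -> BTreeMap<String, String> {
    let mut map = BTreeMap::new();
    // Find the first start tag that is not `<?...?>` or `<!...>`.
    let mut rest = xml;
    let tag = loop {
        let Some(start) = rest.find('<') else { return map };
        let after = &rest[start + 1..];
        let Some(end) = after.find('>') else { return map };
        if after.starts_with('?') || after.starts_with('!') {
            rest = &after[end + 1..];
            continue;
        }
        break &after[..end];
    };
    // Walk every `xmlns:` declaration inside that tag.
    let mut s = tag;
    while let Some(pos) = s.find("xmlns:") {
        let decl = &s[pos + "xmlns:".len()..];
        let Some(eq) = decl.find("=\"") else { break };
        let prefix = decl[..eq].trim();
        let value = &decl[eq + 2..];
        let Some(close) = value.find('"') else { break };
        map.insert(prefix.to_string(), value[..close].to_string());
        s = &value[close + 1..];
    }
    map
}
```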
Namespace URI Matching
Match elements by namespace URI instead of prefix (useful when different prefixes map to the same URI):
// Matches both gml:feature and g:feature if they have the same namespace URI
StreamTransformer::new(xml)
    .namespace("gml", "http://www.opengis.net/gml")
    .on("//*[namespace-uri()='http://www.opengis.net/gml'][local-name()='feature']", |node| {
        // Matches any prefix that maps to this URI
    })
    .run()?;
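Matching on `namespace-uri()` and `local-name()` works because the comparison happens after prefix resolution: `gml:feature` and `g:feature` have the same expanded name when both prefixes are bound to the same URI. A std-only sketch with hypothetical `expanded_name` and `matches_uri` helpers (default namespaces are left out for brevity):

```rust
use std::collections::HashMap;

/// Resolves a qualified name like `g:feature` to (namespace URI, local name)
/// using in-scope prefix bindings. Returns None for an unbound prefix.
/// Hypothetical helper; unprefixed names get an empty URI in this sketch.
pub fn expanded_name(qname: &str, bindings: &HashMap<String, String>) -> Option<(String, String)> {
    match qname.split_once(':') {
        Some((prefix, local)) => bindings
            .get(prefix)
            .map(|uri| (uri.clone(), local.to_string())),
        None => Some((String::new(), qname.to_string())),
    }
}

/// True when `qname` resolves to the target URI and local name,
/// regardless of which prefix the document used.
pub fn matches_uri(qname: &str, bindings: &HashMap<String, String>, uri: &str, local: &str) -> bool {
    match expanded_name(qname, bindings) {
        Some((u, l)) => u == uri && l == local,
        None => false,
    }
}
```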
Parent Context Access
Access ancestor elements' information during streaming transformation:
StreamTransformer::new(xml)
    .on_with_context("//item", |node, ctx| {
        // Get parent element info
        if let Some(parent) = ctx.parent() {
            node.set_attribute("parent_name", &parent.name);
        }
        // Get path-based ID (e.g., "root/items/item[2]")
        let path = ctx.path_id();
        node.set_attribute("path", &format!("{}/item[{}]", path, ctx.position()));
    })
    .run()?;
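In a streaming pass, parent context can come from a stack of open elements instead of a tree. The sketch below keeps one frame per open element with counters for its children, which is enough to answer a `parent()` query and to build positional paths like `root/items/item[2]`. `PathTracker` is hypothetical, not fastxml's context type, and adding a `[n]` suffix only for positions above 1 is a choice made here to match the example path.

```rust
use std::collections::HashMap;

/// One open element: its name, its 1-based position among same-named
/// siblings, and counters for the children seen so far.
struct Frame {
    name: String,
    position: usize,
    child_counts: HashMap<String, usize>,
}

/// Hypothetical tracker fed by start/end events.
#[derive(Default)]
pub struct PathTracker {
    stack: Vec<Frame>,
}

impl PathTracker {
    pub fn start(&mut self, name: &str) {
        let position = match self.stack.last_mut() {
            Some(parent) => {
                let n = parent.child_counts.entry(name.to_string()).or_insert(0);
                *n += 1;
                *n
            }
            None => 1, // the root element
        };
        self.stack.push(Frame { name: name.to_string(), position, child_counts: HashMap::new() });
    }

    pub fn end(&mut self) {
        self.stack.pop();
    }

    /// Name of the current element's parent, if any.
    pub fn parent(&self) -> Option<&str> {
        let len = self.stack.len();
        if len < 2 {
            return None;
        }
        Some(self.stack[len - 2].name.as_str())
    }

    /// Path of the current element, e.g. "root/items/item[2]".
    pub fn path_id(&self) -> String {
        self.stack
            .iter()
            .map(|f| if f.position > 1 { format!("{}[{}]", f.name, f.position) } else { f.name.clone() })
            .collect::<Vec<_>>()
            .join("/")
    }
}
```

Memory here grows with nesting depth, not document size, because finished elements are popped along with their child counters.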
XPath Streamability Check
Check if an XPath can be processed in a single streaming pass:
use fastxml::transform::{is_streamable, analyze_xpath_str, XPathAnalysis};
// Quick check
if is_streamable("//item[@id='1']") {
    println!("Single-pass streaming OK");
}
// Detailed analysis
match analyze_xpath_str("//item[last()]")? {
    XPathAnalysis::Streamable(_) => println!("Streamable"),
    XPathAnalysis::NotStreamable(reason) => {
        println!("Not streamable: {}", reason);
        // Output: "Not streamable: uses last() function which requires knowing total count"
    }
}
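An expression is hard to stream when deciding about a node requires input the parser has not read yet (as with `last()`, which needs the total count) or nodes it has already discarded (as with reverse axes). The deliberately crude check below only scans the expression text for a few such constructs; `why_not_streamable` and its rule list are assumptions for illustration, and it is not fastxml's `analyze_xpath_str`, which may classify some expressions differently.

```rust
/// Hypothetical, deliberately crude check: scans the expression text for
/// constructs that usually need more than one forward pass.
pub fn why_not_streamable(xpath: &str) -> Option<&'static str> {
    const RULES: [(&str, &str); 3] = [
        ("last()", "last() needs the total number of matches, known only at the end"),
        ("preceding-sibling::", "reverse axes refer to nodes already streamed past"),
        ("following-sibling::", "the result depends on siblings not read yet"),
    ];
    RULES
        .iter()
        .find(|(pattern, _)| xpath.contains(*pattern))
        .map(|(_, reason)| *reason)
}
```

A real analysis works on the parsed expression, so it can tell, for example, a `last()` inside a string literal from an actual function call; a text scan cannot.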
Fallback Control
By default, non-streamable XPath expressions return an error. Enable fallback for two-pass processing:
// Default: error on non-streamable XPath
let result = StreamTransformer::new(xml)
    .on("//item[last()]", |_| {})
    .run();
// => Err(NotStreamable { ... })
// Enable fallback (loads entire document into memory)
let result = StreamTransformer::new(xml)
    .allow_fallback()
    .on("//item[last()]", |_| {})
    .run()?;
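The reason `last()` needs a fallback: a single forward pass cannot tell that a match is the last one until the input ends. One generic way around that is two passes over the input, the first counting matches and the second acting on the match whose position equals the count. The source describes fastxml's fallback as loading the whole document into memory instead, so `mark_last` below is only a hypothetical illustration over a flat list of element names, where all matches share one parent.

```rust
/// Pass 1 counts elements named `target`; pass 2 marks the one whose
/// 1-based position equals that count, i.e. what `//target[last()]`
/// selects when every match has the same parent.
pub fn mark_last(names: &[&str], target: &str) -> Vec<String> {
    // Pass 1: total number of matches.
    let total = names.iter().filter(|n| **n == target).count();
    // Pass 2: re-read the input, now knowing which match is last.
    let mut seen = 0;
    names
        .iter()
        .map(|n| {
            if *n == target {
                seen += 1;
                if seen == total {
                    return format!("{n} (last)");
                }
            }
            n.to_string()
        })
        .collect()
}
```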
Async Schema Resolution
Parse XSD schemas with async import/include resolution (requires tokio feature):
use fastxml::schema::{
    AsyncDefaultFetcher,
    parse_xsd_with_imports_async,
};
#[tokio::main]
async fn main() -> fastxml::error::Result<()> {
    let xsd_content = std::fs::read("schema.xsd")?;
    // Create async fetcher
    let fetcher = AsyncDefaultFetcher::new()?;
    // Parse schema with async import resolution
    let schema = parse_xsd_with_imports_async(
        &xsd_content,
        "http://example.com/schema.xsd",
        &fetcher,
    ).await?;
    println!("Parsed {} types", schema.types.len());
    Ok(())
}
The async resolver:
- Fetches imported schemas asynchronously via HTTP
- Resolves nested imports (A → B → C)
- Detects circular dependencies
See examples/async_schema_resolution.rs for more examples.
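Nested import resolution and cycle detection can be pictured as a depth-first walk over the import graph that tracks which schemas are currently on the stack: reaching one of those again means a cycle. The synchronous, std-only sketch below uses a hypothetical `resolve` function and an `imports` map standing in for what fetching and scanning each schema would return. It records the cycle and stops following that edge instead of failing, since whether fastxml treats a cycle as an error is not stated here.

```rust
use std::collections::{HashMap, HashSet};

/// Result of walking the import graph from a root schema.
#[derive(Debug, Default)]
pub struct Resolution {
    /// Schemas in the order they finished loading (dependencies first).
    pub order: Vec<String>,
    /// Back edges found, as (importing schema, schema already on the stack).
    pub cycles: Vec<(String, String)>,
}

/// Hypothetical resolver: depth-first over `imports` (schema -> schemas it imports).
pub fn resolve(root: &str, imports: &HashMap<&str, Vec<&str>>) -> Resolution {
    fn visit(
        node: &str,
        imports: &HashMap<&str, Vec<&str>>,
        on_stack: &mut HashSet<String>,
        done: &mut HashSet<String>,
        out: &mut Resolution,
    ) {
        on_stack.insert(node.to_string());
        for &dep in imports.get(node).map(|v| v.as_slice()).unwrap_or(&[]) {
            if on_stack.contains(dep) {
                // Back edge: `dep` is still being resolved, so this is a cycle.
                out.cycles.push((node.to_string(), dep.to_string()));
            } else if !done.contains(dep) {
                visit(dep, imports, on_stack, done, out);
            }
        }
        on_stack.remove(node);
        done.insert(node.to_string());
        out.order.push(node.to_string());
    }

    let mut out = Resolution::default();
    visit(root, imports, &mut HashSet::new(), &mut HashSet::new(), &mut out);
    out
}
```

The `done` set also means a schema imported from several places (a diamond, A → B → D and A → C → D) is resolved only once.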
Schema Validation
DOM Validation
use fastxml::{parse, validate_document_by_schema};
let doc = parse(std::fs::read("document.xml")?.as_slice())?;
let errors = validate_document_by_schema(&doc, "schema.xsd".to_string())?;
if errors.is_empty() {
    println!("Valid!");
}
Streaming Validation
Validate during parsing with minimal memory:
use fastxml::schema::StreamValidator;
use std::sync::Arc;
let schema = Arc::new(fastxml::schema::parse_xsd(&std::fs::read("schema.xsd")?)?);
let file = std::fs::File::open("document.xml")?;
let reader = std::io::BufReader::new(file);
let errors = StreamValidator::new(schema)
    .with_max_errors(100)
    .validate(reader)?;
Auto-detect Schema
Fetch schemas from xsi:schemaLocation automatically (requires ureq feature):
use fastxml::{parse, validate_with_schema_location};
let xml_bytes = std::fs::read("document.xml")?;
let doc = parse(xml_bytes.as_slice())?;
let errors = validate_with_schema_location(&doc)?;
For streaming:
use fastxml::streaming_validate_with_schema_location;
let reader = std::io::BufReader::new(std::fs::File::open("document.xml")?);
let errors = streaming_validate_with_schema_location(reader)?;
Async Validation
Validate with async schema fetching (requires tokio feature):
use fastxml::{parse, validate_with_schema_location_async};
#[tokio::main]
async fn main() -> fastxml::error::Result<()> {
    let xml_bytes = std::fs::read("document.xml")?;
    let doc = parse(xml_bytes.as_slice())?;
    let errors = validate_with_schema_location_async(&doc).await?;
    Ok(())
}
Or get the compiled schema for reuse:
use fastxml::get_schema_from_schema_location_async;
let schema = get_schema_from_schema_location_async(&xml_bytes).await?;
Validation Errors
use fastxml::ErrorLevel;
for error in &errors {
    match error.level {
        ErrorLevel::Warning => print!("[WARN] "),
        ErrorLevel::Error => print!("[ERROR] "),
        ErrorLevel::Fatal => print!("[FATAL] "),
    }
    if let Some(line) = error.line {
        print!("line {}: ", line);
    }
    println!("{}", error.message);
}
XPath
Basic Usage
use fastxml::{parse, evaluate};
let xml = r#"<root><item id="1">Hello</item></root>"#;
let doc = parse(xml.as_bytes())?;
let result = evaluate(&doc, "//item[@id='1']/text()")?;
With Namespaces
let xml = r#"
<core:CityModel xmlns:core="http://www.opengis.net/citygml/2.0"
    xmlns:bldg="http://www.opengis.net/citygml/building/2.0"
    xmlns:gml="http://www.opengis.net/gml">
  <bldg:Building gml:id="bldg_001">
    <bldg:measuredHeight>25.5</bldg:measuredHeight>
  </bldg:Building>
</core:CityModel>"#;
let doc = parse(xml.as_bytes())?;
let buildings = evaluate(&doc, "//bldg:Building")?;
Supported Specifications
XPath 1.0
| Feature | Examples |
|---|---|
| Paths | /root/child, //element, //* |
| Predicates | [@id='1'], [position()=1], [name()='foo'] |
| Axes | ancestor::, following-sibling::, namespace:: |
| Operators | and, or, not(), =, !=, <, >, +, -, *, div, mod |
| Functions | count(), contains(), string(), number(), sum(), etc. |
| Namespaces | //ns:element, namespace::* |
| Variables | $var |
| Union | //a \| //b |
XSD Schema
| Feature | Support |
|---|---|
| Element/attribute definitions | ✅ |
| Complex types (sequence/choice/all) | ✅ |
| Simple types (restriction/list/union) | ✅ |
| Type inheritance | ✅ |
| Facets | ✅ |
| Attribute/model groups | ✅ |
| import/include/redefine | ✅ |
| Built-in XSD and GML types | ✅ |
| Identity constraints (unique/key/keyref) | ✅ |
| Substitution groups | ✅ |
Not Supported
- XQuery, XSLT, XInclude
- DTD validation
- XML Signature/Encryption
- Catalog support
- Full entity expansion
Development
cargo test # Run tests
cargo test --features tokio # With async tests
cargo test --features compare-libxml # With libxml comparison
cargo bench # Benchmarks
Examples
# Async schema resolution
cargo run --example async_schema_resolution --features tokio
# Schema validation
cargo run --example schema_validation --features ureq
# Benchmark CLI
cargo run --release --example bench -- ./file.xml
cargo run --release --features ureq --example bench -- ./file.xml --validate
License
MIT OR Apache-2.0
Dependencies: ~14–35MB, ~440K SLoC