ripx is a small, single-binary, high-performance, non‑validating XML query tool and library focused on streaming, single‑pass processing of very large XML files (multi‑GB to TB scale). It emphasizes low memory use, simplicity, and predictable performance.
- Stream XML from any BufRead source; no DOM, no strict validation.
- Support simple, fast queries (e.g. `//item`, `/root/items/item`) and callbacks for matched subtrees.
- Minimal XML features initially: start/end elements, attributes, text, comments, CDATA.
- Robust best‑effort error handling: skip malformed regions and attempt to recover and continue.
- Minimal decoding in the parser: operate directly on raw bytes rather than on decoded UTF-8 strings.
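Because the parser works on raw bytes and does not decode entities, consumers need their own decoding step for text and attribute values. The following is a minimal sketch under that assumption; `decode_entities` and its helpers are hypothetical names, not part of ripx. It handles the five predefined XML entities and numeric character references (`&#65;`, `&#x42;`) and, in the same best-effort spirit, copies anything it does not recognise through unchanged. It does not expand entities declared in a DTD.

```rust
/// Hypothetical consumer-side helper (not part of ripx): decodes the five
/// predefined XML entities and numeric character references in raw bytes.
/// Unrecognised or malformed references are copied through unchanged.
fn decode_entities(raw: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(raw.len());
    let mut i = 0;
    while i < raw.len() {
        if raw[i] == b'&' {
            // Look for the closing ';' within a short window after '&'.
            if let Some(len) = raw[i + 1..].iter().take(12).position(|&b| b == b';') {
                if push_entity(&raw[i + 1..i + 1 + len], &mut out) {
                    i += len + 2; // skip '&', the entity name and ';'
                    continue;
                }
            }
        }
        out.push(raw[i]);
        i += 1;
    }
    out
}

/// Appends the UTF-8 bytes for entity `name`; returns false if it is unknown.
fn push_entity(name: &[u8], out: &mut Vec<u8>) -> bool {
    let ch = match name {
        b"amp" => '&',
        b"lt" => '<',
        b"gt" => '>',
        b"quot" => '"',
        b"apos" => '\'',
        _ => match numeric_ref(name) {
            Some(c) => c,
            None => return false,
        },
    };
    out.extend_from_slice(ch.encode_utf8(&mut [0u8; 4]).as_bytes());
    true
}

/// Parses `#NNN` (decimal) or `#xHHHH` (hex) into a char, if valid.
fn numeric_ref(name: &[u8]) -> Option<char> {
    let digits = name.strip_prefix(b"#")?;
    let (digits, radix) = match digits.strip_prefix(b"x") {
        Some(hex) => (hex, 16),
        None => (digits, 10),
    };
    // Only accept digit characters (from_str_radix alone would allow a leading '+').
    if digits.is_empty() || !digits.iter().all(u8::is_ascii_hexdigit) {
        return None;
    }
    let text = std::str::from_utf8(digits).ok()?;
    char::from_u32(u32::from_str_radix(text, radix).ok()?)
}

fn main() {
    let decoded = decode_entities(b"Tom &amp; Jerry &lt;3 &#65;&#x42; &unknown; & done");
    println!("{}", String::from_utf8_lossy(&decoded));
}
```

Running it prints `Tom & Jerry <3 AB &unknown; & done`. Returning a new `Vec` keeps the sketch simple; a real consumer might return `Cow<[u8]>` so that text containing no `&` passes through without copying.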
- Streaming reader that emits events: StartElement, EndElement, Text, Comment, CData, Eof.
- Simple attribute parsing. The parser currently emits raw bytes for textual payloads and does not decode entities, so consumers must decode them.
- Path stack + small query language: Anywhere (`//name`) and Absolute (`/a/b/c`) selectors.
- Designed to be single‑pass and able to handle huge files.
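The path stack and the two selector kinds can be illustrated with a minimal single-pass sketch. All names here (`Selector`, `tag_name`, `count_matches`) are hypothetical and are not ripx's actual API. It reads tags from any `BufRead` with `read_until`, keeps element names as raw bytes on a stack, and evaluates an Anywhere or Absolute selector each time an element starts. It is deliberately naive: it does not parse attributes, and a `>` inside a comment, CDATA section, or quoted attribute value would confuse it. As a best-effort recovery, a stray end tag is ignored and an end tag that closes an ancestor pops any unclosed children.

```rust
use std::io::{self, BufRead};

/// Hypothetical selector type for the two selector kinds (not ripx's API).
enum Selector {
    /// `//name`: an element with this name at any depth.
    Anywhere(Vec<u8>),
    /// `/a/b/c`: exactly this path from the document root.
    Absolute(Vec<Vec<u8>>),
}

impl Selector {
    /// Parses `//name` or `/a/b/c`; returns `None` for anything else.
    fn parse(s: &str) -> Option<Selector> {
        if let Some(name) = s.strip_prefix("//") {
            if name.is_empty() || name.contains('/') {
                return None;
            }
            Some(Selector::Anywhere(name.as_bytes().to_vec()))
        } else if let Some(rest) = s.strip_prefix('/') {
            let steps: Vec<Vec<u8>> = rest.split('/').map(|p| p.as_bytes().to_vec()).collect();
            if steps.iter().any(|p| p.is_empty()) {
                return None;
            }
            Some(Selector::Absolute(steps))
        } else {
            None
        }
    }

    /// Tests the selector against the current path stack (root first).
    fn matches(&self, path: &[Vec<u8>]) -> bool {
        match self {
            Selector::Anywhere(name) => path.last() == Some(name),
            Selector::Absolute(steps) => path == steps.as_slice(),
        }
    }
}

/// Element name: the bytes before the first whitespace or '/'.
fn tag_name(body: &[u8]) -> &[u8] {
    let end = body.iter().position(|b| b.is_ascii_whitespace() || *b == b'/');
    &body[..end.unwrap_or(body.len())]
}

/// Counts elements matching `sel` in a single pass over `input`. Memory use is
/// the path stack plus one scratch buffer (a single text run or tag at a time).
fn count_matches<R: BufRead>(mut input: R, sel: &Selector) -> io::Result<usize> {
    let mut path: Vec<Vec<u8>> = Vec::new();
    let mut buf = Vec::new();
    let mut count = 0;
    loop {
        // Read and discard text up to the next '<'.
        buf.clear();
        if input.read_until(b'<', &mut buf)? == 0 {
            break; // clean EOF
        }
        // Read the tag body; stop if the input ends mid-tag.
        buf.clear();
        input.read_until(b'>', &mut buf)?;
        let Some(body) = buf.strip_suffix(b">") else { break };
        match body.first() {
            // Declarations, processing instructions and comments are skipped.
            Some(b'?') | Some(b'!') | None => {}
            Some(b'/') => {
                // Best effort: pop back to the nearest open element with this
                // name (closing any unclosed children); ignore stray end tags.
                let name = tag_name(&body[1..]);
                if let Some(pos) = path.iter().rposition(|n| n.as_slice() == name) {
                    path.truncate(pos);
                }
            }
            Some(_) => {
                let name = tag_name(body);
                if name.is_empty() {
                    continue;
                }
                path.push(name.to_vec());
                if sel.matches(&path) {
                    count += 1;
                }
                if body.ends_with(b"/") {
                    path.pop(); // self-closing element such as <item/>
                }
            }
        }
    }
    Ok(count)
}

fn main() -> io::Result<()> {
    // &[u8] implements BufRead, so a byte slice stands in for a file here.
    let xml = b"<?xml version=\"1.0\"?><root><items><item id=\"1\"/><item id=\"2\">x</item></items><other><item/></other></root>";
    for query in ["//item", "/root/items/item"] {
        let sel = Selector::parse(query).expect("valid selector");
        println!("{query}: {}", count_matches(&xml[..], &sel)?);
    }
    Ok(())
}
```

Running it prints `//item: 3` and then `/root/items/item: 2`: the `<item/>` under `<other>` matches the Anywhere selector but not the Absolute one. Long text runs are buffered by `read_until` before being discarded; a production reader would more likely skip them with `fill_buf`/`consume` to keep memory bounded.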
IMPORTANT: This project manages tasks using the `beans` CLI. All contributors and automated agents are required to run `beans prime` and follow the directives it prints (create and update beans for work, include bean files in commits, and update statuses as work progresses).
Build and run tests with Rust/Cargo:

```sh
cargo build --release
cargo test
```

Example CLI usage (when built):

```sh
# Example (hypothetical) invocation: query `large-file.xml` for an element named `item` with an attribute `id` equal to 12345 and stop when found.
ripx large-file.xml --path "//item" --attr-eq-id="12345" --short-circuit
```

See PLAN.md for the implementation plan. Main modules will include:
- src/lib.rs
- src/reader.rs (streaming reader)
- src/query.rs (selectors + path stack)
- src/main.rs (CLI / entry point for main binary)
Possible future work:
- Parallel reading of the input file (may require two passes or the indexing described in the next item)
- Indexing to improve performance
License: GPLv3