A Rust library for parsing org-mode files.
Live Demo: https://tao3k.github.io/orgize/
To parse an org-mode string, call Org::parse and then project the owned
semantic AST with Org::document():
use orgize::{ast::ElementData, Org};
let org = Org::parse("* DONE Title :tag:");
let document = org.document();
assert_eq!(document.sections[0].level, 1);
assert_eq!(document.sections[0].raw_title, "Title ");
assert_eq!(document.sections[0].tags, ["tag"]);
assert!(document.sections[0].children.iter().all(|element| {
!matches!(element.data, ElementData::Unknown { .. })
}));Use ParseConfig::parse to specify a custom parse config. Low-level typed
syntax wrappers live under orgize::syntax_ast:
use orgize::{syntax_ast::Headline, Org, ParseConfig};
let config = ParseConfig {
// custom todo keywords
todo_keywords: (vec!["TASK".to_string()], vec![]),
..Default::default()
};
let org = config.parse("* TASK Title 1");
let hdl = org.first_node::<Headline>().unwrap();
assert_eq!(hdl.todo_keyword().unwrap(), "TASK");Org::document() returns ast::ParsedAst, an owned semantic tree with source
annotations and projection diagnostics. Use document.to_bare() when tests,
snapshots, or serialization do not need source ranges.
Semantic timestamps include parsed metadata for date/time start, range end,
repeater, and warning cookies, while retaining the original raw timestamp text.
Semantic links include owned path/target data, parsed description objects,
caption metadata, and image-link detection.
Radio links keep the lightweight plain-text projection by default. Set
radio_link_projection: RadioLinkProjection::Semantic in ParseConfig to run
the opt-in semantic pass that can link parsed object spans such as
<<<*Radio*>>> against *Radio*.
Semantic source/example blocks include parsed line-numbering metadata for
-n and +n switches, optional starting offsets, preserve-indentation
metadata for -i, code-reference cookies from the default (ref:name) format
or custom -l label formats, line-level source/value/normalized-value records,
typed -k/-r/-l switch metadata, and structured source block header
arguments while retaining the raw parameter text. Fixed-width areas use the
same semantic line record shape, and inline Babel source/call contexts now keep
nested bracket, brace, and parenthesis bodies balanced.
Semantic tables expose column alignment metadata from <l>, <c>, and <r>
property cookies while preserving the original row and cell contents.
Per-file TODO declarations from #+TODO:, #+SEQ_TODO:, and #+TYP_TODO:
are applied before parsing headlines, so custom TODO/DONE states are projected
into semantic headline metadata for that document.
Inlinetasks use ParseConfig::inlinetask_min_level, defaulting to Org's level
15 convention, and project as semantic elements with parsed title objects,
planning, properties, optional END markers, and body elements.
Quote punctuation is not modeled as a semantic object; it remains plain text,
while normal objects inside quote punctuation still project independently.
Preprocessing directives stay explicit in parser v2. #+INCLUDE: and
#+MACRO: remain normal keyword elements in the lossless tree, and the
semantic document also collects include directives and macro definitions into
document-level side tables. Macro calls are parsed as objects without expanding
their templates by default. Use document.macro_expansions() for opt-in macro
substitution side tables when an exporter or indexer wants expanded macro text.
Document-local link targets are collected into document.targets, covering
headlines, CUSTOM_ID and org-id ID properties, explicit targets, radio
targets, footnote definitions, and source/example block coderefs. Link
projection resolves these targets into LinkTarget::Internal while keeping the
original link.path(), and reports diagnostics for ambiguous or missing strict
internal links.
Parser v2 also collects exporter/indexer side tables without mutating the
lossless source tree: document.metadata, document.filetags,
document.export_settings, document.link_abbreviations, and
document.footnotes. Keyword values keep their raw source text and expose
parsed object values for metadata-style keywords such as #+TITLE: and
#+CAPTION:; ATTR_* affiliated keywords expose shell-like structured
attributes. Links without an explicit description can use target-derived
fallback objects through Link::description_or_default(), and id:ID::*search
paths retain their search suffix in Link::search.
Use document.link_protocol_records() to inspect built-in link families,
custom protocols, #+LINK abbreviations, executable shell:/elisp: links,
and inert org-protocol: calls without opening files or dispatching handlers.
Use document.column_summary_plans() to inspect non-mutating Column View
summary behavior for COLUMNS declarations, including common Org operators
such as +, :, X%, and est+.
Use document.project_for_export(&ExportProjectionOptions::default()) as the
opt-in semantic projection hook for exporter-oriented pruning and transformations
such as COMMENT/:ARCHIVE:/tag pruning, link abbreviation expansion, and
special-string conversion. Org::to_html(), Org::to_markdown(), and
Org::to_latex() keep their existing default output stable; the corresponding
*_with_options methods expose opt-in special-string and entity handling.
Use document.agenda_entries(&AgendaQuery::new(start, end)) when an indexer or
UI wants an Org Agenda-style semantic view over planning and plain active
timestamps. The projection derives scheduled, deadline, warning, overdue,
closed, active timestamp, repeated, and tag-filtered rows, including timestamp
range display days and start/end times, headline time-of-day ranges, scheduled
delay cookies, and CATEGORY keyword/property metadata without changing the
parsed document or exporter defaults. Use
document.agent_planning_snapshot(&AgentPlanningQuery::new(query)) when an
agent wants compact decision cards over those agenda rows. The snapshot is a
renderer-friendly projection of official Org agenda semantics; its PLANxxx
codes are output diagnostics, not Org source syntax.
Use AgendaWorkspaceBuilder with AgendaWorkspaceQuery when a caller already
has multiple parsed documents and wants built-in Agenda-style command plans for
daily agenda rows, TODO lists, tag/property matches, text search, and stuck
projects without letting orgize scan agenda files itself. Agenda view cards now
carry AgendaUrgencyScore ingredients for explainable ranking. Use
document.citation_export_plan() for Org Cite bibliography/processor/print
bibliography side tables, agent_capture_plan(&AgentCaptureRequest::new(...))
for non-mutating Agent capture previews over native Org entries,
publishing_project_plan() for explicit blog/site publishing graphs,
export_dependency_graph() for combined include/setupfile/bibliography/macro
and publishing-output dependency graphs, document.table_visualization_plans()
for non-executing Org Plot/radio-table metadata, and
document.attachment_inventory() for opt-in filesystem attachment evidence.
Use Org::syntax_document() when you need the lossless rowan-backed syntax tree:
use orgize::{rowan::ast::AstNode, syntax_ast::Headline, Org};
let org = Org::parse("* Title");
let syntax_doc = org.syntax_document();
let headline = syntax_doc.syntax().children().find_map(Headline::cast).unwrap();
assert_eq!(headline.title_raw(), "Title");Use org.traverse(&mut traversal) to walk through the syntax tree.
use orgize::{
export::{from_fn, Container, Event},
Org,
};
let mut hdl_count = 0;
let mut handler = from_fn(|event| {
if matches!(event, Event::Enter(Container::Headline(_))) {
hdl_count += 1;
}
});
Org::parse("* 1\n** 2\n*** 3\n****4").traverse(&mut handler);
assert_eq!(hdl_count, 3);Use org.replace_range(TextRange::new(start, end), "new_text") to modify the syntax tree:
use orgize::{syntax_ast::Headline, Org, TextRange};
let mut org = Org::parse("hello\n* world");
let hdl = org.first_node::<Headline>().unwrap();
org.replace_range(hdl.text_range(), "** WORLD!");
let hdl = org.first_node::<Headline>().unwrap();
assert_eq!(hdl.level(), 2);
org.replace_range(TextRange::up_to(hdl.start()), "");
assert_eq!(org.to_org(), "** WORLD!");Call the Org::to_html function to export org element tree to html:
use orgize::Org;
assert_eq!(
Org::parse("* title\n*section*").to_html(),
"<main><h1>title</h1><section><p><b>section</b></p></section></main>"
);Checkout examples/html-slugify.rs on how to customizing html export process.
Call the Org::to_markdown function to export the org element tree to
Markdown:
use orgize::Org;
assert_eq!(
Org::parse("* title\n*section*").to_markdown(),
"# title\n**section**\n"
);Call the Org::to_latex function to export the org element tree to LaTeX body
text:
use orgize::Org;
assert_eq!(
Org::parse("* title\n*section* and $a_b$").to_latex(),
"\\section{title}\n\\textbf{section} and $a_b$\n\n"
);The orgize binary exposes parser-backed lint, conservative fmt, and
read-only SDD status commands:
orgize lint notes.org
orgize lint --format text notes.org
orgize lint --json notes.org
orgize lint --priority-highest 0 --priority-default 5 --priority-lowest 9 notes.org
orgize fmt --check notes.org
orgize fmt notes.org docs/
orgize sdd status notes.orglint reports semantic projection diagnostics, document-local target
uniqueness issues such as duplicate ID/CUSTOM_ID targets, missing local
macro definitions, duplicate #+MACRO: definitions, malformed or duplicate
#+LINK: abbreviation definitions, invalid supported #+OPTIONS: values,
duplicate or conflicting per-file TODO keyword declarations, and missing or
non-file #+INCLUDE: paths when linting real files. Priority-cookie checks use
Org's default A..C profile unless callers pass
--priority-highest/--priority-default/--priority-lowest, which also supports
numeric profiles such as 0..9. By default, lint prints a
compact agent-facing repair report with location, source line, fix hint, and
contract; --format text keeps the stable line-oriented form, and --json
keeps structured machine output as an explicit mode. fmt starts with
source-safe whitespace normalization: it trims trailing spaces and tabs, aligns
contiguous Org tables outside blocks, normalizes final blank lines, and ensures
one final newline for non-empty documents. When paths are provided, fmt
writes files by default; with no path it reads stdin and writes stdout. Both
commands accept multiple file and directory paths; directory paths are expanded
recursively to .org files, and explicit file operands must be .org files.
Formatter behavior is covered by snapshot tests so future formatting expansions
review as explicit output diffs.
sdd status projects Org-native SDD headings from ordinary tags and property
drawers without writing files. It treats ID as the stable machine identity,
SDD_PARENT as a semantic Org id: link edge, and reports compact
architecture/audit cards for Agent design-review surfaces. SDD headings describe
system boundaries, capabilities, views, decisions, and audits; implementation
checklists belong in linked Org task or ExecPlan files.
-
chrono: adds the ability to convertTimestampintochrono::NaiveDateTime, disabled by default. -
indexmap: adds the ability to convertPropertyDrawerproperties intoIndexMap, disabled by default.
Parser v2 mounts rust-lang-project-harness from root build.rs and the
src/lib.rs cargo-test gate. The wasm package is a standalone
tao3k/orgize-wasm repository mounted at wasm/ as a git submodule; its own
wasm/build.rs keeps the same build-time harness policy inside that repository.
The build-time gates prevent filtered cargo test runs from bypassing blocking
project policy, while the test gate keeps compact agent advice visible during
normal local validation. All gates use the current standalone harness repository
instead of the retired monorepo-local xiuxian-testing crate. No rule pack or
rule severity is downgraded:
RUST-MOD-* and project layout findings stay blocking. AGENT-* info
findings remain visible as repair advice while this legacy crate burns them down
separately. New tests should still use explicit imports: RUST-MOD-R010
reports parent-scope glob imports.
The build-time gate ignores generated environment/data roots such as .devenv/
and .data/ so research checkouts stay outside Cargo, CI, and published
package boundaries.
Fresh checkouts that need the browser demo or npm package should initialize the submodule first:
git submodule update --init --recursive wasm
just wasm-buildParser v2 makes a breaking API boundary explicit:
orgize::astis the owned semantic AST. Its primary public types areDocument<A>,Section<A>,Element<A>,Object<A>,ParsedAnnotation, andDiagnostic.ast::ParsedAstisDocument<ParsedAnnotation>.ast::BareAstisDocument<()>.Org::document()returnsast::ParsedAst.Org::syntax_document()andorgize::syntax_ast::*expose the old typed wrappers around the lossless rowan syntax tree. Wrapper names that would collide with semantic AST types use aSyntaxprefix, for examplesyntax_ast::SyntaxDocumentandsyntax_ast::SyntaxLink.Document<A>::project_for_export(&ExportProjectionOptions)returns an exporter-oriented semantic projection without changing the parsed AST.Document<A>::agenda_entries(&AgendaQuery)returns an opt-in agenda projection over semantic planning and active timestamps without mutatingParsedAst.
Code that previously imported rowan-backed wrappers from orgize::ast::*
should import them from orgize::syntax_ast::* instead.
Org::syntax_document() and the syntax_ast module expose access to the internal syntax tree,
along with rowan low-level APIs. This can be useful for intricate tasks or for
HTML/export integrations that need byte-for-byte source preservation.
However, the structure of the internal syntax tree can change between different versions of the library.
Because of this, the result of element.syntax() doesn't follow semantic versioning,
which means updates might break your code if it relies on this method.
Semantic AST traversal is exposed through Rust-style APIs on Document<A>:
visit, visit_mut, fold, map_ann, and try_map_ann. The semantic layer is
projected from the rowan substrate; it does not replace rowan as the lossless
parser representation.
See docs/20_parser/20.01_parser_v2_release_readiness.org for the parser v2 closeout checklist, intentional gaps, and validation gate.