Marklas

Bidirectional converter between Markdown and Atlassian Document Format (ADF).

한국어 · 日本語


Why Marklas?

Confluence and Jira store documents in ADF — a verbose JSON structure. Marklas converts it to readable Markdown and back:

Markdown ⇄ ADF
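To give a feel for the size difference, here is a minimal sketch that compares a short Markdown snippet with an ADF document for the same content. The ADF dictionary is hand-written from the public ADF schema (`doc`, `heading`, `paragraph`, `text`, and the `strong` mark); it is an assumption about what an equivalent document looks like, and the output Marklas actually produces may differ in detail.

```python
import json

# The same snippet used in the Usage example.
markdown = "## Hello\n\nThis is **bold**."

# Hand-written ADF for the same content (assumption: not generated by Marklas).
adf = {
    "version": 1,
    "type": "doc",
    "content": [
        {
            "type": "heading",
            "attrs": {"level": 2},
            "content": [{"type": "text", "text": "Hello"}],
        },
        {
            "type": "paragraph",
            "content": [
                {"type": "text", "text": "This is "},
                {"type": "text", "text": "bold", "marks": [{"type": "strong"}]},
                {"type": "text", "text": "."},
            ],
        },
    ],
}

# Serialize without whitespace so the comparison favors ADF as much as possible.
compact_json = json.dumps(adf, separators=(",", ":"))
print(len(markdown), "characters of Markdown")
print(len(compact_json), "characters of compact ADF JSON")
```

Even for two short blocks, the JSON form is several times longer than the Markdown source.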

ADF-only features (panels, mentions, colored text, etc.) are preserved as HTML elements with adf attributes, so the full structure survives a roundtrip:

<aside adf="panel" params='{"panelType":"info"}'>

This is an info panel — readable as plain Markdown.

</aside>

User <span adf="mention" params='{"id":"abc123"}'>@John</span> approved this.
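Because the metadata lives in ordinary HTML attributes, other tools can read it too. The sketch below uses only the Python standard library to list every element that carries an `adf` attribute, along with its decoded `params`. The class `AdfTagCollector` and the function `collect_adf_tags` are hypothetical names for illustration; this is not how Marklas parses its own output.

```python
import json
from html.parser import HTMLParser


class AdfTagCollector(HTMLParser):
    """Collect (adf type, params) pairs from tags that carry an adf attribute."""

    def __init__(self) -> None:
        super().__init__()
        self.found: list[tuple[str, dict]] = []

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        if "adf" in attr_map:
            # Treat a missing or empty params attribute as "no parameters".
            params = json.loads(attr_map.get("params") or "{}")
            self.found.append((attr_map["adf"], params))


def collect_adf_tags(markdown: str) -> list[tuple[str, dict]]:
    collector = AdfTagCollector()
    collector.feed(markdown)
    collector.close()
    return collector.found


sample = """<aside adf="panel" params='{"panelType":"info"}'>

This is an info panel.

</aside>

User <span adf="mention" params='{"id":"abc123"}'>@John</span> approved this.
"""

tags = collect_adf_tags(sample)
print(tags)
```

Each entry pairs the ADF node type with the parameters needed to rebuild it, which is what lets the structure survive the trip through Markdown.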

Pass plain=True to to_md to strip the roundtrip metadata and get clean Markdown for LLM consumption.

Installation

pip install marklas

Usage

from marklas import to_adf, to_md

# Markdown → ADF
adf = to_adf("## Hello\n\nThis is **bold**.")

# ADF → Markdown (with roundtrip metadata)
md = to_md(adf_document)

# ADF → Markdown (clean, no metadata)
plain_md = to_md(adf_document, plain=True)

# Roundtrip (fetch_confluence_page is a placeholder for your own API call)
original_adf = fetch_confluence_page()
markdown = to_md(original_adf)          # edit in any Markdown editor
restored_adf = to_adf(markdown)         # push back — structure preserved

Advanced Usage

For pipelines that need to modify the AST between parsing and rendering, use Transformer:

from marklas import Transformer, parse_md, render_adf
from marklas.ast import CodeBlock, Expand, Extension, Media, Node

t = Transformer()

# Replace: return a Node to substitute the original.
# upload_attachment and page_id are placeholders for your own upload helper and target page.
@t.register(Media)
def _(node: Media) -> Media | None:
    if node.type == "external":
        uploaded = upload_attachment(page_id, node.url)
        return Media(type="file", id=uploaded.media_id, collection=uploaded.collection)
    return None

# Splice: return a list[Node] to expand one node into many
@t.register(CodeBlock)
def _(node: CodeBlock) -> list[Node] | None:
    if node.language == "mermaid":
        return [
            Expand(title="mermaid source", content=[node]),
            Extension(
                extension_key="mermaid-macro",
                extension_type="com.example.mermaid",
                parameters={"code": "".join(c.text for c in node.content)},
            ),
        ]
    return None

doc = parse_md(markdown)
new_doc = t(doc)
adf = render_adf(new_doc)

A handler returns one of three values:

| Return | Effect |
| --- | --- |
| None | Skip: pass to the next handler, or leave unchanged |
| Node | Replace the original node |
| list[Node] | Splice multiple nodes in place of the original |

Multiple handlers can be registered for the same type; they run in registration order and the first non-None result wins. The tree is traversed bottom-up, and nodes returned by a handler are not revisited.
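The dispatch rules above can be illustrated with a small self-contained model. This is a sketch under stated assumptions, not Marklas's implementation: `MiniTransformer`, `Node`, `Text`, and `Paragraph` are hypothetical stand-ins, handlers are looked up by exact type, and the root node is assumed never to be spliced.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Node:
    """Hypothetical stand-in for an AST node; not the marklas.ast class."""
    children: list["Node"] = field(default_factory=list)


@dataclass
class Text(Node):
    value: str = ""


@dataclass
class Paragraph(Node):
    pass


class MiniTransformer:
    def __init__(self) -> None:
        self._handlers: dict[type, list[Callable]] = {}

    def register(self, node_type: type):
        def decorator(fn):
            self._handlers.setdefault(node_type, []).append(fn)
            return fn
        return decorator

    def _visit(self, node: Node) -> list[Node]:
        # Bottom-up: rewrite the children before the node itself.
        new_children: list[Node] = []
        for child in node.children:
            new_children.extend(self._visit(child))
        node.children = new_children
        # Handlers run in registration order; the first non-None result wins.
        for handler in self._handlers.get(type(node), []):
            result = handler(node)
            if result is None:
                continue
            # Returned nodes are used as-is and are not visited again.
            return result if isinstance(result, list) else [result]
        return [node]

    def __call__(self, root: Node) -> Node:
        # Assumption: the root is replaced by at most one node.
        (new_root,) = self._visit(root)
        return new_root


t = MiniTransformer()


@t.register(Text)
def split_words(node: Text):
    words = node.value.split()
    if len(words) > 1:
        return [Text(value=w) for w in words]  # splice
    return None  # skip: fall through to the next handler


@t.register(Text)
def add_bang(node: Text):
    return Text(value=node.value + "!")  # replace


doc = Paragraph(children=[Text(value="hello world"), Text(value="ok")])
result = t(doc)
values = [child.value for child in result.children]
print(values)  # ['hello', 'world', 'ok!']
```

The first Text node is spliced by `split_words`, so `add_bang` never sees it and the spliced nodes are not revisited. The second Text node is skipped by `split_words` and replaced by `add_bang`.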

| Function | Description |
| --- | --- |
| parse_md(md) | Markdown → AST |
| parse_adf(adf) | ADF JSON → AST |
| render_md(doc) | AST → Markdown |
| render_adf(doc) | AST → ADF JSON |
| Transformer | Registry of typed visitors for AST rewriting |

Token Efficiency

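Markdown is far more compact than ADF JSON: in the measurement below it needs 2.5x fewer tokens with roundtrip metadata and 3.9x fewer with plain=True. That matters for LLM-based workflows, where every token counts.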

|  | ADF JSON | Markdown | Markdown (plain) |
| --- | --- | --- | --- |
| Tokens | 2,173,468 | 858,970 | 560,765 |
| Reduction |  | 2.5x | 3.9x |

Measured on 204 real Confluence pages, with ADF serialized as compact JSON, using the GPT-4o tokenizer (tiktoken).
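The reduction factors are the ADF token count divided by each Markdown token count. A quick check of the arithmetic, using only the totals reported above:

```python
# Token totals as reported in the table above.
adf_tokens = 2_173_468
md_tokens = 858_970
plain_tokens = 560_765

# Reduction factor = ADF tokens / Markdown tokens.
md_reduction = adf_tokens / md_tokens
plain_reduction = adf_tokens / plain_tokens

print(f"Markdown: {md_reduction:.1f}x, Markdown (plain): {plain_reduction:.1f}x")
# Markdown: 2.5x, Markdown (plain): 3.9x
```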

Development

uv sync --extra dev
uv run pytest -v
