pdf table data parser

Marker

Marker is an OCR + LLM library with a proprietary table transformer.

Some key features:

LLM mode that augments Marker with models like Gemini Flash
Improved math, including inline math
Links and references
Better tables and forms

LLM mode iterates on Marker output for certain blocks. You can use Gemini, or local models via Ollama. More models are coming soon.
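A minimal sketch of turning on LLM mode from Python. It assumes the `marker-pdf` package and the converter interface shown in its README (`ConfigParser`, `PdfConverter`, `create_model_dict`, `text_from_rendered`). The config keys `use_llm` and `output_format` are assumed to mirror the CLI flags of the same name, and an API key for the chosen model is assumed to be configured separately. The helper names `build_marker_config` and `convert_pdf` are hypothetical. Check all of this against the installed version.

```python
from typing import Any


def build_marker_config(use_llm: bool = True, output_format: str = "markdown") -> dict[str, Any]:
    """Build a Marker config dict (keys assumed to mirror the CLI flags)."""
    if output_format not in {"markdown", "json", "html"}:
        raise ValueError(f"unsupported output format: {output_format}")
    return {"use_llm": use_llm, "output_format": output_format}


def convert_pdf(path: str, config: dict[str, Any]) -> str:
    """Convert one PDF with Marker and return the rendered text."""
    # Imported lazily so build_marker_config works without marker-pdf installed.
    from marker.config.parser import ConfigParser
    from marker.converters.pdf import PdfConverter
    from marker.models import create_model_dict
    from marker.output import text_from_rendered

    parser = ConfigParser(config)
    converter = PdfConverter(
        config=parser.generate_config_dict(),
        artifact_dict=create_model_dict(),
        processor_list=parser.get_processors(),
        renderer=parser.get_renderer(),
        llm_service=parser.get_llm_service(),
    )
    rendered = converter(path)
    text, _, _images = text_from_rendered(rendered)
    return text
```

Calling `convert_pdf("report.pdf", build_marker_config(use_llm=True))` would then run the normal Marker pipeline with the LLM pass enabled. `report.pdf` is a placeholder path.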

Marker + LLMs is faster and hallucination-free compared with using LLMs alone. Here Marker + Gemini Flash beats Gemini Flash alone on a FinTabNet benchmark.

![marker + gemini](https://pbs.twimg.com/media/GkKzzzbWYAERp56?format=jpg&name=small)

Table improvements are:

New table recognition model
Table merging across pages
Math and formatting inside tables
Output in HTML, Markdown, or JSON
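Once a table comes back as Markdown, it usually needs to become rows before it is useful as data. The sketch below is plain standard-library Python and does not depend on Marker. It assumes a simple pipe-delimited table with a header row and a separator row, and it does not handle escaped pipes or math inside cells. The function name `parse_markdown_table` is illustrative.

```python
def parse_markdown_table(markdown: str) -> list[dict[str, str]]:
    """Turn one pipe-delimited Markdown table into a list of row dicts."""
    lines = [ln.strip() for ln in markdown.strip().splitlines() if ln.strip()]
    if len(lines) < 2:
        return []

    def split_row(line: str) -> list[str]:
        # Drop the outer pipes, then split on the inner ones.
        return [cell.strip() for cell in line.strip("|").split("|")]

    header = split_row(lines[0])
    rows = []
    for line in lines[2:]:  # lines[1] is the |---|---| separator row
        cells = split_row(line)
        # Pad short rows so every header gets a value.
        cells += [""] * (len(header) - len(cells))
        rows.append(dict(zip(header, cells)))
    return rows
```

For JSON or HTML output, the standard `json` module or an HTML parser would play the same role.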

Up next:

More formats (docx, pptx, xlsx, etc)
Improved layout detection on scientific papers, engineering documents, newspapers
Structured data extraction

LlamaIndex / LlamaParse

It is really good at the following:

✅ Broad file type support: Parsing a variety of unstructured file types (.pdf, .pptx, .docx, .xlsx, .html) with text, tables, visual elements, weird layouts, and more.
✅ Table recognition: Parsing embedded tables accurately into text and semi-structured representations.
✅ Multimodal parsing and chunking: Extracting visual elements (images/diagrams) into structured formats and returning image chunks using the latest multimodal models.
✅ Custom parsing: Input custom prompt instructions to customize the output the way you want it.
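A minimal sketch of parsing a file with LlamaParse. It assumes the `llama-parse` package, its `LlamaParse` class with `result_type="markdown"` and `load_data`, and an API key in the `LLAMA_CLOUD_API_KEY` environment variable. The extension set only repeats the types named above and is not meant as the service's full list. The helper names `is_supported` and `parse_to_markdown` are hypothetical.

```python
import os
from pathlib import Path

# Extensions named in the list above; the real service may accept more.
SUPPORTED_EXTENSIONS = {".pdf", ".pptx", ".docx", ".xlsx", ".html"}


def is_supported(path: str) -> bool:
    """Return True if the file extension is one of the types listed above."""
    return Path(path).suffix.lower() in SUPPORTED_EXTENSIONS


def parse_to_markdown(path: str) -> str:
    """Send one file to LlamaParse and return its content as Markdown."""
    if not is_supported(path):
        raise ValueError(f"unsupported file type: {path}")
    # Imported lazily so is_supported works without llama-parse installed.
    from llama_parse import LlamaParse

    parser = LlamaParse(
        api_key=os.environ["LLAMA_CLOUD_API_KEY"],
        result_type="markdown",
    )
    documents = parser.load_data(path)
    return "\n\n".join(doc.text for doc in documents)
```

Checking the extension first avoids a network round trip for files that are clearly out of scope.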

Overall, LlamaParse is easier to integrate with LlamaIndex if needed.

