LlamaIndex

LlamaIndex · 2026-06-04T16:08:04.133Z

Most AI pipelines are only as good as the data we provide them with, and that usually means PDFs or other unstructured documents. Contracts, invoices, reports: all of them mix layout, language, and context, and getting reliable structured data out of them is still one of the hardest unsolved problems in enterprise AI.

Parse-Flow is an open-source project we built to tackle this head-on. It puts four document processing primitives at the center of a visual workflow designer:

📄 Parse: clean markdown and text from raw documents
🔍️ Classify: assign documents to user-defined categories
✂️ Split: segment documents into typed chunks
🪏 Extract: pull structured JSON against a schema

You drag steps onto a canvas, drop in a document, and watch events stream back as the pipeline runs. Under the hood it's powered by a LlamaAgents workflow that walks your flow one step at a time, making every transition observable and every failure a first-class value.

📚️ Full write-up on the architecture here: https://lnkd.in/g8g58Wxw
👩‍💻 Source code: https://lnkd.in/gtMT9Y-D
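To make the "one step at a time, every failure a first-class value" idea concrete, here is a minimal sketch in plain Python. It does not use the LlamaAgents API or Parse-Flow's actual code; `run_flow`, `StepOk`, `StepFailed`, and the four toy step functions are hypothetical names invented for illustration, and the steps are trivial stand-ins for real parsing, classification, splitting, and extraction.

```python
from dataclasses import dataclass
from typing import Any, Callable, Iterator, List, Tuple, Union


@dataclass
class StepOk:
    """Event emitted when a step finishes successfully (hypothetical type)."""
    step: str
    output: Any


@dataclass
class StepFailed:
    """Event emitted when a step raises; the failure is a value, not a crash."""
    step: str
    error: str


Event = Union[StepOk, StepFailed]
Step = Tuple[str, Callable[[Any], Any]]


def run_flow(steps: List[Step], document: Any) -> Iterator[Event]:
    """Walk the flow one step at a time, yielding one event per transition."""
    current = document
    for name, fn in steps:
        try:
            current = fn(current)
        except Exception as exc:
            # Surface the failure as an event and stop the flow.
            yield StepFailed(step=name, error=str(exc))
            return
        yield StepOk(step=name, output=current)


# Toy stand-ins for the four primitives (assumptions, not real parsers).
def parse(raw: str) -> str:
    return raw.strip()


def classify(text: str) -> dict:
    label = "invoice" if "invoice" in text.lower() else "other"
    return {"text": text, "label": label}


def split(doc: dict) -> dict:
    chunks = [line for line in doc["text"].split("\n") if line]
    return {**doc, "chunks": chunks}


def extract(doc: dict) -> dict:
    for chunk in doc["chunks"]:
        if chunk.startswith("Total:"):
            return {"label": doc["label"], "total": float(chunk.split(":", 1)[1])}
    raise ValueError("no 'Total:' line found")


FLOW: List[Step] = [
    ("parse", parse),
    ("classify", classify),
    ("split", split),
    ("extract", extract),
]

if __name__ == "__main__":
    for event in run_flow(FLOW, "  Invoice 17\nTotal: 42.50\n"):
        print(event)
```

Because `run_flow` is a generator, a caller can stream each event to a UI as it is produced, and a failing step shows up as an ordinary `StepFailed` event rather than an unhandled exception.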

Technology, Information and Internet

San Francisco, California 285,335 followers

AI agents for document OCR + workflows

View all 103 employees

About us

LlamaIndex delivers the world's most accurate agentic document processing platform. We bring together industry-leading agentic OCR with a natural language workflow builder to power intelligent agents that read and extract from complex documents, adapt to business logic, and scale reliably to production. Our SDK is downloaded 25M+ times every month and is used by the fastest-growing AI companies and the Fortune 50.

Website: https://www.llamaindex.ai/
Industry: Technology, Information and Internet
Company size: 11-50 employees
Headquarters: San Francisco, California
Type: Public Company

Locations

Primary

San Francisco, California, US

447 Sutter St
San Francisco, California 94108, US

Updates

LlamaIndex reposted this
Jerry Liu
1d Edited
Claude Fable 5 thinks document parsing is beneath it

It is absolutely crushing it on all reasoning-intensive/long-horizon benchmarks: SWE-Bench Pro, FrontierCode, GDPval, Runescape, etc. But for document understanding tasks, it performs roughly on par with Gemini 3 Flash, at roughly 10-15x the token cost.

We benchmarked the model on ParseBench and compared it against all other frontier models. It is definitely up there among frontier models, but falls far short of specialized OCR providers.

What we found interesting is that Fable 5 is self-aware about this. When we asked the model which tasks it enjoys the least, it said that it dislikes tasks "where the request is fully specified and the answer is fully known", implying that part of its weak performance is due to laziness and a lack of willingness to actually solve the task at hand.

For a full list of results across different frontier models, check out ParseBench! https://www.parsebench.ai/
41 Comments

LlamaIndex reposted this
Jerry Liu
1d
LiteParse, our open-source/Rust-based doc parser, runs so quickly that Claude Fable 5 doesn't think it's real 🔥 It is the fastest document parsing solution on the planet and a great choice for your AI document workloads. Check it out: https://lnkd.in/gSTesBxD (creds Logan Markewich for the screenshot)
8 Comments

LlamaIndex

1d Edited
Day 0 Anthropic Fable 5 in ParseBench: we tested the model's advances in document understanding. The model clearly stands out when it comes to adherence to the original text:

📃 Content faithfulness: 90.02% vs 86.19% (Gemini 3 Flash) and 86.81% (GPT-5.5)
🔢 Semantic formatting: 72.62% vs 58.35% and 60.12%, a 12+ point lead

These are two of the most important metrics for SOTA document understanding: does the output preserve what the document actually says, and does it preserve formatting that carries meaning?

But ... it's not a clean sweep. There continues to be a lot of alpha in unlocking document understanding for frontier models. Full results below 👇
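As a quick check on the quoted margins, the arithmetic can be sketched as follows. The scores are copied from the post; the assumption (labeled in the code) is that the two unlabeled semantic formatting figures are listed in the same order as the content faithfulness ones, i.e. Gemini 3 Flash first and GPT-5.5 second. The function and dictionary names are hypothetical.

```python
# Scores quoted in the post, in percent.
# Assumption: the semantic formatting figures follow the same model order
# as the content faithfulness figures (Gemini 3 Flash, then GPT-5.5).
SCORES = {
    "content_faithfulness": {"Fable 5": 90.02, "Gemini 3 Flash": 86.19, "GPT-5.5": 86.81},
    "semantic_formatting": {"Fable 5": 72.62, "Gemini 3 Flash": 58.35, "GPT-5.5": 60.12},
}


def lead_over_runner_up(scores: dict, model: str = "Fable 5") -> float:
    """Percentage-point gap between `model` and the best of the other models."""
    best_other = max(value for name, value in scores.items() if name != model)
    return round(scores[model] - best_other, 2)


if __name__ == "__main__":
    for metric, scores in SCORES.items():
        print(metric, lead_over_runner_up(scores))
```

Under that ordering assumption, the semantic formatting gap over the closer competitor is 12.5 points, which matches the "12+ point lead" in the post, while the content faithfulness gap is a little over 3 points.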
2 Comments

LlamaIndex

2d
Parsing a document accurately is one thing. Proving where every value came from is another.

When a compliance team reviews an AI extraction, or an auditor needs to sign off on a figure pulled from a financial filing, "it came from this document" isn't enough. They need to see exactly where: the specific cell in the table, the exact line on the page, the precise word the agent used.

Most parsers can get you to a paragraph or a table block. That's where the trail ends.

Today we're shipping Granular Bounding Boxes in LlamaParse: word-, line-, and cell-level coordinates for every value in your document. The result is a complete, verifiable trail from every extracted value back to its exact source in the document.

Built for audit workflows, compliance review, and any pipeline where verification isn't optional.

Read the full announcement → https://lnkd.in/gfYWpzqy
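One way to picture what word-, line-, and cell-level coordinates enable is the lookup below. This is a sketch only: `SourceBox`, `trace_value`, the field names, and the sample coordinates are all made up for illustration and are not LlamaParse's response schema, which the post does not describe.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class SourceBox:
    """A piece of source text with its location (hypothetical structure)."""
    page: int
    level: str  # "word", "line", or "cell"
    text: str
    bbox: Tuple[float, float, float, float]  # (x0, y0, x1, y1) in assumed page units


def trace_value(value: str, boxes: List[SourceBox], level: str = "word") -> List[SourceBox]:
    """Return every box at the requested granularity whose text contains `value`."""
    return [box for box in boxes if box.level == level and value in box.text]


# Made-up sample data for a single figure on one page.
BOXES = [
    SourceBox(page=3, level="line", text="Total revenue 1,204.5", bbox=(72.0, 410.0, 310.0, 424.0)),
    SourceBox(page=3, level="word", text="1,204.5", bbox=(262.0, 410.0, 310.0, 424.0)),
    SourceBox(page=3, level="cell", text="1,204.5", bbox=(255.0, 405.0, 320.0, 430.0)),
]

if __name__ == "__main__":
    for hit in trace_value("1,204.5", BOXES, level="cell"):
        print(f"page {hit.page}, bbox {hit.bbox}")
```

An audit tool could use a lookup like this to highlight the exact region a reviewer needs to check, instead of pointing at a whole paragraph or table.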

2 Comments

LlamaIndex reposted this
Jerry Liu
3d
The Agent Open 🎾🏓

Everyone loves pickleball. We’re hosting a massive pickleball tournament during the AI Engineer World Fair. I’m so excited to see this event come together.

This is an AWESOME collaboration between 7 companies:
- Braintrust
- Browserbase
- Cursor
- Modal
- Parallel Web Systems
- turbopuffer
- us @ LlamaIndex

Come find out which AI Engineer, founder, tech influencer, VC, or anyone in between is the most cracked. Come participate in our tournaments, or come hang out for drinks, conversations, and casual games!

Sign up here: https://lnkd.in/ggMwic6d
13 Comments

LlamaIndex

3d Edited
The Agent Open: AI's Pickleball Tournament 🏓

Come put your code and backhand to the test and embrace the full Open experience. Custom built-out courts. Stadium seating. Exhibition matches by AI leaders. Fresh agent merch. Every infra startup you love, all in one place.

Brought to you in collaboration with: Braintrust, Browserbase, Cursor, Modal, Parallel Web Systems, turbopuffer

Where were you during the first Agent Open? Come make history. SF Edition. 👇 https://lnkd.in/gMmpg8QH
4 Comments

LlamaIndex reposted this
Render

18,070 followers
5d
Gauthami P. is coming to Render localhost.

Document pipelines tend to work fine right up until a single large PDF blocks your server and takes down unrelated work. Gauthami, Head of Product Marketing at LlamaIndex, knows this failure mode well. LlamaIndex is the most accurate agentic document OCR platform, and she's seen what happens when processing pipelines aren't built to be isolated and retryable.

On June 18 in San Francisco, she'll walk through a reference architecture that pairs LlamaParse with Render Workflows for resilient, scalable document processing and close with a live demo.

Seats are limited! Save your spot here: https://lnkd.in/g8vC_zcn

#DocumentProcessing #LlamaParse #AI #DeveloperInfrastructure
1 Comment

LlamaIndex reposted this
Jerry Liu
1w
We're presenting ParseBench at CVPR 2026!

ParseBench is the most comprehensive document understanding benchmark for VLMs.

✅ It contains 2k pages of real-world enterprise documents
✅ It has comprehensive evaluation metrics around tables, charts, visual grounding, semantic formatting, and content faithfulness

The core goal is measuring whether models can semantically interpret a document in the right way, without having models overfit to our precise benchmark.

Parsing 100% of PDFs to 100% accuracy is the final boss for document OCR. In general, the latest frontier models have been tuned for coding, math, and scientific reasoning as opposed to precise visual understanding; we hope more benchmarks like these will encourage overall progress towards solving this problem!

Poster is below. If you want to learn more, come check out our site or our 30-page arXiv paper:

ParseBench: https://www.parsebench.ai/
ArXiv: https://lnkd.in/gpp3Jgj9
8 Comments



Funding

LlamaIndex 4 total rounds

Last Round

Series unknown Jun 1, 2025

Investors

KPMG Ventures, Databricks Ventures


Employees at LlamaIndex

Jerry Chen

Donald Tucker

Dave Zilberman

Antonio Jose Jimeno Yepes
