LlamaIndex

Technology, Information and Internet

San Francisco, California · 285,335 followers

AI agents for document OCR + workflows

About us

LlamaIndex delivers the world's most accurate agentic document processing platform. We bring together industry-leading agentic OCR with a natural language workflow builder to power intelligent agents that read and extract data from complex documents, adapt to business logic, and scale reliably to production. Our SDK is downloaded more than 25M times every month and is used by the fastest-growing AI companies and the Fortune 50.

Website
https://www.llamaindex.ai/
Industry
Technology, Information and Internet
Company size
11-50 employees
Headquarters
San Francisco, California
Type
Public Company

Updates

  • LlamaIndex reposted this

    Claude Fable 5 thinks document parsing is beneath it. It is absolutely crushing all reasoning-intensive/long-horizon benchmarks: SWE-Bench Pro, FrontierCode, GDPval, Runescape, etc. But for document understanding tasks, it is roughly equivalent to Gemini 3 Flash in performance, at roughly 10-15x the token cost. We benchmarked the model on ParseBench and compared it against all other frontier models. It is definitely up there compared to other frontier models, but falls far short of specialized OCR providers. What we found interesting is that Fable 5 is self-aware about this. When we asked the model which tasks it enjoys the least, it said that it dislikes tasks "where the request is fully specified and the answer is fully known", implying that part of its weak performance is due to laziness and a lack of willingness to actually solve the task at hand. For a full list of results across different frontier models, check out ParseBench! https://www.parsebench.ai/

  • Day 0 Anthropic Fable 5 in ParseBench: We tested the model's advancements when it comes to document understanding. The model clearly peaks when it comes to adherence to the original text: 📃 Content faithfulness: 90.02% vs 86.19% (Gemini 3 Flash) and 86.81% (GPT-5.5). 🔢 Semantic formatting: 72.62% vs 58.35% (Gemini 3 Flash) and 60.12% (GPT-5.5), a 12+ point lead. These are two of the most important metrics for SOTA document understanding: does the output preserve what the document actually says, and does it preserve formatting that carries meaning? But ... it's not a clean sweep. There continues to be a lot of alpha in unlocking document understanding for frontier models. Full results below 👇

  • Parsing a document accurately is one thing. Proving where every value came from is another. When a compliance team reviews an AI extraction, or an auditor needs to sign off on a figure pulled from a financial filing, "it came from this document" isn't enough. They need to see exactly where. The specific cell in the table, the exact line on the page, the precise word the agent used. Most parsers can get you to a paragraph or a table block. That's where the trail ends. Today we're shipping Granular Bounding Boxes in LlamaParse — word, line, and cell level coordinates for every value in your document. The result is a complete, verifiable trail from every extracted value back to its exact source in the document. Built for audit workflows, compliance review, and any pipeline where verification isn't optional. Read the full announcement → https://lnkd.in/gfYWpzqy
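The idea of tracing an extracted value back to word-level coordinates can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the `Box` and `Word` types, the `union` and `locate` helpers, and the sample coordinates are hypothetical and are not the LlamaParse API or its output format.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Box:
    """Axis-aligned box on a page (hypothetical layout: origin at top-left)."""
    page: int
    x0: float
    y0: float
    x1: float
    y1: float


@dataclass(frozen=True)
class Word:
    """One word of parsed text together with its own bounding box."""
    text: str
    box: Box


def union(boxes: List[Box]) -> Box:
    """Return the smallest box covering all given boxes on a single page."""
    pages = {b.page for b in boxes}
    if len(pages) != 1:
        raise ValueError("boxes must all be on the same page")
    return Box(
        page=pages.pop(),
        x0=min(b.x0 for b in boxes),
        y0=min(b.y0 for b in boxes),
        x1=max(b.x1 for b in boxes),
        y1=max(b.y1 for b in boxes),
    )


def locate(words: List[Word], value: str) -> Optional[Box]:
    """Find the first run of consecutive words matching `value` and return its box."""
    tokens = value.split()
    n = len(tokens)
    if n == 0:
        return None
    for i in range(len(words) - n + 1):
        window = words[i:i + n]
        if [w.text for w in window] == tokens:
            return union([w.box for w in window])
    return None


# Made-up word boxes for one line of a filing (page 3).
words = [
    Word("Total", Box(3, 40.0, 100.0, 80.0, 112.0)),
    Word("revenue:", Box(3, 84.0, 100.0, 140.0, 112.0)),
    Word("$4.2", Box(3, 300.0, 100.0, 330.0, 112.0)),
    Word("million", Box(3, 334.0, 100.0, 380.0, 112.0)),
]

source = locate(words, "$4.2 million")
print(source)
```

A reviewer could use a box like `source` to highlight the exact region on the page image, which is the kind of verification trail the post describes.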

  • LlamaIndex reposted this

    The Agent Open 🎾🏓 Everyone loves pickleball. We’re hosting a massive pickleball tournament during the AI Engineer World's Fair. I’m so excited to see this event come together. This is an AWESOME collaboration between 7 companies: - Braintrust - Browserbase - Cursor - Modal - Parallel Web Systems - turbopuffer - us @ LlamaIndex Come find out which AI Engineer, founder, tech influencer, VC, or anyone in between is the most cracked. Come participate in our tournaments, or come hang out for drinks, conversations, and casual games! Sign up here: https://lnkd.in/ggMwic6d

  • The Agent Open: AI's Pickleball Tournament 🏓 Come put your code and backhand to the test and embrace the full Open experience. Custom-built courts. Stadium seating. Exhibition matches by AI leaders. Fresh agent merch. Every infra startup you love, all in one place. Brought to you in collaboration with: Braintrust, Browserbase, Cursor, Modal, Parallel Web Systems, turbopuffer. Where were you during the first Agent Open? Come make history. SF Edition. 👇 https://lnkd.in/gMmpg8QH

  • LlamaIndex reposted this

    View organization page for Render

    18,070 followers

    Gauthami P. is coming to Render localhost. Document pipelines tend to work fine right up until a single large PDF blocks your server and takes down unrelated work. Gauthami, Head of Product Marketing at LlamaIndex, knows this failure mode well. LlamaIndex is the most accurate agentic document OCR platform, and she's seen what happens when processing pipelines aren't built to be isolated and retryable. On June 18 in San Francisco, she'll walk through a reference architecture that pairs LlamaParse with Render Workflows for resilient, scalable document processing and close with a live demo. Seats are limited! Save your spot here: https://lnkd.in/g8vC_zcn #DocumentProcessing #LlamaParse #AI #DeveloperInfrastructure

  • LlamaIndex reposted this

    We're presenting ParseBench at CVPR 2026! ParseBench is the most comprehensive document understanding benchmark for VLMs. ✅ It contains 2k pages of real-world enterprise documents ✅ It has comprehensive evaluation metrics around tables, charts, visual grounding, semantic formatting, and content faithfulness. The core goal is measuring whether models can semantically interpret a document in the right way, without having models overfit to our precise benchmark. Parsing 100% of PDFs to 100% accuracy is the final boss for document OCR. In general, the latest frontier models have been tuned for coding, math, and scientific reasoning as opposed to precise visual understanding; we hope more benchmarks like these will encourage overall progress towards solving this problem! Poster is below. If you want to learn more, come check out our site or our 30-page arXiv paper: ParseBench: https://www.parsebench.ai/ arXiv: https://lnkd.in/gpp3Jgj9

  • Most AI pipelines are only as good as the data we provide them with, and that usually means PDFs or other unstructured documents. Contracts, invoices, reports... They all mix layout, language, and context, and getting reliable structured data out of them is still one of the hardest unsolved problems in enterprise AI. Parse-Flow is an open-source project we built to tackle this head-on. It puts four document processing primitives at the center of a visual workflow designer: 📄 Parse: clean markdown and text from raw documents 🔍️ Classify: assign documents to user-defined categories ✂️ Split: segment documents into typed chunks 🪏 Extract: pull structured JSON against a schema. You drag steps onto a canvas, drop in a document, and watch events stream back as the pipeline runs. Under the hood it's powered by a LlamaAgents workflow that walks your flow one step at a time, making every transition observable and every failure a first-class value. 📚️ Full write-up on the architecture here: https://lnkd.in/g8g58Wxw 👩‍💻 Source code: https://lnkd.in/gtMT9Y-D

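The step-by-step execution described in the Parse-Flow post, where each transition emits an event and a failure is returned as a value rather than raised, might look roughly like the sketch below. This is not Parse-Flow's or LlamaAgents' actual code: `StepOk`, `StepFailed`, `run_flow`, and the four toy step functions are hypothetical stand-ins chosen only to show the pattern.

```python
from dataclasses import dataclass
from typing import Any, Callable, Iterator, List, Tuple, Union


@dataclass
class StepOk:
    """Event emitted when a step finishes; carries that step's output."""
    step: str
    output: Any


@dataclass
class StepFailed:
    """Event emitted when a step raises; the failure is data, not a crash."""
    step: str
    error: str


Event = Union[StepOk, StepFailed]
Step = Tuple[str, Callable[[Any], Any]]


def run_flow(steps: List[Step], document: Any) -> Iterator[Event]:
    """Walk the flow one step at a time, yielding one event per step.

    Each step receives the previous step's output. The walk stops at the
    first failure, which is yielded as a StepFailed event.
    """
    data = document
    for name, fn in steps:
        try:
            data = fn(data)
        except Exception as exc:
            yield StepFailed(name, str(exc))
            return
        yield StepOk(name, data)


# Toy stand-ins for the four primitives (assumptions, not real parsers).
def parse(raw: str) -> str:
    return raw.strip()


def classify(text: str) -> dict:
    category = "invoice" if "invoice" in text.lower() else "other"
    return {"text": text, "category": category}


def split(doc: dict) -> dict:
    chunks = [line for line in doc["text"].splitlines() if line]
    return {**doc, "chunks": chunks}


def extract(doc: dict) -> dict:
    fields = dict(c.split(": ", 1) for c in doc["chunks"] if ": " in c)
    if "Total" not in fields:
        raise ValueError("required field 'Total' not found")
    return {"category": doc["category"], "fields": fields}


FLOW: List[Step] = [
    ("parse", parse),
    ("classify", classify),
    ("split", split),
    ("extract", extract),
]

events = list(run_flow(FLOW, "  Invoice 1042\nTotal: 99.50\n"))
for event in events:
    print(event)
```

Because `run_flow` is a generator, a caller can forward each event to a UI as soon as it is produced instead of waiting for the whole flow to finish.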
