Stars
RAGFlow is a leading open-source Retrieval-Augmented Generation (RAG) engine that fuses cutting-edge RAG with Agent capabilities to create a superior context layer for LLMs
UltraRAG 2.0: Less Code, Lower Barrier, Faster Deployment! MCP-based low-code RAG framework that lets researchers build complex pipelines for creative innovation.
MS-Agent: Lightweight Framework for Empowering Agents with Autonomous Exploration in Complex Task Scenarios
[CVPR 2025] A Comprehensive Benchmark for Document Parsing and Evaluation
Convert documents to structured data effortlessly. Unstructured is an open-source ETL solution for transforming complex documents into clean, structured formats for language models. Visit our website …
Toolkit for linearizing PDFs for LLM datasets/training
Transforms complex documents like PDFs into LLM-ready markdown/JSON for your Agentic workflows.
🦛 CHONK docs with Chonkie ✨ — The no-nonsense RAG library
⏰ Collaboratively track worldwide conference deadlines (website, Python CLI, WeChat applet) / If you find it useful, please star this project, thanks~
Multilingual Document Layout Parsing in a Single Vision-Language Model
A lightweight LMM-based Document Parsing Model
Awesome Deep Research list! For more details, please refer to our survey paper -- A Comprehensive Survey of Deep Research: Systems, Methodologies, and Applications
Python tool for converting files and office documents to Markdown.
An Open-Source Package for Information Retrieval
This is the code repo for our paper "Enhancing Knowledge Integration and Utilization of Large Language Models via Constructivist Cognition Modeling".
UniMERNet: A Universal Network for Real-World Mathematical Expression Recognition
[ACL 2025] Towards Text-Image Interleaved Retrieval
TexTeller converts images to LaTeX formulas (image2latex, LaTeX OCR) with higher accuracy and superior generalization ability, enabling it to cover most usage scenarios.
A simple tool to update bib entries with their official information (e.g., DBLP or the ACL anthology).
🆓 List of free ChatGPT mirror sites, continuously updated.
DeepPerception: Advancing R1-like Cognitive Visual Perception in MLLMs for Knowledge-Intensive Visual Grounding
The hub for EleutherAI's work on interpretability and learning dynamics
[ICLR 2025] VILA-U: a Unified Foundation Model Integrating Visual Understanding and Generation
A Comprehensive Toolkit for High-Quality PDF Content Extraction
ProxyExplainer for Graph Neural Networks
Official code for the paper "UniIR: Training and Benchmarking Universal Multimodal Information Retrievers" (ECCV 2024)