RAGFlow is a leading open-source Retrieval-Augmented Generation (RAG) engine that fuses cutting-edge RAG with Agent capabilities to create a superior context layer for LLMs

TypeScript 67,248 7,165 Updated Nov 7, 2025

UltraRAG 2.0: Less Code, Lower Barrier, Faster Deployment! MCP-based low-code RAG framework that lets researchers build complex pipelines and focus on creative innovation.

Python 1,790 153 Updated Nov 7, 2025

MS-Agent: Lightweight Framework for Empowering Agents with Autonomous Exploration in Complex Task Scenarios

Python 3,553 406 Updated Nov 7, 2025

[CVPR 2025] A Comprehensive Benchmark for Document Parsing and Evaluation

Python 1,141 108 Updated Nov 7, 2025

Convert documents to structured data effortlessly. Unstructured is an open-source ETL solution for transforming complex documents into clean, structured formats for language models. Visit our website …

HTML 13,136 1,077 Updated Nov 7, 2025

Toolkit for linearizing PDFs for LLM datasets/training

Python 15,831 1,198 Updated Nov 6, 2025

Go ahead and axolotl questions

Python 10,744 1,184 Updated Nov 6, 2025

Transforms complex documents like PDFs into LLM-ready markdown/JSON for your Agentic workflows.

Python 48,256 3,990 Updated Nov 6, 2025

🦛 CHONK docs with Chonkie ✨ — The no-nonsense RAG library

Python 3,165 198 Updated Nov 5, 2025

⏰ Collaboratively track worldwide conference deadlines (Website, Python Cli, Wechat Applet) / If you find it useful, please star this project, thanks~

Rust 8,124 545 Updated Nov 5, 2025

Multilingual Document Layout Parsing in a Single Vision-Language Model

Python 5,607 563 Updated Oct 31, 2025

A lightweight LMM-based Document Parsing Model

Python 6,161 428 Updated Oct 25, 2025

Awesome Deep Research list! For more details, please refer to our survey paper -- A Comprehensive Survey of Deep Research: Systems, Methodologies, and Applications

352 25 Updated Oct 22, 2025

Python tool for converting files and office documents to Markdown.

Python 82,659 4,670 Updated Oct 20, 2025

An Open-Source Package for Information Retrieval

Python 166 20 Updated Oct 13, 2025

This is the code repo for our paper "Enhancing Knowledge Integration and Utilization of Large Language Models via Constructivist Cognition Modeling".

Python 109 11 Updated Oct 9, 2025

UniMERNet: A Universal Network for Real-World Mathematical Expression Recognition

Python 424 33 Updated Sep 28, 2025

[ACL 2025] Towards Text-Image Interleaved Retrieval

Python 16 Updated Sep 3, 2025

TexTeller converts images to LaTeX formulas (image2latex, LaTeX OCR) with higher accuracy and superior generalization ability, enabling it to cover most usage scenarios.

Python 630 67 Updated Aug 22, 2025

A simple tool to update bib entries with their official information (e.g., DBLP or the ACL anthology).

Python 2,942 163 Updated Jul 9, 2025

🆓 List of free ChatGPT mirror sites, continuously updated.

Python 20,555 1,396 Updated Jun 23, 2025

DeepPerception: Advancing R1-like Cognitive Visual Perception in MLLMs for Knowledge-Intensive Visual Grounding

Python 65 1 Updated Jun 10, 2025

The hub for EleutherAI's work on interpretability and learning dynamics

Jupyter Notebook 2,661 195 Updated Jun 9, 2025

156 6 Updated May 8, 2025

[ICLR 2025] VILA-U: a Unified Foundation Model Integrating Visual Understanding and Generation

Python 402 14 Updated Apr 25, 2025

Using GPT to parse PDF

Python 3,545 266 Updated Apr 17, 2025

A Comprehensive Toolkit for High-Quality PDF Content Extraction

Python 8,890 669 Updated Jan 3, 2025

ProxyExplainer for Graph Neural Networks

Python 15 2 Updated Oct 24, 2024

Northeastern University (东北大学) campus network gateway client

Go 176 37 Updated Oct 7, 2024

Official code for paper "UniIR: Training and Benchmarking Universal Multimodal Information Retrievers" (ECCV 2024)

Python 167 17 Updated Oct 1, 2024