Starred repositories of whale-z

44 results for starred source repositories

Go ahead and axolotl questions

Python · 10,736 stars · 1,183 forks · Updated Nov 4, 2025

ProxyExplainer for Graph Neural Networks

Python · 15 stars · 2 forks · Updated Oct 24, 2024

[CVPR 2025] A Comprehensive Benchmark for Document Parsing and Evaluation

Python · 1,133 stars · 107 forks · Updated Nov 5, 2025

TexTeller converts images to LaTeX formulas (image2latex, LaTeX OCR) with high accuracy and strong generalization, enabling it to cover most usage scenarios.

Python · 628 stars · 67 forks · Updated Aug 22, 2025

156 stars · 6 forks · Updated May 8, 2025

DocLayNet: A Large Human-Annotated Dataset for Document-Layout Analysis

395 stars · 23 forks · Updated Feb 1, 2023

Toolkit for linearizing PDFs for LLM datasets/training

Python · 15,803 stars · 1,195 forks · Updated Nov 4, 2025

UniMERNet: A Universal Network for Real-World Mathematical Expression Recognition

Python · 423 stars · 33 forks · Updated Sep 28, 2025

A lightweight LMM-based Document Parsing Model

Python · 6,155 stars · 427 forks · Updated Oct 25, 2025

A Comprehensive Toolkit for High-Quality PDF Content Extraction

Python · 8,882 stars · 670 forks · Updated Jan 3, 2025

Multilingual Document Layout Parsing in a Single Vision-Language Model

Python · 5,583 stars · 562 forks · Updated Oct 31, 2025

Convert documents to structured data effortlessly. Unstructured is an open-source ETL solution for transforming complex documents into clean, structured formats for language models. Visit our website …

HTML · 13,122 stars · 1,075 forks · Updated Nov 5, 2025

Python tool for converting files and office documents to Markdown.

Python · 82,572 stars · 4,657 forks · Updated Oct 20, 2025

Uses GPT to parse PDFs

Python · 3,545 stars · 266 forks · Updated Apr 17, 2025

Awesome Deep Research list! For more details, please refer to our survey paper: A Comprehensive Survey of Deep Research: Systems, Methodologies, and Applications

352 stars · 25 forks · Updated Oct 22, 2025

RAGFlow is a leading open-source Retrieval-Augmented Generation (RAG) engine that fuses cutting-edge RAG with Agent capabilities to create a superior context layer for LLMs

Python · 67,137 stars · 7,150 forks · Updated Nov 5, 2025

Transforms complex documents like PDFs into LLM-ready Markdown/JSON for your agentic workflows.

Python · 48,151 stars · 3,975 forks · Updated Nov 4, 2025

🦛 CHONK docs with Chonkie ✨ — The no-nonsense RAG library

Python · 3,133 stars · 197 forks · Updated Nov 5, 2025

[ACL 2025] Towards Text-Image Interleaved Retrieval

Python · 16 stars · Updated Sep 3, 2025

DeepPerception: Advancing R1-like Cognitive Visual Perception in MLLMs for Knowledge-Intensive Visual Grounding

Python · 65 stars · 1 fork · Updated Jun 10, 2025

UltraRAG 2.0: Less Code, Lower Barrier, Faster Deployment! An MCP-based low-code RAG framework that enables researchers to build complex pipelines for creative innovation.

Python · 1,787 stars · 153 forks · Updated Nov 5, 2025

A simple tool to update bib entries with their official information (e.g., DBLP or the ACL anthology).

Python · 2,941 stars · 163 forks · Updated Jul 9, 2025

[ICLR 2025] VILA-U: a Unified Foundation Model Integrating Visual Understanding and Generation

Python · 403 stars · 14 forks · Updated Apr 25, 2025

Compute FID scores with PyTorch.

Python · 3,772 stars · 524 forks · Updated Jul 3, 2024

⏰ Collaboratively track worldwide conference deadlines (website, Python CLI, WeChat applet). If you find it useful, please star this project. Thanks!

Rust · 8,117 stars · 545 forks · Updated Nov 5, 2025

Campus network gateway client for Northeastern University (东北大学, China)

Go · 175 stars · 37 forks · Updated Oct 7, 2024

Python · 965 stars · 353 forks · Updated Sep 25, 2023

Jupyter Notebook · 1,201 stars · 548 forks · Updated May 13, 2024

The hub for EleutherAI's work on interpretability and learning dynamics

Jupyter Notebook · 2,660 stars · 195 forks · Updated Jun 9, 2025

Masked Autoencoder meets GANs

Python · 28 stars · Updated Dec 27, 2023