MinerU Modified (3.1.5)

A fork of MinerU (upstream 3.1.5) with enhanced content ordering to ensure chunk order matches PDF layout.

Introduction

This modified version ensures content list chunk order is consistent with PDF layout order. The key improvement is that extracted content blocks are ordered by their physical position on the page (top-to-bottom, left-to-right), making it easier to reconstruct document structure for downstream tasks.

This fork is based on upstream MinerU 3.1.5, which builds on 3.0.9's SEAL/CHART recognition, DOCX/PPTX/XLSX parsing, and mineru-router multi-GPU routing, and adds further Office document fidelity improvements and multi-process API refinements.

Content List Data Structure

The extraction output is saved as *_content_list.json. Each item in the list has the following structure:

{
    "type": "text",           // Content type: "text", "image", "table", "chart", "seal", "code"
    "text": "...",            // Text content (for type="text")
    "text_level": 1,          // Heading level: 1=h1, 2=h2, 3=h3 (optional)
    "bbox": [x0, y0, x1, y1], // Bounding box coordinates (normalized to 1000)
    "page_idx": 0,            // Page index (0-based)
    "id": 1,                  // Sequential content ID, consistent with PDF layout order (a string such as "D1" for discarded blocks)

    // Image-specific fields (for type="image"):
    "img_path": "images/xxx.jpg",
    "image_caption": [],
    "image_footnote": []
}
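
Because `bbox` values are normalized to 1000, they usually need to be scaled back before they can be drawn on a rendered page. The following is a minimal sketch, assuming x values scale with page width and y values with page height; the function name `bbox_to_page_coords` and the 2000x3000 page size are illustrative and not part of MinerU.

```python
from typing import Sequence, Tuple

NORMALIZED_EXTENT = 1000  # bbox values are normalized to this range


def bbox_to_page_coords(
    bbox: Sequence[float], page_width: float, page_height: float
) -> Tuple[float, float, float, float]:
    """Scale a normalized [x0, y0, x1, y1] box to page units.

    Assumption: x values scale with page width, y values with page height.
    """
    x0, y0, x1, y1 = bbox
    sx = page_width / NORMALIZED_EXTENT
    sy = page_height / NORMALIZED_EXTENT
    return (x0 * sx, y0 * sy, x1 * sx, y1 * sy)


# Hypothetical item following the structure shown above.
item = {"type": "text", "text": "Title", "bbox": [100, 50, 900, 120], "page_idx": 0, "id": 1}

# Hypothetical rendered page size in pixels.
scaled = bbox_to_page_coords(item["bbox"], page_width=2000, page_height=3000)
```

The same scale factors apply to every block on a page, so they can be computed once per `page_idx`.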

ID Conventions

Numeric IDs (1, 2, 3...): Main content blocks, ordered by layout position
D-prefixed IDs ("D1", "D2"...): Discarded/auxiliary blocks (headers, footers, page numbers, etc.)
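
The two ID kinds above can be separated with a few lines of Python. This is a sketch, assuming numeric IDs are stored as JSON numbers, D-prefixed IDs as strings, and every block carries an `id`; the helper names are hypothetical.

```python
import json
from pathlib import Path
from typing import Any, Dict, List, Tuple

Block = Dict[str, Any]


def is_discarded(block: Block) -> bool:
    """True for D-prefixed IDs such as "D1" (headers, footers, page numbers)."""
    block_id = block.get("id")
    return isinstance(block_id, str) and block_id.startswith("D")


def split_blocks(blocks: List[Block]) -> Tuple[List[Block], List[Block]]:
    """Return (main, discarded); main blocks are sorted by numeric ID."""
    main = [b for b in blocks if not is_discarded(b)]
    discarded = [b for b in blocks if is_discarded(b)]
    main.sort(key=lambda b: int(b["id"]))
    return main, discarded


def load_content_list(path: str) -> List[Block]:
    """Read a *_content_list.json file; the path is supplied by the caller."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


# Small in-memory sample so the sketch runs without a real output file.
sample = [
    {"type": "text", "text": "Page 1", "id": "D1", "page_idx": 0},
    {"type": "text", "text": "Body", "id": 2, "page_idx": 0},
    {"type": "text", "text": "Title", "text_level": 1, "id": 1, "page_idx": 0},
]
main, discarded = split_blocks(sample)
main_ids = [b["id"] for b in main]
discarded_ids = [b["id"] for b in discarded]
```

Iterating `main` in this order reproduces the layout order that the numeric IDs encode, while `discarded` can be dropped or kept for auditing.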

Deploy

Quick Start (Local Build)

# Build and start API service
docker compose -f docker/compose.yml --profile api up -d

# Build and start Gradio UI
docker compose -f docker/compose.yml --profile gradio up -d

# Build and start OpenAI-compatible VLM server
docker compose -f docker/compose.yml --profile openai-server up -d

Multi-GPU with mineru-router (3.0.9 New)

mineru-router is a load-balancing layer that manages multiple mineru-api workers across GPUs:

# Auto-detect all GPUs, one worker per card
mineru-router --host 0.0.0.0 --port 8002 --local-gpus auto

# Specify GPUs
mineru-router --host 0.0.0.0 --port 8002 --local-gpus 0,1,2

# Aggregate existing mineru-api instances
mineru-router --host 0.0.0.0 --port 8002 \
  --local-gpus none \
  --upstream-url http://api1:8000 \
  --upstream-url http://api2:8000
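
When the router is launched from a script rather than a shell, the same command lines can be assembled programmatically. The sketch below uses only the flags shown above; the function `router_command` is a hypothetical helper, and the actual launch is left commented out.

```python
import shlex
from typing import List, Optional, Sequence


def router_command(
    host: str = "0.0.0.0",
    port: int = 8002,
    local_gpus: str = "auto",
    upstream_urls: Optional[Sequence[str]] = None,
) -> List[str]:
    """Build an argv list for mineru-router from the flags shown above."""
    argv = ["mineru-router", "--host", host, "--port", str(port), "--local-gpus", local_gpus]
    for url in upstream_urls or ():
        # One --upstream-url flag per existing mineru-api instance.
        argv += ["--upstream-url", url]
    return argv


cmd = router_command(
    local_gpus="none",
    upstream_urls=["http://api1:8000", "http://api2:8000"],
)
printable = shlex.join(cmd)  # shell-safe string, useful for logging

# To actually start the router (requires mineru-router on PATH):
# import subprocess
# subprocess.run(cmd, check=True)
```

Passing an argv list instead of a single string avoids shell quoting issues when upstream URLs come from configuration.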

Available Services

Service	Command	Default Port	Description
API Server	`mineru-api`	8000	FastAPI REST service for PDF parsing
OpenAI Server	`mineru-openai-server`	30000	vLLM OpenAI-compatible inference server
Router	`mineru-router`	8002	Multi-GPU load balancer (3.0.9 new)
Gradio UI	`mineru-gradio`	7860	Web UI for interactive use
Model Download	`mineru-models-download`	-	Download required models

Architecture

Single GPU:
  User -> mineru-api (GPU 0) -> vllm engine

Multi GPU (via router):
                          ┌─ mineru-api (GPU 0) -> vllm engine
  User -> mineru-router --├─ mineru-api (GPU 1) -> vllm engine
                          └─ mineru-api (GPU 2) -> vllm engine

Configuration

Environment Variables

Variable	Default	Description
`MINERU_MODEL_SOURCE`	-	Set to `local` for local model files
`MINERU_TABLE_MERGE_ENABLE`	`true`	Set `false` to disable cross-page table merging (important for layout tracking)
`MINERU_API_MAX_CONCURRENT_REQUESTS`	`3` (Mac=`1`)	Max concurrent requests per `mineru-api` instance
`MINERU_PROCESSING_WINDOW_SIZE`	`64`	Max pages processed per task
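
A wrapper script can read these variables with the defaults from the table. This is a sketch only: the `MineruEnv` class and `read_env` function are hypothetical, and the boolean parsing (anything other than `false` counts as enabled) is an assumption rather than MinerU's documented behavior.

```python
import os
import platform
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class MineruEnv:
    model_source: Optional[str]
    table_merge_enable: bool
    max_concurrent_requests: int
    processing_window_size: int


def read_env(env: Mapping[str, str] = os.environ) -> MineruEnv:
    """Read the variables from the table above, applying the listed defaults."""
    # The table lists a default of 1 on Mac and 3 elsewhere.
    default_concurrency = "1" if platform.system() == "Darwin" else "3"
    return MineruEnv(
        model_source=env.get("MINERU_MODEL_SOURCE"),
        # Assumption: only the literal value "false" disables table merging.
        table_merge_enable=env.get("MINERU_TABLE_MERGE_ENABLE", "true").strip().lower() != "false",
        max_concurrent_requests=int(
            env.get("MINERU_API_MAX_CONCURRENT_REQUESTS", default_concurrency)
        ),
        processing_window_size=int(env.get("MINERU_PROCESSING_WINDOW_SIZE", "64")),
    )


# Explicit mapping so the example does not depend on the host environment.
cfg = read_env(
    {
        "MINERU_MODEL_SOURCE": "local",
        "MINERU_TABLE_MERGE_ENABLE": "false",
        "MINERU_API_MAX_CONCURRENT_REQUESTS": "3",
    }
)
```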

Concurrency

Single mineru-api: controlled by MINERU_API_MAX_CONCURRENT_REQUESTS (default 3)
mineru-openai-server: vLLM native batching, concurrency depends on GPU VRAM
For higher throughput: use mineru-router to scale across multiple GPUs
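
To illustrate what a per-instance request cap means in practice, the toy example below gates simulated jobs with an `asyncio.Semaphore`. It is not how `mineru-api` is implemented; it only shows the general pattern, with the limit of 3 mirroring the documented default and `asyncio.sleep` standing in for parsing work.

```python
import asyncio
from typing import Any, Dict

MAX_CONCURRENT = 3  # mirrors the documented default


async def handle(job_id: int, gate: asyncio.Semaphore, state: Dict[str, Any]) -> int:
    """Run one simulated job while holding a slot in the gate."""
    async with gate:
        state["active"] += 1
        state["peak"] = max(state["peak"], state["active"])
        await asyncio.sleep(0.01)  # stand-in for parsing work
        state["active"] -= 1
    return job_id


async def run_jobs(n: int, limit: int = MAX_CONCURRENT) -> Dict[str, Any]:
    """Submit n jobs at once and record the peak number running together."""
    gate = asyncio.Semaphore(limit)
    state: Dict[str, Any] = {"active": 0, "peak": 0}
    state["results"] = await asyncio.gather(*(handle(i, gate, state) for i in range(n)))
    return state


state = asyncio.run(run_jobs(10))
```

All ten jobs complete, but no more than three ever run at the same time; additional requests wait for a free slot, which is why adding workers through the router raises throughput.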

What's New in 3.1.5 (vs 3.0.9)

Office document parsing: chart rendering via cached HTML / Excel-bytes fallback; DOCX/PPTX OMML→LaTeX with extended Unicode mapping; PPTX shape-type caching; DOCX broken-link sanitization
Async PDF image loading and Windows process termination support
API hardening: async model retrieval, configurable health-failure restart threshold, local API launch modes, timeout handling for result downloads
VLM: chart image content extraction, embedded table HTML formatting
Misumi fix: make_page_to_content_list content-IDs now strictly track draw_bbox numbering for IMAGE/TABLE/CHART/CODE composite blocks (previously misaligned for caption-below figures and silently dropped CHART blocks in vlm)

What's New in 3.0.9 (vs 2.7)

SEAL recognition: Stamp/seal detection and content extraction
CHART recognition: Separate chart type (previously grouped with images)
DOCX/PPTX parsing: Direct Office document support
mineru-router: Multi-GPU load balancing
CONTENT_LIST_V2: Span-level structured output format
VLM preload: Faster cold start
vLLM v0.11.2: Updated inference engine
Improved OCR: Dynamic batch sizing, better VRAM management
