A fork of MinerU (upstream 3.1.5) that orders the content list so that chunk order matches the PDF layout.
Extracted content blocks are ordered by their physical position on the page (top-to-bottom, left-to-right), so the chunk order in the content list is consistent with the PDF layout order. This makes it easier to reconstruct document structure for downstream tasks.
Based on upstream MinerU 3.1.5. That release builds on 3.0.9's SEAL/CHART recognition, DOCX/PPTX/XLSX parsing, and mineru-router multi-GPU routing, and adds further Office document fidelity improvements and multi-process API refinements.
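The top-to-bottom, left-to-right ordering described above can be illustrated with a minimal sketch. This is not the fork's actual ordering logic (which is not shown here); it is a naive sort on `page_idx`, then the top edge, then the left edge of each `bbox`, and it would not handle multi-column pages correctly without real reading-order analysis. The function names `layout_sort_key` and `sort_by_layout` and the sample blocks are hypothetical.

```python
from typing import Any


def layout_sort_key(block: dict[str, Any]) -> tuple[int, float, float]:
    """Sort key: page first, then top edge (y0), then left edge (x0)."""
    x0, y0, _x1, _y1 = block["bbox"]
    return (block["page_idx"], y0, x0)


def sort_by_layout(blocks: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Return a new list ordered by physical position (naive approximation)."""
    return sorted(blocks, key=layout_sort_key)


# Hypothetical sample blocks, deliberately out of order.
blocks = [
    {"text": "right block", "bbox": [520, 100, 950, 200], "page_idx": 0},
    {"text": "title", "bbox": [100, 40, 900, 80], "page_idx": 0},
    {"text": "next page", "bbox": [100, 50, 900, 90], "page_idx": 1},
    {"text": "left block", "bbox": [50, 100, 480, 200], "page_idx": 0},
]

ordered = [b["text"] for b in sort_by_layout(blocks)]
print(ordered)  # ['title', 'left block', 'right block', 'next page']
```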
The extraction output is saved as *_content_list.json. Each item in the list has the following structure:
{
"type": "text", // Content type: "text", "image", "table", "chart", "seal", "code"
"text": "...", // Text content (for type="text")
"text_level": 1, // Heading level: 1=h1, 2=h2, 3=h3 (optional)
"bbox": [x0, y0, x1, y1], // Bounding box coordinates (normalized to 1000)
"page_idx": 0, // Page index (0-based)
"id": 1, // Sequential content ID consistent with pdf layout order
// Image-specific fields (for type="image"):
"img_path": "images/xxx.jpg",
"image_caption": [],
"image_footnote": []
}

- Numeric IDs (1, 2, 3...): Main content blocks, ordered by layout position
- D-prefixed IDs ("D1", "D2"...): Discarded/auxiliary blocks (headers, footers, page numbers, etc.)
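As a rough sketch of how a consumer might use these fields, the snippet below separates main blocks from discarded ones by ID and converts a normalized `bbox` back to page units. It assumes that numeric IDs are stored as JSON integers and D-prefixed IDs as strings, and that each `bbox` axis is scaled independently to the 0-1000 range; the helper names (`is_discarded`, `split_content_list`, `bbox_to_page_units`), the sample items, and the 612 x 792 page size are hypothetical.

```python
import json
from typing import Any


def is_discarded(item: dict[str, Any]) -> bool:
    """Assumption: discarded/auxiliary blocks carry string IDs such as "D1"."""
    item_id = item.get("id")
    return isinstance(item_id, str) and item_id.startswith("D")


def split_content_list(items: list[dict[str, Any]]):
    """Return (main blocks sorted by numeric ID, discarded blocks)."""
    main = [it for it in items if not is_discarded(it)]
    discarded = [it for it in items if is_discarded(it)]
    main.sort(key=lambda it: it["id"])
    return main, discarded


def bbox_to_page_units(bbox, page_width: float, page_height: float):
    """Assumption: x is scaled by page width and y by page height, both to 1000."""
    x0, y0, x1, y1 = bbox
    return (
        x0 * page_width / 1000,
        y0 * page_height / 1000,
        x1 * page_width / 1000,
        y1 * page_height / 1000,
    )


# Hypothetical stand-in for the contents of a *_content_list.json file.
sample = json.loads("""
[
  {"type": "text", "text": "Page 1", "bbox": [450, 960, 550, 980], "page_idx": 0, "id": "D1"},
  {"type": "text", "text": "Body", "bbox": [100, 200, 900, 400], "page_idx": 0, "id": 2},
  {"type": "text", "text": "Intro", "text_level": 1, "bbox": [100, 100, 900, 150], "page_idx": 0, "id": 1}
]
""")

main, discarded = split_content_list(sample)
print([it["text"] for it in main])       # main blocks in ID order
print([it["id"] for it in discarded])    # auxiliary blocks
print(bbox_to_page_units(main[1]["bbox"], 612, 792))
```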
# Build and start API service
docker compose -f docker/compose.yml --profile api up -d
# Build and start Gradio UI
docker compose -f docker/compose.yml --profile gradio up -d
# Build and start OpenAI-compatible VLM server
docker compose -f docker/compose.yml --profile openai-server up -d

mineru-router is a load-balancing layer that manages multiple mineru-api workers across GPUs:
# Auto-detect all GPUs, one worker per card
mineru-router --host 0.0.0.0 --port 8002 --local-gpus auto
# Specify GPUs
mineru-router --host 0.0.0.0 --port 8002 --local-gpus 0,1,2
# Aggregate existing mineru-api instances
mineru-router --host 0.0.0.0 --port 8002 \
--local-gpus none \
--upstream-url http://api1:8000 \
--upstream-url http://api2:8000

| Service | Command | Default Port | Description |
|---|---|---|---|
| API Server | mineru-api | 8000 | FastAPI REST service for PDF parsing |
| OpenAI Server | mineru-openai-server | 30000 | vLLM OpenAI-compatible inference server |
| Router | mineru-router | 8002 | Multi-GPU load balancer (new in 3.0.9) |
| Gradio UI | mineru-gradio | 7860 | Web UI for interactive use |
| Model Download | mineru-models-download | - | Download required models |

Single GPU:
User -> mineru-api (GPU 0) -> vllm engine
Multi GPU (via router):
                        ┌─ mineru-api (GPU 0) -> vllm engine
User -> mineru-router --├─ mineru-api (GPU 1) -> vllm engine
                        └─ mineru-api (GPU 2) -> vllm engine
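
To make the fan-out in the multi-GPU diagram concrete, here is a minimal sketch of distributing requests across several upstream workers. The scheduling policy mineru-router actually uses is not stated here; round-robin is only an illustrative stand-in, and the `RoundRobinRouter` class is hypothetical.

```python
from itertools import cycle


class RoundRobinRouter:
    """Illustrative only: hands out upstream URLs in a fixed rotation."""

    def __init__(self, upstream_urls: list[str]) -> None:
        if not upstream_urls:
            raise ValueError("at least one upstream URL is required")
        self._upstreams = cycle(upstream_urls)

    def pick(self) -> str:
        """Return the upstream that should receive the next request."""
        return next(self._upstreams)


# The URLs mirror the --upstream-url values used in the router example.
router = RoundRobinRouter(["http://api1:8000", "http://api2:8000"])
assignments = [router.pick() for _ in range(4)]
print(assignments)
# ['http://api1:8000', 'http://api2:8000', 'http://api1:8000', 'http://api2:8000']
```

A production balancer would typically also track worker health and in-flight load, which this sketch leaves out.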
| Variable | Default | Description |
|---|---|---|
| MINERU_MODEL_SOURCE | - | Set to local for local model files |
| MINERU_TABLE_MERGE_ENABLE | true | Set false to disable cross-page table merging (important for layout tracking) |
| MINERU_API_MAX_CONCURRENT_REQUESTS | 3 (Mac=1) | Max concurrent requests per mineru-api instance |
| MINERU_PROCESSING_WINDOW_SIZE | 64 | Max pages processed per task |

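The sketch below shows one way a wrapper script could read these variables with the defaults from the table, and what a 64-page window means for a long document. It is not MinerU's own configuration code: the `MineruEnv` dataclass, `load_env`, and `page_windows` are hypothetical, the boolean parsing rule is an assumption, and the way MinerU actually splits pages into windows may differ.

```python
import os
import platform
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class MineruEnv:
    model_source: Optional[str]
    table_merge_enable: bool
    api_max_concurrent_requests: int
    processing_window_size: int


def load_env(environ: Optional[Mapping[str, str]] = None) -> MineruEnv:
    """Read the documented variables, falling back to the documented defaults."""
    env = os.environ if environ is None else environ
    # Documented default: 3, or 1 on Mac.
    default_concurrency = 1 if platform.system() == "Darwin" else 3
    # Assumption: anything other than these values counts as enabled.
    merge_raw = env.get("MINERU_TABLE_MERGE_ENABLE", "true").strip().lower()
    return MineruEnv(
        model_source=env.get("MINERU_MODEL_SOURCE"),
        table_merge_enable=merge_raw not in {"false", "0", "no", "off"},
        api_max_concurrent_requests=int(
            env.get("MINERU_API_MAX_CONCURRENT_REQUESTS", default_concurrency)
        ),
        processing_window_size=int(env.get("MINERU_PROCESSING_WINDOW_SIZE", 64)),
    )


def page_windows(total_pages: int, window_size: int) -> list[tuple[int, int]]:
    """Split [0, total_pages) into half-open page ranges of at most window_size."""
    return [
        (start, min(start + window_size, total_pages))
        for start in range(0, total_pages, window_size)
    ]


cfg = load_env({"MINERU_TABLE_MERGE_ENABLE": "false"})
print(cfg.table_merge_enable)       # False
print(page_windows(150, cfg.processing_window_size))
# [(0, 64), (64, 128), (128, 150)]
```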
- Single mineru-api: controlled by MINERU_API_MAX_CONCURRENT_REQUESTS (default 3)
- mineru-openai-server: vLLM native batching, concurrency depends on GPU VRAM
- For higher throughput: use mineru-router to scale across multiple GPUs
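
The effect of a per-instance concurrency cap like the one above can be sketched with a semaphore. This is a generic illustration, not mineru-api's implementation; `run_limited` is a hypothetical helper and the `asyncio.sleep` call stands in for a real parse request.

```python
import asyncio


async def run_limited(jobs: list[int], max_concurrent: int) -> tuple[list[int], int]:
    """Run all jobs, never more than max_concurrent at once; report the peak."""
    semaphore = asyncio.Semaphore(max_concurrent)
    active = 0
    peak = 0

    async def worker(job: int) -> int:
        nonlocal active, peak
        async with semaphore:  # waits here once the cap is reached
            active += 1
            peak = max(peak, active)
            await asyncio.sleep(0.01)  # stand-in for one parse request
            active -= 1
            return job

    results = await asyncio.gather(*(worker(job) for job in jobs))
    return list(results), peak


# Ten queued jobs with a cap of 3, matching the documented default.
results, peak = asyncio.run(run_limited(list(range(10)), max_concurrent=3))
print(results)  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
print(peak)     # 3
```

Requests beyond the cap simply wait for a free slot, which is why adding workers behind mineru-router is the way to raise total throughput.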
- Office document parsing: chart rendering via cached HTML / Excel-bytes fallback; DOCX/PPTX OMML→LaTeX with extended Unicode mapping; PPTX shape-type caching; DOCX broken-link sanitization
- Async PDF image loading and Windows process termination support
- API hardening: async model retrieval, configurable health-failure restart threshold, local API launch modes, timeout handling for result downloads
- VLM: chart image content extraction, embedded table HTML formatting
- Misumi fix: make_page_to_content_list content IDs now strictly track draw_bbox numbering for IMAGE/TABLE/CHART/CODE composite blocks (previously, IDs were misaligned for caption-below figures and CHART blocks were silently dropped in vlm)
- SEAL recognition: Stamp/seal detection and content extraction
- CHART recognition: Separate chart type (previously grouped with images)
- DOCX/PPTX parsing: Direct Office document support
- mineru-router: Multi-GPU load balancing
- CONTENT_LIST_V2: Span-level structured output format
- VLM preload: Faster cold start
- vLLM v0.11.2: Updated inference engine
- Improved OCR: Dynamic batch sizing, better VRAM management