🤗 Hugging Face 📜 Paper 🏁 Benchmark
This repository contains the code for the paper From Context to EDUs, which introduces the EDU-based Context Compressor, a novel explicit compression framework designed to preserve both global structure and fine-grained details. Empirical results demonstrate that our method achieves state-of-the-art structural prediction accuracy and significantly outperforms frontier LLMs while reducing costs. Furthermore, our structure-aware compression substantially enhances performance across downstream tasks, ranging from standard long-context benchmarks to complex Deep Search scenarios.
We evaluated our method on StructBench along with frontier LLMs and commercial parsing APIs. Our method achieves SOTA structural accuracy with significantly lower costs.
| Method | Type | TED (Structure) ↓ | DLA (Accuracy) ↑ | Cost ($/doc) ↓ | Latency (s) ↓ |
|---|---|---|---|---|---|
| GPT-4o | General LLM* | 8.53 | 36.29% | 0.0210 | - |
| GPT-4.1 | General LLM* | 9.14 | 37.90% | 0.0168 | - |
| OpenAI o3 | General LLM* | 8.01 | 35.48% | 0.0168 | - |
| OpenAI o4-mini | General LLM* | 8.45 | 36.29% | 0.0092 | - |
| Claude-3.7-Sonnet | General LLM* | 9.98 | 35.48% | 0.0286 | - |
| Claude-4 | General LLM* | 7.98 | 41.53% | 0.0286 | - |
| Gemini-2.5-flash | General LLM* | 8.12 | 33.74% | 0.0040 | - |
| Gemini-2.5-pro | General LLM* | 8.15 | 35.89% | 0.0162 | - |
| DeepSeek-V3 | General LLM* | 9.12 | 34.68% | 0.0012 | - |
| DeepSeek-R1 | General LLM* | 8.44 | 35.08% | 0.0046 | - |
| Qwen3-32B | General LLM* | 8.55 | 34.01% | 0.0012 | 10.17† |
| Qwen3-235B | General LLM* | 9.81 | 27.02% | 0.0012 | - |
| Jina-Reader | Parser API | 17.04 | - | 0.0004 | - |
| Firecrawl | Parser API | 16.81 | - | 0.0007 | - |
| Our Method (LingoEDU) | Specialized | 5.67 | 46.77% | 0.0007 | 1.20† |
To address whether structure-aware compression tangibly enhances performance on downstream tasks, we conducted experiments on several long-context benchmarks, including LongBench, HLE, and BrowseComp-ZH, covering both standard long-context scenarios and complex Deep Search pipelines. Our method achieved SOTA on all of these benchmarks. See the implementation and evaluation details in the directory experiments/.
- General Long-Context Understanding
| Task Type | Dataset | Glyph | Gemini-2.5-Pro Standard | Gemini-2.5-Pro Self-Sum | Gemini-2.5-Pro Ours (LingoEDU) | Δ | GPT-4.1 Standard | GPT-4.1 Self-Sum | GPT-4.1 Ours (LingoEDU) | Δ |
|---|---|---|---|---|---|---|---|---|---|---|
| Multi-Doc QA | HotpotQA | 66.42 | 35.20 | 37.78 | 40.46 | +14.94% | 65.83 | 67.89 | 70.11 | +6.50% |
| Multi-Doc QA | 2WikiMQA | 72.98 | 38.10 | 39.90 | 40.91 | +7.38% | 72.98 | 74.39 | 74.68 | +2.33% |
| Multi-Doc QA | Musique | - | 28.55 | 30.77 | 31.22 | +9.35% | 51.90 | 53.48 | 54.86 | +5.70% |
| Multi-Doc QA | DuReader | - | 7.15 | 7.79 | 8.12 | +7.69% | 21.80 | 23.51 | 25.34 | +16.24% |
| Summarization | GovReport | 25.53 | 4.10 | 4.34 | 4.25 | +2.44% | 29.97 | 30.98 | 31.56 | +2.94% |
| Summarization | QMSum | 19.78 | 15.80 | 16.53 | 16.17 | +2.34% | 22.84 | 22.53 | 23.30 | +0.61% |
| Summarization | MultiNews | - | 4.05 | 4.44 | 4.85 | +19.75% | 20.85 | 22.06 | 23.50 | +5.80% |
| Summarization | VCSum | - | 5.80 | 6.17 | 6.36 | +9.66% | 12.50 | 13.71 | 14.62 | +8.96% |
| Few-shot | TREC | 82.62 | 46.50 | 49.00 | 57.50 | +23.66% | 77.00 | 80.50 | 80.00 | +3.90% |
| Few-shot | TriviaQA | 88.54 | 59.85 | 62.31 | 63.25 | +1.25% | 90.07 | 93.69 | 93.76 | +4.10% |
| Few-shot | SAMSum | - | 20.45 | 21.89 | 23.80 | +11.39% | 39.20 | 40.79 | 41.68 | +6.33% |
| Few-shot | LSHT | - | 26.10 | 29.50 | 35.48 | +3.45% | 48.60 | 50.50 | 52.50 | +8.02% |
- Deep Search
| Model Backbone | HLE (Academic Reasoning) Base | HLE Self-Sum | HLE Ours (LingoEDU) | Δ | BrowseComp-ZH (Noisy Web) Base | BrowseComp-ZH Self-Sum | BrowseComp-ZH Ours (LingoEDU) | Δ |
|---|---|---|---|---|---|---|---|---|
| DeepSeek-R1 | 9.0 | 9.5 | 13.6 | +51.11% | 18.8 | 19.4 | 20.4 | +8.51% |
| Qwen3-235B-Thinking | 14.2 | 14.7 | 15.5 | +9.15% | 8.5 | 9.0 | 12.8 | +50.59% |
| DeepSeek-V3.1 | 14.5 | 14.8 | 15.6 | +7.59% | 29.2 | 29.7 | 38.7 | +32.53% |
| *Closed-Source Models* | | | | | | | | |
| GPT-5 | 25.0 | 25.9 | 27.1 | +8.40% | 29.0 | 29.8 | 31.8 | +9.66% |
| Claude Opus 4.1 | 14.0 | 14.8 | 15.5 | +10.71% | 20.8 | 21.5 | 23.2 | +11.54% |
| Gemini 3 Pro | 26.1 | 26.7 | 30.1 | +15.33% | 47.5 | 48.0 | 49.0 | +3.16% |
Preprocess the input article into sentences, with a data structure below:
You may do this via:

- Parse the document and extract text content using Beautiful Soup (for web pages), poppler-utils (for PDF files), or OCR tools. We recommend keeping paragraphs segmented at this step.
- Split the text into sentences. A simple regex-based splitter is often sufficient; NLP toolkits such as spaCy and Stanza are also good options.
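The first (parsing) step can be sketched without any third-party dependency using the standard library's `html.parser` (Beautiful Soup offers the same more conveniently); the sample HTML here is illustrative:

```python
from html.parser import HTMLParser

class ParagraphExtractor(HTMLParser):
    """Collects the text of each <p> element as a separate paragraph,
    so paragraph segmentation is preserved for the splitting step."""
    def __init__(self):
        super().__init__()
        self.paragraphs = []
        self._in_p = False
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._in_p = True
            self._buf = []

    def handle_endtag(self, tag):
        if tag == "p" and self._in_p:
            self.paragraphs.append("".join(self._buf).strip())
            self._in_p = False

    def handle_data(self, data):
        if self._in_p:
            self._buf.append(data)

parser = ParagraphExtractor()
parser.feed("<html><body><p>First paragraph.</p><p>Second paragraph.</p></body></html>")
text = "\n".join(parser.paragraphs)  # one line per paragraph
```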
Example regex splitter:

```python
import re

raw_sents = re.split(r'(?<=[;:,.!?;:,。!?…])\s*', text.strip())
sentences = [s for s in raw_sents if len(s.strip()) > 0]
```
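On a short mixed-punctuation string, the splitter behaves as follows (note that it splits on commas and semicolons as well as sentence-final punctuation, which is intentional for clause-level units):

```python
import re

text = "Hello, world! How are you?"
raw_sents = re.split(r'(?<=[;:,.!?;:,。!?…])\s*', text.strip())
sentences = [s for s in raw_sents if len(s.strip()) > 0]
print(sentences)  # → ['Hello,', 'world!', 'How are you?']
```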
💖 For web pages, we gladly present our tool WCD (Web Content Distill), which converts web pages (given as URLs) directly into inputs of the format above. It is open-sourced at: WCD.
```bash
pip install -e inference
python inference/infer.py --data_dir deeplang-ai/StructBench --inference_dir edu_output
```

Inference outputs will be generated under the directory `edu_output`, each as a list of `(level, start_sentence_index, end_sentence_index)` tuples.
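The output tuples can be rendered as an indented outline over the preprocessed sentence list; a minimal sketch, assuming indices are 0-based and inclusive (the sample tuples and sentences below are illustrative, not real model output):

```python
# Each prediction is (level, start_sentence_index, end_sentence_index),
# referring to spans of the preprocessed sentence list.
sentences = ["Title", "Intro sentence one.", "Intro sentence two.", "Method sentence."]
prediction = [(1, 0, 0), (2, 1, 2), (2, 3, 3)]

lines = []
for level, start, end in prediction:
    span = " ".join(sentences[start:end + 1])   # inclusive end index (assumed)
    lines.append("  " * (level - 1) + span)     # indent by structural level
outline = "\n".join(lines)
print(outline)
```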
```bash
pip install -e evaluation
python inference/evaluate.py --data_dir deeplang-ai/StructBench --inference_dir edu_output
```

TED and DLA scores will be printed in the terminal.
If you find our work helpful, please feel free to cite us:
@misc{zhou2025contextedusfaithfulstructured,
title={From Context to EDUs: Faithful and Structured Context Compression via Elementary Discourse Unit Decomposition},
author={Yiqing Zhou and Yu Lei and Shuzheng Si and Qingyan Sun and Wei Wang and Yifei Wu and Hao Wen and Gang Chen and Fanchao Qi and Maosong Sun},
year={2025},
eprint={2512.14244},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2512.14244},
}
The input data structure used by the preprocessing step above:

```jsonc
{
  "type": "string",
  "infos": [
    {
      "txt": "string",   // text of sentence part
      "position": {},    // position dict, can be empty
      "tags": [],        // tags list, can be empty
      "label": ""        // label string, can be empty
    }
  ]
}
```
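Putting the preprocessing steps together, the structure above can be assembled with one entry per sentence; a minimal sketch (the `build_input` helper and the `"article"` type value are illustrative assumptions, with the optional fields left empty as the schema permits):

```python
import json

def build_input(doc_type, sentences):
    """Hypothetical helper: wrap split sentences into the input structure,
    leaving position/tags/label empty as the schema allows."""
    return {
        "type": doc_type,
        "infos": [
            {"txt": s, "position": {}, "tags": [], "label": ""}
            for s in sentences
        ],
    }

payload = build_input("article", ["First sentence.", "Second sentence."])
print(json.dumps(payload, ensure_ascii=False, indent=2))
```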