🤗 Hugging Face 📜 Paper 🏁 Benchmark
This repository contains the code for the paper From Context to EDUs, which introduces the EDU-based Context Compressor, a novel explicit compression framework designed to preserve both global structure and fine-grained details. Empirical results demonstrate that our method achieves state-of-the-art structural prediction accuracy and significantly outperforms frontier LLMs while reducing costs. Furthermore, our structure-aware compression substantially enhances performance across downstream tasks, ranging from standard long-context benchmarks to complex Deep Search scenarios.
We evaluated our method on StructBench along with frontier LLMs and commercial parsing APIs. Our method achieves SOTA structural accuracy with significantly lower costs.
| Method | Type | TED (Structure) ↓ | DLA (Accuracy) ↑ | Cost ($/doc) ↓ | Latency (s) ↓ |
|---|---|---|---|---|---|
| GPT-4o | General LLM* | 8.53 | 36.29% | 0.0210 | - |
| GPT-4.1 | General LLM* | 9.14 | 37.90% | 0.0168 | - |
| OpenAI o3 | General LLM* | 8.01 | 35.48% | 0.0168 | - |
| OpenAI o4-mini | General LLM* | 8.45 | 36.29% | 0.0092 | - |
| Claude-3.7-Sonnet | General LLM* | 9.98 | 35.48% | 0.0286 | - |
| Claude-4 | General LLM* | 7.98 | 41.53% | 0.0286 | - |
| Gemini-2.5-flash | General LLM* | 8.12 | 33.74% | 0.0040 | - |
| Gemini-2.5-pro | General LLM* | 8.15 | 35.89% | 0.0162 | - |
| DeepSeek-V3 | General LLM* | 9.12 | 34.68% | 0.0012 | - |
| DeepSeek-R1 | General LLM* | 8.44 | 35.08% | 0.0046 | - |
| Qwen3-32B | General LLM* | 8.55 | 34.01% | 0.0012 | 10.17† |
| Qwen3-235B | General LLM* | 9.81 | 27.02% | 0.0012 | - |
| Jina-Reader | Parser API | 17.04 | - | 0.0004 | - |
| Firecrawl | Parser API | 16.81 | - | 0.0007 | - |
| Our Method (LingoEDU) | Specialized | 5.67 | 46.77% | 0.0007 | 1.20† |
To address whether structure-aware compression tangibly enhances performance on downstream tasks, we conducted experiments on several long-context benchmarks, including LongBench, HLE, and BrowseComp-ZH, covering both standard long-context scenarios and complex Deep Search pipelines. Our method achieved SOTA on all of these benchmarks. See the implementation and evaluation details in the directory experiments/.
- General Long-Context Understanding
| Task Type | Dataset | Glyph | Gemini-2.5-Pro Standard | Gemini-2.5-Pro Self-Sum | Gemini-2.5-Pro Ours (LingoEDU) | Δ | GPT-4.1 Standard | GPT-4.1 Self-Sum | GPT-4.1 Ours (LingoEDU) | Δ |
|---|---|---|---|---|---|---|---|---|---|---|
| Multi-Doc QA | HotpotQA | 66.42 | 35.20 | 37.78 | 40.46 | +14.94% | 65.83 | 67.89 | 70.11 | +6.50% |
| Multi-Doc QA | 2WikiMQA | 72.98 | 38.10 | 39.90 | 40.91 | +7.38% | 72.98 | 74.39 | 74.68 | +2.33% |
| Multi-Doc QA | Musique | - | 28.55 | 30.77 | 31.22 | +9.35% | 51.90 | 53.48 | 54.86 | +5.70% |
| Multi-Doc QA | DuReader | - | 7.15 | 7.79 | 8.12 | +7.69% | 21.80 | 23.51 | 25.34 | +16.24% |
| Summarization | GovReport | 25.53 | 4.10 | 4.34 | 4.25 | +2.44% | 29.97 | 30.98 | 31.56 | +2.94% |
| Summarization | QMSum | 19.78 | 15.80 | 16.53 | 16.17 | +2.34% | 22.84 | 22.53 | 23.30 | +0.61% |
| Summarization | MultiNews | - | 4.05 | 4.44 | 4.85 | +19.75% | 20.85 | 22.06 | 23.50 | +5.80% |
| Summarization | VCSum | - | 5.80 | 6.17 | 6.36 | +9.66% | 12.50 | 13.71 | 14.62 | +8.96% |
| Few-shot | TREC | 82.62 | 46.50 | 49.00 | 57.50 | +23.66% | 77.00 | 80.50 | 80.00 | +3.90% |
| Few-shot | TriviaQA | 88.54 | 59.85 | 62.31 | 63.25 | +1.25% | 90.07 | 93.69 | 93.76 | +4.10% |
| Few-shot | SAMSum | - | 20.45 | 21.89 | 23.80 | +11.39% | 39.20 | 40.79 | 41.68 | +6.33% |
| Few-shot | LSHT | - | 26.10 | 29.50 | 35.48 | +3.45% | 48.60 | 50.50 | 52.50 | +8.02% |
- Deep Search
| Model Backbone | HLE (Academic Reasoning) Base | HLE Self-Sum | HLE Ours (LingoEDU) | Δ | BrowseComp-ZH (Noisy Web) Base | BrowseComp-ZH Self-Sum | BrowseComp-ZH Ours (LingoEDU) | Δ |
|---|---|---|---|---|---|---|---|---|
| DeepSeek-R1 | 9.0 | 9.5 | 13.6 | +51.11% | 18.8 | 19.4 | 20.4 | +8.51% |
| Qwen3-235B-Thinking | 14.2 | 14.7 | 15.5 | +9.15% | 8.5 | 9.0 | 12.8 | +50.59% |
| DeepSeek-V3.1 | 14.5 | 14.8 | 15.6 | +7.59% | 29.2 | 29.7 | 38.7 | +32.53% |
| *Closed-Source Models* | | | | | | | | |
| GPT-5 | 25.0 | 25.9 | 27.1 | +8.40% | 29.0 | 29.8 | 31.8 | +9.66% |
| Claude Opus 4.1 | 14.0 | 14.8 | 15.5 | +10.71% | 20.8 | 21.5 | 23.2 | +11.54% |
| Gemini 3 Pro | 26.1 | 26.7 | 30.1 | +15.33% | 47.5 | 48.0 | 49.0 | +3.16% |
Preprocess the input article into sentences, with a data structure below:
You may do this via:

- Parse the document and extract text content using Beautiful Soup (for web pages), poppler-utils (for PDF files), or OCR tools. We recommend keeping paragraphs segmented at this step.
- Split the text into sentences. A simple regex-based splitter is often sufficient; NLP toolkits such as spaCy and Stanza are also good options.
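The first (parsing) step can be sketched without any third-party dependency using the standard library's `html.parser` (Beautiful Soup offers the same more conveniently); the sample HTML here is illustrative:

```python
from html.parser import HTMLParser

class ParagraphExtractor(HTMLParser):
    """Collects the text of each <p> element as a separate paragraph,
    so paragraph segmentation is preserved for the splitting step."""
    def __init__(self):
        super().__init__()
        self.paragraphs = []
        self._in_p = False
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._in_p = True
            self._buf = []

    def handle_endtag(self, tag):
        if tag == "p" and self._in_p:
            self.paragraphs.append("".join(self._buf).strip())
            self._in_p = False

    def handle_data(self, data):
        if self._in_p:
            self._buf.append(data)

parser = ParagraphExtractor()
parser.feed("<html><body><p>First paragraph.</p><p>Second paragraph.</p></body></html>")
text = "\n".join(parser.paragraphs)  # one line per paragraph
```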
Example regex splitter:

```python
import re

raw_sents = re.split(r'(?<=[;:,.!?;:,。!?…])\s*', text.strip())
sentences = [s for s in raw_sents if len(s.strip()) > 0]
```
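On a short mixed-punctuation string, the splitter behaves as follows (note that it splits on commas and semicolons as well as sentence-final punctuation, which is intentional for clause-level units):

```python
import re

text = "Hello, world! How are you?"
raw_sents = re.split(r'(?<=[;:,.!?;:,。!?…])\s*', text.strip())
sentences = [s for s in raw_sents if len(s.strip()) > 0]
print(sentences)  # → ['Hello,', 'world!', 'How are you?']
```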
💖 For web pages, we gladly present our tool WCD (Web Content Distill), which converts web pages (given as URLs) directly into inputs of the format above. It is open-sourced at: WCD.
```bash
pip install -e inference
python inference/infer.py --data_dir deeplang-ai/StructBench --inference_dir edu_output
```

Inference outputs will be generated under the directory `edu_output`, each as a list of `(level, start_sentence_index, end_sentence_index)` tuples.
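The output tuples can be rendered as an indented outline over the preprocessed sentence list; a minimal sketch, assuming indices are 0-based and inclusive (the sample tuples and sentences below are illustrative, not real model output):

```python
# Each prediction is (level, start_sentence_index, end_sentence_index),
# referring to spans of the preprocessed sentence list.
sentences = ["Title", "Intro sentence one.", "Intro sentence two.", "Method sentence."]
prediction = [(1, 0, 0), (2, 1, 2), (2, 3, 3)]

lines = []
for level, start, end in prediction:
    span = " ".join(sentences[start:end + 1])   # inclusive end index (assumed)
    lines.append("  " * (level - 1) + span)     # indent by structural level
outline = "\n".join(lines)
print(outline)
```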
```bash
pip install -e evaluation
python inference/evaluate.py --data_dir deeplang-ai/StructBench --inference_dir edu_output
```

TED and DLA scores will be printed in the terminal.
If you find our work helpful, please feel free to cite us:
@misc{zhou2025contextedusfaithfulstructured,
title={From Context to EDUs: Faithful and Structured Context Compression via Elementary Discourse Unit Decomposition},
author={Yiqing Zhou and Yu Lei and Shuzheng Si and Qingyan Sun and Wei Wang and Yifei Wu and Hao Wen and Gang Chen and Fanchao Qi and Maosong Sun},
year={2025},
eprint={2512.14244},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2512.14244},
}
The input data structure used by the preprocessing step above:

```jsonc
{
  "type": "string",
  "infos": [
    {
      "txt": "string",   // text of sentence part
      "position": {},    // position dict, can be empty
      "tags": [],        // tags list, can be empty
      "label": ""        // label string, can be empty
    }
  ]
}
```
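Putting the preprocessing steps together, the structure above can be assembled with one entry per sentence; a minimal sketch (the `build_input` helper and the `"article"` type value are illustrative assumptions, with the optional fields left empty as the schema permits):

```python
import json

def build_input(doc_type, sentences):
    """Hypothetical helper: wrap split sentences into the input structure,
    leaving position/tags/label empty as the schema allows."""
    return {
        "type": doc_type,
        "infos": [
            {"txt": s, "position": {}, "tags": [], "label": ""}
            for s in sentences
        ],
    }

payload = build_input("article", ["First sentence.", "Second sentence."])
print(json.dumps(payload, ensure_ascii=False, indent=2))
```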