LingoEDU

🤗 Hugging Face 📜 Paper 🏁 Benchmark

Introduction

This repository contains the code for the paper From Context to EDUs, which introduces the EDU-based Context Compressor, a novel explicit compression framework designed to preserve both global structure and fine-grained detail. Empirical results show that our method achieves state-of-the-art structural prediction accuracy, significantly outperforming frontier LLMs at a fraction of the cost. Furthermore, structure-aware compression substantially improves performance on downstream tasks, from long-context benchmarks to complex Deep Search scenarios.

Performance

We evaluated our method on StructBench along with frontier LLMs and commercial parsing APIs. Our method achieves SOTA structural accuracy with significantly lower costs.

| Method | Type | TED (Structure) ↓ | DLA (Accuracy) ↑ | Cost ($/doc) ↓ | Latency (s) ↓ |
| --- | --- | --- | --- | --- | --- |
| GPT-4o | General LLM* | 8.53 | 36.29% | 0.0210 | - |
| GPT-4.1 | General LLM* | 9.14 | 37.90% | 0.0168 | - |
| OpenAI o3 | General LLM* | 8.01 | 35.48% | 0.0168 | - |
| OpenAI o4-mini | General LLM* | 8.45 | 36.29% | 0.0092 | - |
| Claude-3.7-Sonnet | General LLM* | 9.98 | 35.48% | 0.0286 | - |
| Claude-4 | General LLM* | 7.98 | 41.53% | 0.0286 | - |
| Gemini-2.5-flash | General LLM* | 8.12 | 33.74% | 0.0040 | - |
| Gemini-2.5-pro | General LLM* | 8.15 | 35.89% | 0.0162 | - |
| DeepSeek-V3 | General LLM* | 9.12 | 34.68% | 0.0012 | - |
| DeepSeek-R1 | General LLM* | 8.44 | 35.08% | 0.0046 | - |
| Qwen3-32B | General LLM* | 8.55 | 34.01% | 0.0012 | 10.17 |
| Qwen3-235B | General LLM* | 9.81 | 27.02% | 0.0012 | - |
| Jina-Reader | Parser API | 17.04 | - | 0.0004 | - |
| Firecrawl | Parser API | 16.81 | - | 0.0007 | - |
| Our Method (LingoEDU) | Specialized | 5.67 | 46.77% | 0.0007 | 1.20 |

Experiments

To test whether structure-aware compression tangibly enhances downstream performance, we conducted experiments on several long-context benchmarks, including LongBench, HLE, and BrowseComp-ZH, covering both standard long-context settings and complex Deep Search pipelines. Our method achieves SOTA on all of these benchmarks. See the experiments/ directory for implementation and evaluation details.

  • General Long-Context Understanding

| Task Type | Dataset | Glyph | Gemini-2.5-Pro Standard | Gemini-2.5-Pro Self-Sum | Gemini-2.5-Pro Ours (LingoEDU) | Gemini-2.5-Pro Δ | GPT-4.1 Standard | GPT-4.1 Self-Sum | GPT-4.1 Ours (LingoEDU) | GPT-4.1 Δ |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Multi-Doc QA | HotpotQA | 66.42 | 35.20 | 37.78 | 40.46 | +14.94% | 65.83 | 67.89 | 70.11 | +6.50% |
| | 2WikiMQA | 72.98 | 38.10 | 39.90 | 40.91 | +7.38% | 72.98 | 74.39 | 74.68 | +2.33% |
| | Musique | - | 28.55 | 30.77 | 31.22 | +9.35% | 51.90 | 53.48 | 54.86 | +5.70% |
| | DuReader | - | 7.15 | 7.79 | 8.12 | +7.69% | 21.80 | 23.51 | 25.34 | +16.24% |
| Summarization | GovReport | 25.53 | 4.10 | 4.34 | 4.25 | +2.44% | 29.97 | 30.98 | 31.56 | +2.94% |
| | QMSum | 19.78 | 15.80 | 16.53 | 16.17 | +2.34% | 22.84 | 22.53 | 23.30 | +0.61% |
| | MultiNews | - | 4.05 | 4.44 | 4.85 | +19.75% | 20.85 | 22.06 | 23.50 | +5.80% |
| | VCSum | - | 5.80 | 6.17 | 6.36 | +9.66% | 12.50 | 13.71 | 14.62 | +8.96% |
| Few-shot | TREC | 82.62 | 46.50 | 49.00 | 57.50 | +23.66% | 77.00 | 80.50 | 80.00 | +3.90% |
| | TriviaQA | 88.54 | 59.85 | 62.31 | 63.25 | +1.25% | 90.07 | 93.69 | 93.76 | +4.10% |
| | SAMSum | - | 20.45 | 21.89 | 23.80 | +11.39% | 39.20 | 40.79 | 41.68 | +6.33% |
| | LSHT | - | 26.10 | 29.50 | 35.48 | +3.45% | 48.60 | 50.50 | 52.50 | +8.02% |
  • Deep Search

| Model Backbone | HLE (Academic Reasoning) Base | HLE Self-Sum | HLE Ours (LingoEDU) | HLE Δ | BrowseComp-ZH (Noisy Web) Base | BrowseComp-ZH Self-Sum | BrowseComp-ZH Ours (LingoEDU) | BrowseComp-ZH Δ |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| DeepSeek-R1 | 9.0 | 9.5 | 13.6 | +51.11% | 18.8 | 19.4 | 20.4 | +8.51% |
| Qwen3-235B-Thinking | 14.2 | 14.7 | 15.5 | +9.15% | 8.5 | 9.0 | 12.8 | +50.59% |
| DeepSeek-V3.1 | 14.5 | 14.8 | 15.6 | +7.59% | 29.2 | 29.7 | 38.7 | +32.53% |
| Closed-Source Models | | | | | | | | |
| GPT-5 | 25.0 | 25.9 | 27.1 | +8.40% | 29.0 | 29.8 | 31.8 | +9.66% |
| Claude Opus 4.1 | 14.0 | 14.8 | 15.5 | +10.71% | 20.8 | 21.5 | 23.2 | +11.54% |
| Gemini 3 Pro | 26.1 | 26.7 | 30.1 | +15.33% | 47.5 | 48.0 | 49.0 | +3.16% |

Usage

Prepare

Preprocess the input article into sentences, using the following data structure:

{
  "type": "string",
  "infos": [
    {
      "txt": "string",      // text of sentence part
      "position": {},       // position dict, can be empty
      "tags": [],           // tags list, can be empty
      "label": ""           // label string, can be empty
    }
  ]
}
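As a concrete illustration, here is a minimal Python sketch that builds one record in this structure and serializes it. The `build_record` helper and the `doc_type` value are our own assumptions for illustration, not part of the repository's API.

```python
import json

# Build one input record matching the schema above; the "type" value
# and the helper name are illustrative assumptions, not repo API.
def build_record(sentences, doc_type="article"):
    return {
        "type": doc_type,
        "infos": [
            {"txt": s, "position": {}, "tags": [], "label": ""}
            for s in sentences
        ],
    }

record = build_record(["EDUs preserve structure.", "They also keep details."])
print(json.dumps(record, ensure_ascii=False, indent=2))
```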

You may do this via:

  • Parse the document and extract text content using Beautiful Soup (for web pages), poppler-utils (for PDF files), or OCR tools. We recommend keeping paragraphs segmented at this step.

  • Split the text into sentences. A simple regex-based splitter is often sufficient; NLP toolkits such as spaCy and Stanza are also good options.

    Example regex splitter:

    import re
    
    # Split after ASCII and full-width (CJK) punctuation; the lookbehind
    # keeps each delimiter attached to the sentence it ends.
    raw_sents = re.split(
        r'(?<=[;:,.!?;:,。!?…])\s*',
        text.strip()
    )
    
    # Drop empty fragments left over from trailing whitespace.
    sentences = [s for s in raw_sents if len(s.strip()) > 0]
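For simple pages, the extraction step can also be sketched with Python's standard library alone. The `ParagraphExtractor` class below is our own illustration (a stdlib stand-in for Beautiful Soup), not part of this repository; it keeps each `<p>` element as a separate paragraph, matching the recommendation to keep paragraphs segmented.

```python
from html.parser import HTMLParser

# Minimal extractor that collects the text of each <p> element as a
# separate paragraph (illustrative stand-in for Beautiful Soup).
class ParagraphExtractor(HTMLParser):
    def __init__(self):
        super().__init__()
        self.paragraphs = []
        self._in_p = False
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._in_p = True
            self._buf = []

    def handle_endtag(self, tag):
        if tag == "p" and self._in_p:
            text = "".join(self._buf).strip()
            if text:
                self.paragraphs.append(text)
            self._in_p = False

    def handle_data(self, data):
        if self._in_p:
            self._buf.append(data)

parser = ParagraphExtractor()
parser.feed("<html><body><p>First para.</p><p>Second para.</p></body></html>")
print(parser.paragraphs)
```

For production crawling, a robust parser such as Beautiful Soup remains the better choice; this sketch only shows the shape of the step.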

💖 For web pages, our open-source tool WCD (Web Content Distill) converts URLs directly into inputs in the format above. It is available at: WCD.

Infer

pip install -e inference
python inference/infer.py --data_dir deeplang-ai/StructBench --inference_dir edu_output

Inference outputs will be generated under directory edu_output, each as a list of (level, start_sentence_index, end_sentence_index) tuples.
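For a quick sanity check of an output file, the predicted tuples can be rendered as an indented outline. The sketch below assumes only the `(level, start_sentence_index, end_sentence_index)` format described above; the `render_outline` helper and the example spans are hypothetical, not real model output.

```python
# Each prediction is (level, start_sentence_index, end_sentence_index);
# indent each span by its level to visualize the predicted structure.
def render_outline(spans, sentences):
    lines = []
    for level, start, end in spans:
        preview = sentences[start]  # first sentence of the span as a preview
        lines.append("  " * level + f"[{start}-{end}] {preview}")
    return "\n".join(lines)

sentences = ["Intro.", "Detail one.", "Detail two.", "Conclusion."]
spans = [(0, 0, 3), (1, 1, 2)]  # hypothetical example predictions
print(render_outline(spans, sentences))
```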

Evaluate

pip install -e evaluation
python inference/evaluate.py --data_dir deeplang-ai/StructBench --inference_dir edu_output

TED and DLA scores will be printed in the terminal.

Citation

If you find our work helpful, please cite our paper:

@misc{zhou2025contextedusfaithfulstructured,
      title={From Context to EDUs: Faithful and Structured Context Compression via Elementary Discourse Unit Decomposition}, 
      author={Yiqing Zhou and Yu Lei and Shuzheng Si and Qingyan Sun and Wei Wang and Yifei Wu and Hao Wen and Gang Chen and Fanchao Qi and Maosong Sun},
      year={2025},
      eprint={2512.14244},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2512.14244}, 
}
