SAVER: Selective Visual Evidence Routing for Multimodal NER/RE

The default model in this repository is SAVER-SIS (the main model); SAVER-RL, which uses the RES selector, is an optional extension.

1. Method Overview

SAVER follows three core principles:

Use vision only when the current entity (MNER) or marked entity pair (MRE) is likely to be visually groundable.
When vision is activated, acquire only a small and complementary multi-image evidence set.
Use a unified scoring head to combine text and optional visual evidence.

The full pipeline has four stages:

Encoding: the text encoder produces token/span representations; the vision encoder first produces only global image vectors.
CGG (Conformal Groundability Gate): a per-unit routing decision (a candidate span for MNER, a marked entity pair for MRE) on whether visual evidence is needed.
Evidence Constructor (SIS or RES) + Set Transformer: once the gate activates, select up to K images and aggregate their region-level evidence.
Energy-Inspired Joint Scoring: unified scoring for MNER (span/type) or MRE (relation).
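The README does not spell out the Set Transformer configuration. The sketch below shows only the pooling idea behind a PMA-style (pooling by multihead attention) block: a seed query attends over the region vectors of the selected images and returns one order-independent summary. It is a simplified assumption, not this repository's module: one head, no learned projections, no residual or feed-forward layers, and `attention_pool` and `seed` are hypothetical names.

```python
import math


def attention_pool(regions, seed):
    """Pool a set of region vectors into one vector using a single seed query.

    Simplified PMA-style pooling (hypothetical sketch): one head, no learned
    projections, no residual/feed-forward layers. The result does not depend
    on the order of `regions`.
    """
    d = len(seed)
    # Scaled dot-product score between the seed query and every region vector.
    scores = [sum(q * x for q, x in zip(seed, r)) / math.sqrt(d) for r in regions]
    m = max(scores)  # subtract the max for a numerically stable softmax
    weights = [math.exp(s - m) for s in scores]
    z = sum(weights)
    weights = [w / z for w in weights]
    # Attention-weighted average of the region vectors.
    return [sum(w * r[j] for w, r in zip(weights, regions)) for j in range(d)]


# Regions from all selected images are pooled as one set.
regions = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
print(attention_pool(regions, seed=[1.0, 0.0]))
```

Because the softmax-weighted sum is symmetric in its inputs, shuffling the images or their regions leaves the pooled evidence unchanged, which is the property a set-level aggregator needs.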

2. Task Settings

MNER

Input: text + image set.
Output: entity spans and entity types.
SAVER performs CGG and evidence selection at candidate-span granularity.
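Routing at candidate-span granularity means every candidate span gets its own gate decision. The sketch below enumerates spans up to a maximum length and splits them into vision-routed and text-only sets with the threshold rule γ(u)=1[g(u)≥τ]. This is a hypothetical sketch: `max_len` is an assumed limit, and `score_fn` is a stand-in for the CGG groundability score, not the repository's API.

```python
def enumerate_spans(tokens, max_len=4):
    """All (start, end) spans with end exclusive and length <= max_len."""
    spans = []
    for start in range(len(tokens)):
        for end in range(start + 1, min(start + max_len, len(tokens)) + 1):
            spans.append((start, end))
    return spans


def route_spans(tokens, score_fn, tau, max_len=4):
    """Split candidate spans by the hard gate 1[g(u) >= tau].

    score_fn(tokens, span) is a hypothetical stand-in for g(u).
    """
    visual, text_only = [], []
    for span in enumerate_spans(tokens, max_len):
        (visual if score_fn(tokens, span) >= tau else text_only).append(span)
    return visual, text_only


# Toy scorer: single capitalized tokens are treated as groundable.
toy_score = lambda toks, s: 1.0 if s[1] - s[0] == 1 and toks[s[0]][0].isupper() else 0.0
print(route_spans(["Obama", "visits", "Paris"], toy_score, tau=0.5, max_len=2))
```

Only spans in the first list would go on to image selection; the rest are scored from text alone.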

MRE

Input: text with a marked head/tail entity pair + image set.
Output: relation label of the marked pair.
SAVER performs pair-level routing and evidence selection (pair gate is derived from entity gates by default).

3. Key Components

3.1 CGG (Conformal Groundability Gate)

Computes a groundability score g(u) for each unit u from global image vectors and text-image similarity statistics.
Routes with a hard threshold at inference: γ(u)=1[g(u)≥τ], so visual evidence is used only when γ(u)=1.
Chooses τ on a held-out calibration split so that the Clopper-Pearson upper bound on the risk, at confidence 1-δ, does not exceed the target risk α.
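A minimal sketch of the calibration step, under stated assumptions: the "risk" is taken to be the error rate among activated units (g(u)≥τ), `errors` is a hypothetical per-unit 0/1 loss on the calibration split, and thresholds are scanned over the observed scores with a Bonferroni correction (δ divided by the number of candidates) so that picking the best threshold keeps the 1-δ guarantee. The README does not say how SAVER scans thresholds, so this is an illustration, not the repository's procedure. The Clopper-Pearson bound is computed by bisection on the binomial CDF to stay dependency-free.

```python
import math


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p), summed in log space."""
    if p <= 0.0:
        return 1.0
    if p >= 1.0:
        return 1.0 if k >= n else 0.0
    log_p, log_q = math.log(p), math.log1p(-p)
    total = 0.0
    for i in range(k + 1):
        total += math.exp(math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                          + i * log_p + (n - i) * log_q)
    return min(total, 1.0)


def cp_upper(k, n, delta):
    """One-sided Clopper-Pearson upper bound on an error rate at confidence 1 - delta."""
    if n == 0 or k >= n:
        return 1.0
    lo, hi = 0.0, 1.0
    for _ in range(100):  # binom_cdf is decreasing in p; find p with CDF == delta
        mid = (lo + hi) / 2
        if binom_cdf(k, n, mid) > delta:
            lo = mid
        else:
            hi = mid
    return hi  # slightly conservative side of the bracket


def calibrate_threshold(scores, errors, alpha=0.10, delta=0.05):
    """Smallest tau (largest activation coverage) whose risk bound is <= alpha.

    errors[i] is a hypothetical 0/1 loss for calibration unit i.
    Returns None if no candidate threshold certifies the target risk.
    """
    if not scores:
        return None
    candidates = sorted(set(scores))
    delta_each = delta / len(candidates)  # Bonferroni over the scanned thresholds
    for tau in candidates:  # ascending tau = descending activation coverage
        active = [e for g, e in zip(scores, errors) if g >= tau]
        if cp_upper(sum(active), len(active), delta_each) <= alpha:
            return tau
    return None
```

Scanning from low to high τ returns the threshold with the largest activation coverage among those that pass. A fixed-sequence scan (Learn-then-Test style) is a common alternative that avoids the Bonferroni penalty.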

3.2 SIS (Submodular Image Selector, default)

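When the gate is active, selects at most K images from the N candidates.
The objective balances relevance against coverage/diversity.
Uses greedy selection for efficiency, which carries the classical (1-1/e) approximation guarantee when the objective is monotone submodular under a cardinality constraint.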
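The exact SIS objective is not given here. As an assumption, the sketch below uses a common monotone submodular form: a modular relevance term plus a facility-location coverage term, F(S) = λ·Σ_{i∈S} rel_i + Σ_j max_{i∈S} sim(i, j), with non-negative similarities. Greedy selection adds the image with the largest marginal gain at each step. `greedy_select`, `lam`, and the inputs are hypothetical.

```python
def greedy_select(relevance, sim, k, lam=1.0):
    """Greedy maximization of lam * sum(relevance) + facility-location coverage.

    relevance[i] >= 0 and sim[i][j] >= 0 keep the objective monotone
    submodular, so greedy is within (1 - 1/e) of the best size-k set.
    """
    n = len(relevance)
    selected = []
    best_cover = [0.0] * n  # max similarity of each image to the selected set
    for _ in range(min(k, n)):
        best_gain, best_i = -1.0, None
        for i in range(n):
            if i in selected:
                continue
            # Coverage gain: how much image i improves coverage of every image j.
            cover_gain = sum(max(sim[i][j] - best_cover[j], 0.0) for j in range(n))
            gain = lam * relevance[i] + cover_gain
            if gain > best_gain:
                best_gain, best_i = gain, i
        selected.append(best_i)
        best_cover = [max(best_cover[j], sim[best_i][j]) for j in range(n)]
    return selected


# Images 0 and 1 are near-duplicates; image 2 is less relevant but complementary.
relevance = [0.9, 0.85, 0.3]
sim = [[1.0, 0.95, 0.1], [0.95, 1.0, 0.1], [0.1, 0.1, 1.0]]
print(greedy_select(relevance, sim, k=2))  # [0, 2]
```

Pure top-K by relevance would pick the near-duplicate pair [0, 1]; the coverage term makes the second pick the complementary image instead.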

3.3 RES (Reinforced Evidence Selector, optional)

Formulates evidence acquisition as sequential decision making with a STOP action.
Uses cost-aware rewards with CGG-based action masking.
Reported as an extension in ablations; SIS remains the default model in main tables.
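A hypothetical sketch of the RES action space, not the repository's implementation: actions are "add image i" or STOP, CGG masking leaves only STOP available when the gate is closed, and a terminal reward subtracts a per-image cost from task success. The names `valid_actions`, `episode_reward`, `rollout`, and the value of `cost_per_image` are all assumptions.

```python
STOP = -1  # hypothetical id for the STOP action


def valid_actions(num_images, selected, gate_active, k_max):
    """Action mask for one RES step."""
    if not gate_active or len(selected) >= k_max:
        return [STOP]  # gate closed or budget used up: STOP is the only legal action
    return [i for i in range(num_images) if i not in selected] + [STOP]


def episode_reward(correct, num_selected, cost_per_image=0.05):
    """Cost-aware terminal reward: task success minus a per-image acquisition cost."""
    return (1.0 if correct else 0.0) - cost_per_image * num_selected


def rollout(policy, num_images, gate_active, k_max):
    """Run one episode; policy(actions, selected) returns one legal action."""
    selected = []
    while True:
        action = policy(valid_actions(num_images, selected, gate_active, k_max), selected)
        if action == STOP:
            return selected
        selected.append(action)


first_legal = lambda actions, selected: actions[0]  # toy policy
print(rollout(first_legal, num_images=3, gate_active=True, k_max=2))   # [0, 1]
print(rollout(first_legal, num_images=3, gate_active=False, k_max=2))  # []
```

In training, a learned policy would replace the toy one, and the per-image cost pushes it to stop early when extra images do not change the prediction.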

3.4 Energy-Inspired Joint Scoring

Training uses standard cross-entropy; the energy notation only serves to give MNER and MRE a unified formulation.
The objective combines a task score, a text-vision consistency term, and a gate sparsity term.
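One way to read "energy notation with cross-entropy training" is sketched below, under assumptions: the energy of label y is E(y) = -(s_task(y) + λ_c·c(y)), probabilities are softmax(-E), and a mean-gate L1 penalty weighted by λ_g is added. It also assumes soft gate values are available during training so the sparsity term can be differentiated. The weights and the function name `joint_loss` are hypothetical.

```python
import math


def joint_loss(task_scores, consistency, gold, gate_probs, lam_c=0.5, lam_g=0.01):
    """Cross-entropy over energy-based label scores plus a gate sparsity penalty.

    task_scores[y], consistency[y]: per-label task score and text-vision
    consistency (all zeros when the unit is routed text-only).
    gate_probs: hypothetical soft gate values in [0, 1] for units in the batch.
    """
    # Lower energy means a more compatible label.
    energies = [-(s + lam_c * c) for s, c in zip(task_scores, consistency)]
    logits = [-e for e in energies]
    m = max(logits)  # log-sum-exp with max subtraction for stability
    log_z = m + math.log(sum(math.exp(l - m) for l in logits))
    ce = log_z - logits[gold]
    sparsity = sum(gate_probs) / len(gate_probs)  # mean activation (L1-style)
    return ce + lam_g * sparsity
```

Under this reading, minimizing cross-entropy over softmax(-E) is the same as lowering the gold label's energy relative to the others, which is why the energy view and standard training coincide.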

4. Experiments

4.1 Datasets

MRE: MNRE, MRE-MI
MNER: Twitter-2015, Twitter-2017, MNER-MI, MNER-MI-Plus

Core dataset scales used in the main text:

Dataset	Task	Train / Dev / Test	Avg. images per sample
MNRE (v2)	RE	12,247 / 1,624 / 1,614	1.00
MRE-MI	RE	13,504 / 4,500 / 4,500	2.80
Twitter-2017	MNER	3,373 / 723 / 723	1.00
MNER-MI-Plus	MNER	10,229 / 1,583 / 1,583	2.15

4.2 Metrics

MRE: micro-F1 / Precision / Recall
MNER: strict entity-level F1 (boundary + type)
Selectivity: Risk-Activation-Coverage curves, AURC (area under the risk-coverage curve), ActCov@0.10
Efficiency: FLOPs/sample, end-to-end P90 latency
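For reference, a sketch of how the accuracy and selectivity metrics could be computed. Assumptions: strict F1 is micro-averaged over exact (sentence_id, start, end, type) tuples; units are activated in decreasing gate-score order, with risk defined as the error rate among activated units; AURC is the mean risk over the discrete coverage levels; and ActCov@0.10 is read as the largest activation coverage whose risk is at most 0.10. The repository's evaluation scripts may define these differently, and every function name here is hypothetical.

```python
def strict_f1(pred, gold):
    """Strict entity-level micro-F1: a hit needs exact boundaries and type."""
    pred, gold = set(pred), set(gold)
    tp = len(pred & gold)
    p = tp / len(pred) if pred else 0.0
    r = tp / len(gold) if gold else 0.0
    return 2 * p * r / (p + r) if p + r else 0.0


def risk_coverage(scores, errors):
    """(coverage, risk) pairs, activating units by decreasing gate score."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    curve, errs = [], 0
    for n, i in enumerate(order, start=1):
        errs += errors[i]
        curve.append((n / len(order), errs / n))
    return curve


def aurc(scores, errors):
    """Area under the risk-coverage curve (mean risk over coverage levels)."""
    curve = risk_coverage(scores, errors)
    return sum(r for _, r in curve) / len(curve)


def act_cov_at(scores, errors, max_risk=0.10):
    """Largest activation coverage whose risk is at most max_risk."""
    return max((c for c, r in risk_coverage(scores, errors) if r <= max_risk), default=0.0)


scores, errors = [0.9, 0.8, 0.7, 0.6], [0, 0, 1, 1]
print(aurc(scores, errors), act_cov_at(scores, errors))
```

Lower AURC means errors concentrate among low-score units, that is, the gate ranks units well; ActCov rewards activating as many units as possible while staying under the risk target.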

5. Main Results (SAVER)

5.1 MRE-MI Main Benchmark

Method	P↑	R↑	F1↑	AURC↓	ActCov@0.10↑	FLOPs (G/sample)↓	P90 (ms)↓
ModernBERT-only	82.37	79.84	81.09	0.147	0.68	13	17
DeBERTa-v3-only	81.53	79.48	80.49	0.153	0.66	18	27
HVPNeT	73.87	76.82	75.32	0.168	0.63	66	99
RSRNeT	84.78	83.06	83.89	0.129	0.74	60	90
All-Images Attn.	83.47	82.18	82.82	0.142	0.72	62	93
Top-K by relevance	85.31	83.62	84.45	0.119	0.77	51	77
GLRA	85.23	83.81	84.51	0.117	0.78	56	84
Retrieval-Aug.	84.27	82.86	83.56	0.124	0.75	55	82
SAVER (full)	85.93	84.57	85.24	0.104	0.82	36	54
SAVER w/o CGG	84.46	83.18	83.81	0.124	0.76	51	77
SAVER w/o SIS	84.13	82.74	83.43	0.136	0.74	62	93
SAVER w/o J.Score	85.28	84.12	84.70	0.111	0.80	37	56
SAVER (CGG+SIS)	85.14	84.33	84.73	0.107	0.81	35	53

5.2 Additional Benchmarks (F1)

Dataset	Strong baseline	SAVER	Gain
MNRE	83.9 (RSRNeT)	84.7	+0.8
Twitter-2015	76.5 (RSRNeT)	77.0	+0.5
Twitter-2017	87.9 (RSRNeT)	88.0	+0.1
MNER-MI	76.9 (GLRA-adapt)	77.3	+0.4
MNER-MI-Plus	83.1 (GLRA-adapt)	83.7	+0.6

5.3 CGG Calibration (α=0.10, 1-δ=0.95)

Dataset	Activation coverage	Empirical error	CP upper bound
MRE-MI	0.82	0.087	0.097
MNRE	0.82	0.083	0.098
Twitter-2017	0.80	0.077	0.099
MNER-MI-Plus	0.81	0.084	0.099

6. Environment Setup

pip install -r requirements.txt

7. Data Preparation

Twitter2015 / Twitter2017

Place data under data/NER_data, or update data paths in run.py.

MNRE

Use links in the project documentation or prepare data with the expected directory structure.

Note: SAVER supports both single-image and multi-image inputs. In multi-image settings, SIS/RES builds a compact evidence subset only when the gate activates.

8. Training and Testing

NER

bash run_twitter15.sh
bash run_twitter17.sh

RE

bash run_re_task.sh

Test-only Example

python -u run.py \
  --dataset_name="MRE" \
  --bert_name="bert-base-uncased" \
  --seed=1234 \
  --only_test \
  --max_seq=80 \
  --use_prompt \
  --prompt_len=4 \
  --sample_ratio=1.0 \
  --load_path='your_re_ckpt_path'

9. Notes

This README now presents SAVER as the primary method and main result set.
If updated SAVER architecture/risk-coverage figures are available, replace placeholders in this README accordingly.

10. Acknowledgement

Twitter15/Twitter17 data processing follows UMT.
MNRE data sourcing follows MEGA.
Early implementation inspirations include HVPNeT.
