The official implementation of "SinkTrack: Attention Sink based Context Anchoring for Large Language Models" (ICLR 2026).
Large language models (LLMs) suffer from hallucination and context forgetting. These problems are caused by attention drift, where an LLM's focus shifts towards newly generated tokens and away from the initial input context. To address this, we make use of a related, intrinsic characteristic of LLMs: attention sink, the tendency to consistently allocate high attention to the very first token (i.e., ⟨BOS⟩) of a sequence. Concretely, we propose an advanced context anchoring method, SinkTrack, which treats ⟨BOS⟩ as an information anchor and injects key contextual features (such as those derived from the input image or instruction) into its representation. As a result, the LLM remains anchored to the initial input context throughout the entire generation process. SinkTrack is training-free, plug-and-play, and introduces negligible inference overhead. Experiments demonstrate that SinkTrack mitigates hallucination and context forgetting across both textual (e.g., +18.9% on QuAC with Llama3.1-8B-Instruct) and multi-modal (e.g., +23.0% on M3CoT with Qwen2.5-VL-7B-Instruct) tasks. Its consistent gains across different architectures and scales underscore its robustness and generalizability. We also analyze its underlying working mechanism from the perspective of information delivery.
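As a toy illustration of the anchoring idea described above, the sketch below shifts the ⟨BOS⟩ hidden state toward a summary of selected context tokens. It is not the SinkTrack implementation: the mean pooling, the additive update, and the `alpha` weight are assumptions chosen for clarity, and `inject_into_bos` is a hypothetical helper that works on plain Python lists rather than on per-layer model tensors.

```python
from typing import List

Vector = List[float]


def inject_into_bos(hidden: List[Vector], context_idx: List[int], alpha: float = 0.1) -> List[Vector]:
    """Return a copy of `hidden` where position 0 (<BOS>) is shifted toward
    the mean of the selected context positions (toy illustration only)."""
    if not context_idx:
        # Nothing to inject: return an unchanged copy.
        return [list(h) for h in hidden]
    dim = len(hidden[0])
    # Mean-pool the context token states (e.g., image or instruction tokens).
    pooled = [sum(hidden[i][d] for i in context_idx) / len(context_idx) for d in range(dim)]
    out = [list(h) for h in hidden]
    # Additive injection into the <BOS> state; alpha controls the strength.
    out[0] = [b + alpha * p for b, p in zip(out[0], pooled)]
    return out
```

Because ⟨BOS⟩ already receives high attention at every decoding step, whatever is written into its state is re-read throughout generation; the toy example only captures that intuition, not how or where the real method intervenes.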
Follow these steps to set up the environment and install the required dependencies.
- Create and activate the Conda environment:
# Create the conda environment with Python 3.10
conda create -n sinkTrack python=3.10
# Activate the newly created environment
conda activate sinkTrack
- Install dependencies:
# Install all required packages from requirements.txt
pip install -r requirements.txt
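After installation, a quick sanity check can confirm that the environment is usable. This is a minimal sketch; `check_env` is a hypothetical helper, and the package names `torch` and `transformers` are assumptions, so replace them with the entries in requirements.txt.

```python
import importlib.util
import sys
from typing import Dict, Iterable, Tuple


def check_env(packages: Iterable[str], min_python: Tuple[int, int] = (3, 10)) -> Dict[str, bool]:
    """Report whether the interpreter is new enough and which packages are importable."""
    report = {"python>=%d.%d" % min_python: sys.version_info[:2] >= tuple(min_python)}
    for name in packages:
        # find_spec returns None when the top-level package is not installed.
        report[name] = importlib.util.find_spec(name) is not None
    return report


if __name__ == "__main__":
    # torch and transformers are assumed here; check requirements.txt for the real list.
    for key, ok in check_env(["torch", "transformers"]).items():
        print(f"{key}: {'OK' if ok else 'MISSING'}")
```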
Note: All operations in this section should be performed within the all_inference_codes directory unless otherwise specified.
This section guides you through the complete pipeline for running Qwen2.5-VL on the M3CoT dataset, including data preparation, inference, and evaluation.
Download the M3CoT dataset from HuggingFace:
- URL: https://huggingface.co/datasets/LightChen2333/M3CoT
- Destination: Place the downloaded files into the all_inference_codes/datasets directory.
We provide a script to format the data.
- Requirement: The M3CoT dataset must contain the following keys: id (question ID), image (image path), question, choices, and answer.
- Action: Navigate to the datasets directory and run the processing script:
cd datasets
python process_m3cot.py
- Output: This will generate:
  - An image folder: datasets/M3CoT/data/images (images named by ID).
  - A JSON file: datasets/M3CoT/data/test.json containing the required keys.
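Before running inference, it may help to verify the processed file. The sketch below assumes test.json is a JSON list of records and that relative image paths are resolved against the JSON file's folder; both are assumptions about process_m3cot.py, and `validate_records` is a hypothetical helper.

```python
import json
import os
from typing import List

REQUIRED_KEYS = ("id", "image", "question", "choices", "answer")


def validate_records(json_path: str, check_images: bool = True) -> List[str]:
    """Return a list of problems found in a processed M3CoT-style JSON file."""
    with open(json_path, encoding="utf-8") as f:
        records = json.load(f)  # assumed to be a list of dicts
    problems = []
    for n, rec in enumerate(records):
        missing = [k for k in REQUIRED_KEYS if k not in rec]
        if missing:
            problems.append(f"record {n}: missing keys {missing}")
            continue
        # Relative image paths are resolved against the JSON file's folder (an assumption).
        img = rec["image"]
        if not os.path.isabs(img):
            img = os.path.join(os.path.dirname(json_path), img)
        if check_images and not os.path.exists(img):
            problems.append(f"record {n}: image not found: {img}")
    return problems
```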
Download the Qwen2.5-VL-7B-Instruct model.
- URL: https://huggingface.co/Qwen/Qwen2.5-VL-7B-Instruct
- Destination: Place the model files into the all_inference_codes/models directory.
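If you prefer to script the download, the `huggingface_hub` library provides `snapshot_download`. In this sketch, the folder name under models/ (the last part of the repo id) is an assumption; check the path the inference scripts actually load and adjust `local_model_dir` accordingly. Both helpers are hypothetical.

```python
import os


def local_model_dir(models_root: str, repo_id: str) -> str:
    """Map a Hub repo id such as 'Qwen/Qwen2.5-VL-7B-Instruct' to a folder under models/."""
    return os.path.join(models_root, repo_id.split("/")[-1])


def download_model(repo_id: str, models_root: str = "models") -> str:
    # Imported lazily so the path helper works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    target = local_model_dir(models_root, repo_id)
    # Downloads the repository snapshot into `target` and returns that folder path.
    return snapshot_download(repo_id=repo_id, local_dir=target)


if __name__ == "__main__":
    print(download_model("Qwen/Qwen2.5-VL-7B-Instruct"))
```

The same helper can be reused for the Llama-3.1 model, which is a gated repository and requires Hub authentication before downloading.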
Important: Ensure you are back in the all_inference_codes directory before proceeding.
Navigate to the Qwen2.5-VL directory and run the inference scripts for different methods (Direct, CoT, and SinkTrack).
cd qwen25vl
# Run Direct Inference
python direct.py
# Run Chain-of-Thought (CoT) Inference
python cot.py
# Run SinkTrack Inference
python run.py
The results will be saved in the following directories:
- Direct: qwen25vl/direct/qwen7b/m3cot
- CoT: cot/qwen7b/m3cot
- SinkTrack: sinktrack/qwen7b/m3cot
To evaluate the performance of the generated results, run the evaluation script:
python eval.py
This will output the evaluation metrics for the Direct, CoT, and SinkTrack methods.
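For intuition about what a multiple-choice evaluation involves, here is a minimal accuracy sketch. It is not eval.py: the JSONL layout, the `prediction` and `answer` field names, and the last-letter heuristic in `extract_choice` are all assumptions.

```python
import json
import re
from typing import Optional


def extract_choice(text: str) -> Optional[str]:
    """Pull the last standalone option letter (A-E) out of a model response."""
    letters = re.findall(r"\b([A-E])\b", text)
    return letters[-1] if letters else None


def accuracy(jsonl_path: str) -> float:
    """Fraction of records whose extracted choice matches the gold letter."""
    correct = total = 0
    with open(jsonl_path, encoding="utf-8") as f:
        for line in f:
            if not line.strip():
                continue  # skip blank lines
            rec = json.loads(line)
            total += 1
            # Field names 'prediction' and 'answer' are assumptions.
            correct += int(extract_choice(rec["prediction"]) == rec["answer"].strip().upper())
    return correct / total if total else 0.0
```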
Note: All operations in this section should be performed within the all_inference_codes/llama31 directory unless otherwise specified.
This section guides you through the complete pipeline for running Llama-3.1-8B-Instruct on the QuAC dataset, including data preparation, inference, and evaluation.
- Requirement: The validation JSON must follow the QuAC format (e.g., val_v0.2.json with a data key).
- Action: Place the file at all_inference_codes/llama31/quac/val_v0.2.json (or update data.val_data_path in config.yaml).
- Optional: For default evaluation settings, you may also copy it to all_inference_codes/datasets/val_v0.2.json.
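To confirm the file has the expected layout, you can count dialogs and questions. This sketch assumes the SQuAD-like nesting used by the public QuAC release (data → paragraphs → qas, one dialog per paragraph); `quac_stats` is a hypothetical helper.

```python
import json
from typing import Dict


def quac_stats(path: str) -> Dict[str, int]:
    """Count dialogs and questions in a QuAC-style JSON file."""
    with open(path, encoding="utf-8") as f:
        data = json.load(f)["data"]
    dialogs = questions = 0
    for article in data:
        # Assumed nesting: data -> paragraphs -> qas, one dialog per paragraph.
        for para in article["paragraphs"]:
            dialogs += 1
            questions += len(para["qas"])
    return {"dialogs": dialogs, "questions": questions}
```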
Download the Llama-3.1-8B-Instruct model.
- URL: https://huggingface.co/meta-llama/Meta-Llama-3.1-8B-Instruct
- Destination: Place the model files into the all_inference_codes/models directory (or the path set in config.yaml → model.model_path).
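Both the data and model paths come from config.yaml. The sketch below shows how dotted keys such as `model.model_path` map onto the nested YAML structure. It assumes PyYAML is installed and that the file is a plain mapping; `get_dotted` and `load_config` are hypothetical helpers, not part of the repository.

```python
from __future__ import annotations

from collections.abc import Mapping
from typing import Any


def get_dotted(cfg: Mapping[str, Any], dotted: str) -> Any:
    """Look up a nested key such as 'model.model_path' in a parsed config."""
    node: Any = cfg
    for part in dotted.split("."):
        if not isinstance(node, Mapping) or part not in node:
            raise KeyError(f"{dotted!r}: missing {part!r}")
        node = node[part]
    return node


def load_config(path: str = "config.yaml") -> dict:
    # PyYAML is assumed to be installed; safe_load parses plain YAML mappings.
    import yaml

    with open(path, encoding="utf-8") as f:
        return yaml.safe_load(f)


if __name__ == "__main__":
    cfg = load_config()
    print("model:", get_dotted(cfg, "model.model_path"))
    print("data:", get_dotted(cfg, "data.val_data_path"))
```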
Important: Run the commands below from the all_inference_codes/llama31 directory.
Navigate to the llama31 directory and run the inference scripts for different methods (Direct, CoT, and SinkTrack).
cd all_inference_codes/llama31 # from repo root; omit if already there
# Run Direct Inference
python direct.py
# Run Chain-of-Thought (CoT) Inference
python cot.py
# Run SinkTrack Inference
python run.py
The results are saved in a single directory:
- Output directory: all_inference_codes/llama31/quac/
- Files: direct_323.jsonl, direct_500.jsonl, direct_900.jsonl, and similarly named cot_*.jsonl and sinktrack_*.jsonl files for the other two methods.
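Because each method writes one file per seed, a small helper can group them for downstream scoring. This sketch assumes the `<method>_<seed>.jsonl` naming shown above holds for all files; `collect_runs` is a hypothetical helper.

```python
import os
import re
from collections import defaultdict
from typing import Dict

PATTERN = re.compile(r"^(direct|cot|sinktrack)_(\d+)\.jsonl$")


def collect_runs(out_dir: str) -> Dict[str, Dict[int, str]]:
    """Group result files named <method>_<seed>.jsonl by method, then by seed."""
    runs: Dict[str, Dict[int, str]] = defaultdict(dict)
    for name in sorted(os.listdir(out_dir)):
        m = PATTERN.match(name)
        if m:  # ignore files that do not follow the naming scheme
            method, seed = m.group(1), int(m.group(2))
            runs[method][seed] = os.path.join(out_dir, name)
    return dict(runs)
```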
To evaluate the performance of the generated results, run the evaluation script:
python new_scorer.py
This will output the evaluation metrics (F1, HEQ, DHEQ, etc.) for the selected method across seeds. SinkTrack is scored by default; pass --baseline direct or --baseline cot to score the other methods. Run python new_scorer.py --help for more options.
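The F1 and HEQ metrics follow the QuAC evaluation protocol: F1 is a bag-of-words overlap between the predicted and reference spans, and HEQ counts questions where the system's F1 reaches the human F1. The sketch below is a simplified illustration, not new_scorer.py; it scores against a single reference and omits details of the official scorer such as multiple references and CANNOTANSWER handling. `normalize`, `f1`, and `heq` are hypothetical helpers.

```python
import re
import string
from collections import Counter
from typing import List

PUNCT = set(string.punctuation)


def normalize(text: str) -> List[str]:
    """Lowercase, strip punctuation and English articles, then split on whitespace."""
    text = "".join(ch for ch in text.lower() if ch not in PUNCT)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return text.split()


def f1(pred: str, gold: str) -> float:
    """Token-level F1 between a prediction and a single reference answer."""
    p, g = normalize(pred), normalize(gold)
    common = sum((Counter(p) & Counter(g)).values())
    if common == 0:
        return 0.0
    precision, recall = common / len(p), common / len(g)
    return 2 * precision * recall / (precision + recall)


def heq(system_f1: List[float], human_f1: List[float]) -> float:
    """Share of questions where the system F1 reaches the human F1 (HEQ-Q)."""
    hits = sum(s >= h for s, h in zip(system_f1, human_f1))
    return hits / len(system_f1) if system_f1 else 0.0
```

DHEQ applies the same test at the dialog level, counting a dialog only if every question in it passes.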
This section explains how to apply this framework to different datasets or models.
To use a custom dataset:
- Format Requirements: Ensure your dataset is processed to include the following keys:
  - id: Unique question identifier.
  - image: Path to the image file.
  - question: The text query.
  - answer: Ground-truth answer.
  - choices: (Optional) Required if the task is multiple-choice.
- Placement: Place the dataset in the datasets folder.
- Configuration: Open direct.py, cot.py, and run.py and modify the dataset loading paths to point to your new dataset directory.
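A converter for a new dataset only needs to emit the keys listed above. This sketch writes a JSON list, mirroring the test.json produced for M3CoT; storing `id` as a string and the output layout are assumptions, and `to_record` and `write_dataset` are hypothetical helpers.

```python
import json
from typing import Iterable, List, Optional


def to_record(qid, image: str, question: str, answer: str,
              choices: Optional[List[str]] = None) -> dict:
    """Build one record with the keys the inference scripts expect."""
    rec = {"id": str(qid), "image": image, "question": question, "answer": answer}
    if choices is not None:
        # Only multiple-choice tasks need the 'choices' key.
        rec["choices"] = list(choices)
    return rec


def write_dataset(records: Iterable[dict], path: str) -> int:
    """Write records as a JSON list and return how many were written."""
    recs = list(records)
    with open(path, "w", encoding="utf-8") as f:
        json.dump(recs, f, ensure_ascii=False, indent=2)
    return len(recs)
```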
To use a different model:
- Download: Download the desired model and place it in the models directory.
- Configuration: Open direct.py, cot.py, and run.py and update the model loading paths to point to the new model in the models directory.
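Before pointing the scripts at a new model, a quick check can catch incomplete downloads. This sketch assumes a standard Hugging Face checkpoint layout (a config.json plus *.safetensors or *.bin weight files); `check_model_dir` is a hypothetical helper and does not verify that the model is supported by the inference code.

```python
import glob
import os
from typing import List


def check_model_dir(path: str) -> List[str]:
    """List obvious problems with a locally downloaded Hugging Face model folder."""
    problems = []
    if not os.path.isfile(os.path.join(path, "config.json")):
        problems.append("config.json not found")
    # Weights may be stored as safetensors or legacy PyTorch .bin shards.
    weights = glob.glob(os.path.join(path, "*.safetensors")) + glob.glob(os.path.join(path, "*.bin"))
    if not weights:
        problems.append("no *.safetensors or *.bin weight files found")
    return problems
```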
If you wish to assess the performance of the inference results we have already uploaded, follow the instructions below based on the modality.
To evaluate SinkTrack results for a specific model on a specific dataset (across 3 random seeds: 323, 500, 900):
- Navigate to the specific results folder.
- Run python t.py.
Example: Evaluating Gemma3-4B using SinkTrack on the MMStar dataset.
cd all_inference_results/gemma3/sinktrack/4b/mmstar
python t.py
Output: This will display the results for each seed file, as well as the mean and variance across the three runs.
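The summary that t.py prints can be reproduced from per-seed scores with the standard library. Whether t.py reports population or sample variance is not stated, so this sketch uses the population variance as an assumption; `summarize` is a hypothetical helper.

```python
import statistics
from typing import Dict


def summarize(seed_scores: Dict[int, float]) -> Dict[str, float]:
    """Mean and variance of one metric across seeds (e.g. 323, 500, 900)."""
    values = list(seed_scores.values())
    return {
        "mean": statistics.mean(values),
        # Population variance (assumed); use statistics.variance for the sample estimate.
        "pvariance": statistics.pvariance(values),
    }
```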
To evaluate all methods (Direct, CoT, SinkTrack) for a specific model on a textual dataset (across 3 random seeds: 323, 500, 900):
- Navigate to the specific dataset folder within the model directory.
- Run python eval.py.
Example: Evaluating Llama3.1-8B-Instruct on the QuAC dataset (results for Direct, CoT, and SinkTrack).
cd all_inference_results/llama3_1/quac
python eval.py
Output: This will display the evaluation metrics for all methods located in that directory.
This project builds upon several excellent open-source works. We sincerely thank the authors for their contributions.
If you find this work useful in your research, please star our repository and consider citing:
@inproceedings{liu2026sinktrack,
title={SinkTrack: Attention Sink based Context Anchoring for Large Language Models},
author={Liu, Xu and Chen, Guikun and Wang, Wenguan},
booktitle={ICLR},
year={2026}
}