Ozgur Kara*, Harris Nisar*, James M. Rehg
(* denotes equal contribution)
DiffEye is the first diffusion-based model that generates continuous eye-tracking trajectories conditioned on natural images. It models gaze behavior directly from raw eye movement trajectories rather than processed scanpaths, achieving state-of-the-art scanpath generation while enabling high-fidelity continuous gaze trajectory synthesis.
DiffEye learns to generate realistic, diverse, continuous eye-tracking trajectories directly from raw gaze data using a diffusion-based generative model with a novel Corresponding Positional Embedding (CPE).
Abstract
Numerous models have been developed for scanpath and saliency prediction. They are typically trained on scanpaths, which model eye movement as a sequence of discrete fixation points connected by saccades, while the rich information contained in the raw trajectories is often discarded. Moreover, most existing approaches fail to capture the variability observed among human subjects viewing the same image. They generally predict a single scanpath of fixed, pre-defined length, which conflicts with the inherent diversity and stochastic nature of real-world visual attention. To address these challenges, we propose DiffEye, a diffusion-based training framework designed to model continuous and diverse eye movement trajectories during free viewing of natural images. Our method builds on a diffusion model conditioned on visual stimuli and introduces a novel component, namely Corresponding Positional Embedding (CPE), which aligns spatial gaze information with the patch-based semantic features of the visual input. By leveraging raw eye-tracking trajectories rather than relying on scanpaths, DiffEye captures the inherent variability in human gaze behavior and generates high-quality, realistic eye movement patterns, despite being trained on a comparatively small dataset. The generated trajectories can also be converted into scanpaths and saliency maps, resulting in outputs that more accurately reflect the distribution of human visual attention. DiffEye is the first method to tackle this task on natural images using a diffusion model while fully leveraging the richness of raw eye-tracking data. Our extensive evaluation shows that DiffEye not only achieves state-of-the-art performance in scanpath generation but also enables, for the first time, the generation of continuous eye movement trajectories. Project webpage: https://diff-eye.github.io/
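CPE is described only at a high level here. As a rough illustration of the idea (a sketch with assumed shapes and names, not the repository's actual implementation), each continuous gaze sample can be mapped to the image patch it falls in and paired with that patch's positional embedding, so gaze tokens and visual tokens share one spatial reference:

# Rough illustration of the idea behind Corresponding Positional Embedding (CPE).
# Shapes, names, and the 16x16 grid are assumptions for illustration only.
import torch

def corresponding_positional_embedding(gaze_xy, patch_pos_emb, grid_size):
    # gaze_xy: (T, 2) gaze coordinates normalized to [0, 1]
    # patch_pos_emb: (grid_size * grid_size, D) positional embeddings of image patches
    # returns: (T, D) the positional embedding of the patch under each gaze sample
    cols = (gaze_xy[:, 0] * grid_size).long().clamp(0, grid_size - 1)
    rows = (gaze_xy[:, 1] * grid_size).long().clamp(0, grid_size - 1)
    patch_idx = rows * grid_size + cols
    return patch_pos_emb[patch_idx]

# Example: a 240-sample trajectory over a 16x16 patch grid with 768-d embeddings.
gaze = torch.rand(240, 2)
pos_emb = torch.randn(16 * 16, 768)
print(corresponding_positional_embedding(gaze, pos_emb, grid_size=16).shape)  # torch.Size([240, 768])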
DiffEye generates full, continuous eye movement trajectories (d), which can be converted into scanpaths (c) and saliency maps (b). This captures the rich temporal dynamics of human attention that are often lost in discrete scanpath representations.
Comparison of generated scanpaths against baseline models and ground truth across different stimuli. DiffEye generates scanpaths that closely resemble the ground truth distribution, unlike baselines which often fail to capture the natural variability or focus on incorrect regions.
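In this repository, the conversion from trajectories to scanpaths and saliency maps is performed with MATLAB scripts (see the training and evaluation setup below). The following Python sketch is only a generic approximation of that step, using a simple dispersion-style fixation rule plus Gaussian blurring; it is not the conversion used for the reported results and assumes numpy and scipy are available:

# Generic approximation only: reduce a continuous trajectory to fixations and a
# blurred fixation (saliency) map. The repository uses MATLAB scripts for the
# actual conversion used in evaluation.
import numpy as np
from scipy.ndimage import gaussian_filter

def trajectory_to_saliency(gaze_xy, height, width, window=24, disp_thresh=0.02, sigma=30):
    # gaze_xy: (T, 2) gaze samples normalized to [0, 1]
    fixations = []
    for t in range(0, len(gaze_xy) - window + 1, window):
        chunk = gaze_xy[t:t + window]
        # Dispersion-style rule: a low-spread window counts as one fixation.
        if (chunk.max(axis=0) - chunk.min(axis=0)).sum() < disp_thresh:
            fixations.append(chunk.mean(axis=0))
    saliency = np.zeros((height, width), dtype=np.float32)
    for fx, fy in fixations:
        x = int(np.clip(fx * (width - 1), 0, width - 1))
        y = int(np.clip(fy * (height - 1), 0, height - 1))
        saliency[y, x] += 1.0
    saliency = gaussian_filter(saliency, sigma=sigma)
    if saliency.max() > 0:
        saliency /= saliency.max()
    return np.array(fixations), saliency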
Please create and activate a conda environment using Python 3.10.15:
conda create -n diffeye python=3.10.15
conda activate diffeye

Install the required packages:
pip install torch==2.5.1 torchvision==0.20.1 torchaudio==2.5.1 --index-url https://download.pytorch.org/whl/cu121
pip install ipykernel matplotlib scikit-learn pandas pyyaml einops natsort diffusers timm

Download the model weights to the project root from here:
https://uofi.box.com/s/f8gykva7ucnx2e3z26hhfxji49txbwh1
Run through the demo notebook:
./demo.ipynb
Feel free to change image_path in Cell 2 Line 1 to a custom image path.
The training and evaluation scripts require MATLAB to convert generated trajectories to scanpaths and saliency maps for evaluation. We used MATLAB R2024b:
https://www.mathworks.com/help/install/ug/install-products-with-internet-connection.html
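Once MATLAB and the matlabengine package from the list below are installed, a quick check like this (a minimal sketch, not a script shipped with the repository) confirms that Python can start and call the engine:

# Minimal check that the MATLAB Engine API for Python is working.
# Assumes MATLAB R2024b and `pip install matlabengine==24.2.2` (listed below).
import matlab.engine

eng = matlab.engine.start_matlab()
print(eng.sqrt(4.0))  # expected output: 2.0
eng.quit()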
Install the following additional Python packages on top of the demo requirements:
pip install opencv-python
pip install matlabengine==24.2.2
pip install scikit-image fastdtw python-Levenshtein

Create the artifacts folders in the project root:
mkdir artifacts
mkdir artifacts/datasets
mkdir artifacts/experiments

Download the MIT1003 dataset and our split files and stimuli embeddings from:
https://uofi.box.com/s/xvwr77f1xrlbsggbpwz8nv377d8fnn34
After download, place the folder inside artifacts/datasets. The final structure should look like:
diffeye/
    artifacts/
        datasets/
            MIT1003_original/
                ALLFIXATIONMAPS/
                ALLSTIMULI/
                ...
        experiments/
            ...
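Before launching training, a quick check along these lines (a minimal sketch based on the layout above, not part of the repository) can confirm the dataset ended up where it is expected:

# Quick check that the dataset matches the layout shown above.
# Run from the project root (diffeye/).
from pathlib import Path

root = Path("artifacts/datasets/MIT1003_original")
for sub in ["ALLFIXATIONMAPS", "ALLSTIMULI"]:
    path = root / sub
    print(f"{path}: {'ok' if path.is_dir() else 'MISSING'}")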
Run the training with the full model:
python main.py --root_dir=artifacts --exp_name=<EXP_NAME> --config_file=config/full.yaml

This command trains the full model and generates the demo weights used in the demo notebook.
You can also run ablated versions of the model by selecting one of the other configuration files in the ./config folder. Outputs are saved to:
artifacts/experiments/<EXP_NAME>/<timestamp>
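Because each run writes to a new timestamped folder, a small helper like the following (a sketch assuming the artifacts/experiments/<EXP_NAME>/<timestamp> layout above; the experiment name is a placeholder) can locate the most recent output directory:

# Locate the most recent run directory for a given experiment, assuming the
# artifacts/experiments/<EXP_NAME>/<timestamp> layout described above.
from pathlib import Path

def latest_run(exp_name, root="artifacts/experiments"):
    runs = [p for p in (Path(root) / exp_name).iterdir() if p.is_dir()]
    if not runs:
        raise FileNotFoundError(f"No runs found under {root}/{exp_name}")
    return max(runs, key=lambda p: p.stat().st_mtime)

print(latest_run("my_experiment"))  # "my_experiment" is a placeholder <EXP_NAME>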
This code utilizes the FeatUp codebase, which we include directly in this repository for convenience. The original FeatUp repository can be found here:
https://github.com/mhamilton723/FeatUp
If you use DiffEye in your research, please cite:
@inproceedings{kara2025diffeye,
title={DiffEye: Diffusion-Based Continuous Eye-Tracking Data Generation Conditioned on Natural Images},
author={Ozgur Kara and Harris Nisar and James Matthew Rehg},
booktitle={The Thirty-ninth Annual Conference on Neural Information Processing Systems},
year={2025},
url={https://openreview.net/forum?id=P5yoTfwyyD}
}