Ozgur Kara*, Harris Nisar*, James M. Rehg
(* denotes equal contribution)
DiffEye is the first diffusion-based model that generates continuous eye-tracking trajectories conditioned on natural images. It models gaze behavior directly from raw eye movement trajectories rather than processed scanpaths, achieving state-of-the-art scanpath generation while enabling high-fidelity continuous gaze trajectory synthesis.
DiffEye learns to generate realistic, diverse, continuous eye-tracking trajectories directly from raw gaze data using a diffusion-based generative model with a novel Corresponding Positional Embedding (CPE).
Abstract
Numerous models have been developed for scanpath and saliency prediction. They are typically trained on scanpaths, which model eye movement as a sequence of discrete fixation points connected by saccades, while the rich information contained in the raw trajectories is often discarded. Moreover, most existing approaches fail to capture the variability observed among human subjects viewing the same image. They generally predict a single scanpath of fixed, pre-defined length, which conflicts with the inherent diversity and stochastic nature of real-world visual attention. To address these challenges, we propose DiffEye, a diffusion-based training framework designed to model continuous and diverse eye movement trajectories during free viewing of natural images. Our method builds on a diffusion model conditioned on visual stimuli and introduces a novel component, namely Corresponding Positional Embedding (CPE), which aligns spatial gaze information with the patch-based semantic features of the visual input. By leveraging raw eye-tracking trajectories rather than relying on scanpaths, DiffEye captures the inherent variability in human gaze behavior and generates high-quality, realistic eye movement patterns, despite being trained on a comparatively small dataset. The generated trajectories can also be converted into scanpaths and saliency maps, resulting in outputs that more accurately reflect the distribution of human visual attention. DiffEye is the first method to tackle this task on natural images using a diffusion model while fully leveraging the richness of raw eye-tracking data. Our extensive evaluation shows that DiffEye not only achieves state-of-the-art performance in scanpath generation but also enables, for the first time, the generation of continuous eye movement trajectories. Project webpage: https://diff-eye.github.io/
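CPE is described only at a high level here. As a rough illustration of the idea (a sketch with assumed shapes and names, not the repository's actual implementation), each continuous gaze sample can be mapped to the image patch it falls in and paired with that patch's positional embedding, so gaze tokens and visual tokens share one spatial reference:

# Rough illustration of the idea behind Corresponding Positional Embedding (CPE).
# Shapes, names, and the 16x16 grid are assumptions for illustration only.
import torch

def corresponding_positional_embedding(gaze_xy, patch_pos_emb, grid_size):
    # gaze_xy: (T, 2) gaze coordinates normalized to [0, 1]
    # patch_pos_emb: (grid_size * grid_size, D) positional embeddings of image patches
    # returns: (T, D) the positional embedding of the patch under each gaze sample
    cols = (gaze_xy[:, 0] * grid_size).long().clamp(0, grid_size - 1)
    rows = (gaze_xy[:, 1] * grid_size).long().clamp(0, grid_size - 1)
    patch_idx = rows * grid_size + cols
    return patch_pos_emb[patch_idx]

# Example: a 240-sample trajectory over a 16x16 patch grid with 768-d embeddings.
gaze = torch.rand(240, 2)
pos_emb = torch.randn(16 * 16, 768)
print(corresponding_positional_embedding(gaze, pos_emb, grid_size=16).shape)  # torch.Size([240, 768])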
DiffEye generates full, continuous eye movement trajectories (d), which can be converted into scanpaths (c) and saliency maps (b). This captures the rich temporal dynamics of human attention that are often lost in discrete scanpath representations.
Comparison of generated scanpaths against baseline models and ground truth across different stimuli. DiffEye generates scanpaths that closely resemble the ground truth distribution, unlike baselines which often fail to capture the natural variability or focus on incorrect regions.
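In this repository, the conversion from trajectories to scanpaths and saliency maps is performed with MATLAB scripts (see the training and evaluation setup below). The following Python sketch is only a generic approximation of that step, using a simple dispersion-style fixation rule plus Gaussian blurring; it is not the conversion used for the reported results and assumes numpy and scipy are available:

# Generic approximation only: reduce a continuous trajectory to fixations and a
# blurred fixation (saliency) map. The repository uses MATLAB scripts for the
# actual conversion used in evaluation.
import numpy as np
from scipy.ndimage import gaussian_filter

def trajectory_to_saliency(gaze_xy, height, width, window=24, disp_thresh=0.02, sigma=30):
    # gaze_xy: (T, 2) gaze samples normalized to [0, 1]
    fixations = []
    for t in range(0, len(gaze_xy) - window + 1, window):
        chunk = gaze_xy[t:t + window]
        # Dispersion-style rule: a low-spread window counts as one fixation.
        if (chunk.max(axis=0) - chunk.min(axis=0)).sum() < disp_thresh:
            fixations.append(chunk.mean(axis=0))
    saliency = np.zeros((height, width), dtype=np.float32)
    for fx, fy in fixations:
        x = int(np.clip(fx * (width - 1), 0, width - 1))
        y = int(np.clip(fy * (height - 1), 0, height - 1))
        saliency[y, x] += 1.0
    saliency = gaussian_filter(saliency, sigma=sigma)
    if saliency.max() > 0:
        saliency /= saliency.max()
    return np.array(fixations), saliency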
Please create and activate a conda environment using Python 3.10.15:
conda create -n diffeye python=3.10.15
conda activate diffeye

Install the required packages:
pip install torch==2.5.1 torchvision==0.20.1 torchaudio==2.5.1 --index-url https://download.pytorch.org/whl/cu121
pip install ipykernel matplotlib scikit-learn pandas pyyaml einops natsort diffusers timm

Download the model weights to the project root from here:
https://uofi.box.com/s/f8gykva7ucnx2e3z26hhfxji49txbwh1
Run through the demo notebook:
./demo.ipynb
Feel free to change image_path in Cell 2 Line 1 to a custom image path.
The training and evaluation scripts require MATLAB to convert generated trajectories to scanpaths and saliency maps for evaluation. We used MATLAB R2024b:
https://www.mathworks.com/help/install/ug/install-products-with-internet-connection.html
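Once MATLAB and the matlabengine package from the list below are installed, a quick check like this (a minimal sketch, not a script shipped with the repository) confirms that Python can start and call the engine:

# Minimal check that the MATLAB Engine API for Python is working.
# Assumes MATLAB R2024b and `pip install matlabengine==24.2.2` (listed below).
import matlab.engine

eng = matlab.engine.start_matlab()
print(eng.sqrt(4.0))  # expected output: 2.0
eng.quit()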
Install the following additional Python packages on top of the demo requirements:
pip install opencv-python
pip install matlabengine==24.2.2
pip install scikit-image fastdtw python-Levenshtein

Create the artifacts folders in the project root:
mkdir artifacts
mkdir artifacts/datasets
mkdir artifacts/experiments

Download the MIT1003 dataset and our split files and stimuli embeddings from:
https://uofi.box.com/s/xvwr77f1xrlbsggbpwz8nv377d8fnn34
After download, place the folder inside artifacts/datasets. The final structure should look like:
diffeye/
    artifacts/
        datasets/
            MIT1003_original/
                ALLFIXATIONMAPS/
                ALLSTIMULI/
                ...
        experiments/
            ...
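Before launching training, a quick check along these lines (a minimal sketch based on the layout above, not part of the repository) can confirm the dataset ended up where it is expected:

# Quick check that the dataset matches the layout shown above.
# Run from the project root (diffeye/).
from pathlib import Path

root = Path("artifacts/datasets/MIT1003_original")
for sub in ["ALLFIXATIONMAPS", "ALLSTIMULI"]:
    path = root / sub
    print(f"{path}: {'ok' if path.is_dir() else 'MISSING'}")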
Run the training with the full model:
python main.py --root_dir=artifacts --exp_name=<EXP_NAME> --config_file=config/full.yaml

This command trains the full model and generates the demo weights used in the demo notebook.
You can also run ablated versions of the model by selecting one of the other configuration files in the ./config folder. Outputs are saved to:
artifacts/experiments/<EXP_NAME>/<timestamp>
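Because each run writes to a new timestamped folder, a small helper like the following (a sketch assuming the artifacts/experiments/<EXP_NAME>/<timestamp> layout above; the experiment name is a placeholder) can locate the most recent output directory:

# Locate the most recent run directory for a given experiment, assuming the
# artifacts/experiments/<EXP_NAME>/<timestamp> layout described above.
from pathlib import Path

def latest_run(exp_name, root="artifacts/experiments"):
    runs = [p for p in (Path(root) / exp_name).iterdir() if p.is_dir()]
    if not runs:
        raise FileNotFoundError(f"No runs found under {root}/{exp_name}")
    return max(runs, key=lambda p: p.stat().st_mtime)

print(latest_run("my_experiment"))  # "my_experiment" is a placeholder <EXP_NAME>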
This code utilizes the FeatUp codebase, which we include directly in this repository for convenience. The original FeatUp repository can be found here:
https://github.com/mhamilton723/FeatUp
If you use DiffEye in your research, please cite:
@inproceedings{kara2025diffeye,
title={DiffEye: Diffusion-Based Continuous Eye-Tracking Data Generation Conditioned on Natural Images},
author={Ozgur Kara and Harris Nisar and James Matthew Rehg},
booktitle={The Thirty-ninth Annual Conference on Neural Information Processing Systems},
year={2025},
url={https://openreview.net/forum?id=P5yoTfwyyD}
}