Official implementation of StableI2I: Spotting Unintended Changes in Image-to-Image Transition (ICML 2026)
For any questions, please contact us by email: lijiayang.cs@gmail.com
Looking forward to your ⭐!
- release code
- release ckpt
- release pip-pkg
- release arxiv
- ICML version paper
In most real-world image-to-image (I2I) scenarios, existing evaluations focus primarily on instruction following and on the perceptual quality or aesthetics of the generated images. However, they largely fail to assess whether the output image preserves the semantic correspondence and spatial structure of the input image. To address this limitation, we propose StableI2I, a unified and dynamic evaluation framework that explicitly measures content fidelity and pre–post consistency across a wide range of I2I tasks, including image editing and image restoration, without requiring reference images. In addition, we construct StableI2I-Bench, a benchmark designed to systematically evaluate how accurately MLLMs perform such fidelity and consistency assessments. Extensive experiments demonstrate that StableI2I provides accurate, fine-grained, and interpretable evaluations of content fidelity and consistency that correlate strongly with human subjective judgments. Our framework serves as a practical and reliable tool for diagnosing content consistency and benchmarking model performance in real-world I2I systems.
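The abstract reports strong correlation with human subjective judgments, but this README does not say which coefficient is used. In quality-assessment work, Spearman rank correlation (SRCC) and Pearson linear correlation (PLCC) are the usual choices. The plain-Python sketch below shows how predicted scores could be compared against human ratings under that assumption; it is an illustration, not the paper's evaluation code, and the function names are hypothetical.

```python
from typing import Sequence


def average_ranks(values: Sequence[float]) -> list[float]:
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend `end` over the run of values tied with values[order[start]].
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def plcc(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson linear correlation coefficient."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (vx * vy) ** 0.5


def srcc(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank correlation: the PLCC of the tie-averaged ranks."""
    return plcc(average_ranks(x), average_ranks(y))
```

SRCC only rewards getting the ordering right, while PLCC also rewards a linear relationship; scipy.stats.spearmanr and scipy.stats.pearsonr compute the same quantities if SciPy is available.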
Install dependencies:
pip install -r requirements.txt
The full environment matches the one used by Qwen3-VL.
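To confirm that everything in requirements.txt actually resolved, a small standard-library check can help. This is a minimal sketch, not part of the repository: it only verifies that each listed distribution is installed (not that versions satisfy the pins), and it skips pip options and direct-URL/VCS lines. The helper names are hypothetical.

```python
import re
from importlib.metadata import PackageNotFoundError, version
from pathlib import Path

# Leading distribution name of a requirement line, e.g. "torch" in "torch>=2.1".
_NAME_RE = re.compile(r"^[A-Za-z0-9][A-Za-z0-9._-]*")


def requirement_names(text: str) -> list[str]:
    """Extract distribution names from requirements.txt-style text.

    Comments, blank lines, pip options (lines starting with '-') and
    direct-URL/VCS lines (containing '://') are skipped. Extras, version
    specifiers and environment markers are dropped.
    """
    names = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line or line.startswith("-") or "://" in line:
            continue
        match = _NAME_RE.match(line)
        if match:
            names.append(match.group(0))
    return names


def missing_packages(requirements_path: str = "requirements.txt") -> list[str]:
    """Return names listed in the requirements file that are not installed."""
    text = Path(requirements_path).read_text(encoding="utf-8")
    missing = []
    for name in requirement_names(text):
        try:
            version(name)  # raises if no installed distribution has this name
        except PackageNotFoundError:
            missing.append(name)
    return missing
```

Run from the repository root, missing_packages() returns an empty list when every listed distribution is present.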
app.py is the entry point for both the local web demo and the API. Running it starts a FastAPI service with a browser UI.
Example (Windows cmd syntax; on Linux/macOS, use export instead of set):
set MODEL_PATH=path/to/ckpt
set GPU_ID=0
set HOST=127.0.0.1
set PORT=10004
python app.py
Then open:
http://127.0.0.1:10004
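Because the set commands are shell-specific, the demo can also be launched from Python with the same four variables. A minimal sketch, assuming app.py reads MODEL_PATH, GPU_ID, HOST and PORT from the environment (as the example implies) and is started from the repository root; demo_env, wait_for_server and launch_demo are hypothetical helpers, not part of the repository. Loading a checkpoint can take a while, hence the generous default timeout.

```python
import os
import subprocess
import sys
import time
import urllib.error
import urllib.request


def demo_env(model_path: str, gpu_id: int = 0, host: str = "127.0.0.1",
             port: int = 10004) -> dict[str, str]:
    """Copy the current environment and add the variables from the example."""
    env = dict(os.environ)
    env.update(MODEL_PATH=model_path, GPU_ID=str(gpu_id), HOST=host, PORT=str(port))
    return env


def wait_for_server(url: str, timeout: float = 300.0, interval: float = 2.0) -> bool:
    """Poll url until the server answers or timeout seconds have elapsed."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=interval):
                return True
        except urllib.error.HTTPError:
            return True  # an HTTP error status still means the server is up
        except OSError:
            # URLError (e.g. connection refused) and socket timeouts are OSErrors.
            time.sleep(interval)
    return False


def launch_demo(model_path: str, **env_options) -> subprocess.Popen:
    """Start app.py and block until it responds; terminate it on timeout."""
    env = demo_env(model_path, **env_options)
    proc = subprocess.Popen([sys.executable, "app.py"], env=env)
    url = f"http://{env['HOST']}:{env['PORT']}"
    if not wait_for_server(url):
        proc.terminate()
        raise RuntimeError(f"demo did not respond at {url}")
    return proc
```

For example, proc = launch_demo("path/to/ckpt", gpu_id=0) returns once http://127.0.0.1:10004 answers; call proc.terminate() to stop the demo.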
The demo supports:
- built-in examples
- inference by local image path
- inference by image upload
- summarized semantic / structure / low-level results
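This README does not list the HTTP routes behind path-based and upload-based inference. Since app.py is a FastAPI service, and FastAPI publishes an OpenAPI schema at /openapi.json (with interactive docs at /docs) unless the app disables them, the routes can be discovered from a running demo. A minimal sketch, assuming the default host and port above and that app.py keeps FastAPI's default docs URLs; fetch_openapi and list_routes are hypothetical helpers.

```python
import json
import urllib.request

HTTP_METHODS = {"get", "put", "post", "delete", "options", "head", "patch", "trace"}


def fetch_openapi(base_url: str = "http://127.0.0.1:10004") -> dict:
    """Download the OpenAPI schema that FastAPI serves at /openapi.json by default."""
    with urllib.request.urlopen(f"{base_url.rstrip('/')}/openapi.json", timeout=10) as resp:
        return json.load(resp)


def list_routes(schema: dict) -> list[tuple[str, str]]:
    """Return sorted (METHOD, path) pairs declared in an OpenAPI schema."""
    routes = []
    for path, item in schema.get("paths", {}).items():
        # Path items may also hold keys such as "parameters" or "summary".
        for key in item:
            if key.lower() in HTTP_METHODS:
                routes.append((key.upper(), path))
    return sorted(routes)
```

With the demo running, list_routes(fetch_openapi()) returns the available routes; opening http://127.0.0.1:10004/docs shows the same routes with request forms, under the same assumption.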
For inference details, see infer.md.
For training, we recommend the following official references:
- Qwen3-VL: QwenLM/Qwen3-VL
- Swift: modelscope/ms-swift
Notes:
- For SFT, start from the official Qwen3-VL finetuning workflow.
- For GRPO and related alignment training, use Swift.
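The notes point to Swift for GRPO without describing the training setup. For orientation only: the step that gives GRPO its name is normalizing each sampled completion's reward against the other completions for the same prompt, A_i = (r_i - mean(r)) / (std(r) + eps), which removes the need for a learned value model. The sketch below illustrates that formula in plain Python; it is not Swift's implementation, the function name is hypothetical, and details such as the epsilon and whether the sample or population standard deviation is used vary across implementations (this sketch uses the population one).

```python
from statistics import fmean, pstdev


def group_relative_advantages(rewards: list[float], eps: float = 1e-4) -> list[float]:
    """Normalize the rewards of one prompt's sampled completions within their group.

    Each advantage is the reward minus the group mean, divided by the group's
    population standard deviation plus eps (eps avoids division by zero when
    all rewards in the group are equal).
    """
    if not rewards:
        raise ValueError("a group needs at least one reward")
    mean = fmean(rewards)
    std = pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]
```

Completions that beat their group's average get positive advantages and are reinforced; a group where every completion earns the same reward contributes no learning signal.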
If you find our work helpful for your research, please consider citing it:
@article{li2026stablei2i,
title={StableI2I: Spotting Unintended Changes in Image-to-Image Transition},
author={Li, Jiayang and Cao, Shuo and Li, Xiaohui and Zhang, Zhizhen and Zhu, Kaiwen and Duan, Yule and Qiao, Yu and Zhang, Jian and Liu, Yihao},
journal={arXiv preprint arXiv:2605.04453},
year={2026}
}