Jiyuan Wang¹ • Chunyu Lin¹,† • Cheng Guan¹ • Lang Nie⁴ • Jing He³ • Haodong Li³ • Kang Liao² • Yao Zhao¹
¹BJTU • ²NTU • ³HKUST • ⁴CQUPT • †Corresponding author
- [2025-03] 🎉 Paper released on arXiv!
- [2025-09] 🎉 Jasmine is accepted to NeurIPS 2025!
- [2025-10] 🎉 Code and pretrained models released!
Jasmine is the first framework that successfully integrates Stable Diffusion (SD) into self-supervised monocular depth estimation (SSMDE). Without any high-precision depth supervision, Jasmine produces detailed, accurate depth estimates and generalizes zero-shot across diverse scenarios.
Download the pre-configured conda environment from HuggingFace:
# Download the conda-packed environment
wget https://huggingface.co/exander/Jasmine/resolve/main/jasmine.tar.gz
# Create directory and extract
mkdir -p ~/miniconda3/envs/jasmine
tar -xzf jasmine.tar.gz -C ~/miniconda3/envs/jasmine
# Activate the environment
conda activate jasmine
Tested Environment:
- Python 3.10.12, PyTorch 2.2.0+cu118, CUDA 11.8, Ubuntu 22.04 LTS, NVIDIA RTX A6000
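To compare a freshly unpacked environment against the tested configuration above, a small report like the following may help. This is a minimal sketch and not part of the Jasmine repository: the function name `describe_environment` is hypothetical, and it only reports versions rather than enforcing them.

```python
import platform


def describe_environment():
    """Collect version info to compare with the tested setup listed above."""
    info = {
        "python": platform.python_version(),
        "torch": None,
        "cuda_build": None,
        "cuda_available": False,
    }
    try:
        import torch  # expected to be importable once the jasmine env is active
    except ImportError:
        # PyTorch is missing: return the Python version only.
        return info
    info["torch"] = torch.__version__
    info["cuda_build"] = torch.version.cuda  # CUDA version PyTorch was built with, or None
    info["cuda_available"] = torch.cuda.is_available()
    return info


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```

Run it inside the activated environment; a mismatch with the tested versions is not necessarily an error, only a hint about where to look if something fails.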
Download the KITTI Raw dataset and depth annotations from the official website.
The dataset should be organized as follows:
kitti/
├── 2011_09_26/
│   ├── 2011_09_26_drive_0002_sync/
│   │   └── image_02/
│   │       └── data/
│   └── ...
├── 2011_09_26_drive_0002_sync/
│   └── proj_depth/
│       └── groundtruth/
│           └── image_02/
├── 2011_09_28/
├── 2011_09_29/
├── 2011_09_30/
├── 2011_10_03/
└── gt_depths.npy
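Before launching an evaluation, it can be useful to confirm that the top-level entries of the layout above are in place. The following is a minimal sketch under stated assumptions: `missing_kitti_entries` is a hypothetical helper (not part of the Jasmine code), it checks only the date folders and `gt_depths.npy` shown above, and the path in the usage line is a placeholder.

```python
from pathlib import Path

# Date folders listed in the layout above.
KITTI_DATES = ["2011_09_26", "2011_09_28", "2011_09_29", "2011_09_30", "2011_10_03"]


def missing_kitti_entries(root):
    """Return the top-level entries from the layout above that are absent under root."""
    root = Path(root)
    missing = [date for date in KITTI_DATES if not (root / date).is_dir()]
    if not (root / "gt_depths.npy").is_file():
        missing.append("gt_depths.npy")
    return missing


if __name__ == "__main__":
    # Placeholder path; point this at your own kitti/ directory.
    print(missing_kitti_entries("/path/to/your/data/kitti"))
```

An empty list means every checked entry was found; it does not validate the contents of the individual drive folders.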
Download the DrivingStereo dataset from the official website.
The dataset should be organized as follows:
drivingstereo/
├── foggy/
│   ├── left-image-full-size/
│   └── depth-map-full-size/
├── cloudy/
├── rainy/
└── sunny/
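The weather subsets above correspond to the `--eval_split` names listed later in this README. As a rough sketch, the helper below reports which of those splits have their folders in place. It assumes that every subset mirrors the two subfolders shown for `foggy/` (the layout above only spells them out for that subset), and `ready_splits` is a hypothetical name, not a function from the Jasmine code.

```python
from pathlib import Path

# Weather subsets from the layout above, mapped to this README's --eval_split names.
SUBSET_TO_SPLIT = {
    "foggy": "foggy_stereo",
    "cloudy": "cloudy_stereo",
    "rainy": "rainy_stereo",
    "sunny": "sunny_stereo",
}

# Assumption: each subset contains the same two subfolders shown for foggy/.
REQUIRED_SUBDIRS = ("left-image-full-size", "depth-map-full-size")


def ready_splits(root):
    """Return the eval split names whose subset folders exist under root."""
    root = Path(root)
    return [
        split
        for subset, split in SUBSET_TO_SPLIT.items()
        if all((root / subset / sub).is_dir() for sub in REQUIRED_SUBDIRS)
    ]


if __name__ == "__main__":
    # Placeholder path; point this at your own drivingstereo/ directory.
    print(ready_splits("/path/to/your/data/drivingstereo"))
```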
Download the pretrained model from HuggingFace:
# Download the model checkpoint
wget https://huggingface.co/exander/Jasmine/resolve/main/Jasmine.zip
unzip Jasmine.zip -d ckpt/
Example command to evaluate on KITTI Eigen split:
python trains.py --only_test --eval_split eigen \
--ug --link_mode first \
--data_path /path/to/your/data \
--resume_from_checkpoint ./ckpt
To evaluate on other datasets, simply change the --eval_split argument:
- eigen: KITTI Eigen split
- foggy_stereo: DrivingStereo Foggy subset
- cloudy_stereo: DrivingStereo Cloudy subset
- rainy_stereo: DrivingStereo Rainy subset
- sunny_stereo: DrivingStereo Sunny subset
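To evaluate several splits in a row, the example command can be assembled programmatically. The sketch below only uses the flags from the command above; `build_eval_command` is a hypothetical helper, and it assumes a single `data_path` value per call, so pass the KITTI or DrivingStereo location as appropriate for each split.

```python
import shlex

# Split names accepted by --eval_split, as listed above.
EVAL_SPLITS = ["eigen", "foggy_stereo", "cloudy_stereo", "rainy_stereo", "sunny_stereo"]


def build_eval_command(eval_split, data_path, ckpt_dir="./ckpt"):
    """Return the evaluation command from this README as an argument list."""
    if eval_split not in EVAL_SPLITS:
        raise ValueError(f"unknown eval split: {eval_split!r}")
    return [
        "python", "trains.py", "--only_test",
        "--eval_split", eval_split,
        "--ug", "--link_mode", "first",
        "--data_path", data_path,
        "--resume_from_checkpoint", ckpt_dir,
    ]


if __name__ == "__main__":
    # Print the commands instead of running them; the data path is a placeholder.
    for split in EVAL_SPLITS:
        print(shlex.join(build_eval_command(split, "/path/to/your/data")))
```

Each returned list can be passed to `subprocess.run(cmd, check=True)` from the repository root to execute the evaluation instead of printing it.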
For detailed quantitative and qualitative results, please refer to our paper and project page.
[TBD] Training code and instructions will be released soon.
If you find our work helpful, please consider citing:
@article{wang2025jasmine,
title={Jasmine: Harnessing Diffusion Prior for Self-supervised Depth Estimation},
author={Wang, Jiyuan and Lin, Chunyu and Guan, Cheng and Nie, Lang and He, Jing and Li, Haodong and Liao, Kang and Zhao, Yao},
journal={arXiv preprint arXiv:2503.15905},
year={2025}
}
This project builds upon the following excellent works:
We thank the authors for their valuable contributions!
If you have any questions, feel free to open an issue or contact us by email.
This project is licensed under the MIT License - see the LICENSE file for details.