Diffusion Knows Transparency: Repurposing Video Diffusion for Transparent Object Depth and Normal Estimation
Shaocong Xu, Songlin Wei, Qizhe Wei, Zheng Geng, Hong Li, Licheng Shen, Qianpu Sun, Shu Han, Bin Ma, Bohan Li, Chongjie Ye, Yuhang Zheng, Nan Wang, Saining Zhang, and Hao Zhao
DKT is a foundation model for transparent-object 🫙, in-the-wild 🌎, arbitrary-length ⏳ video depth and normal estimation, enabling downstream applications such as robot manipulation and policy learning.
[25-12-04] 🔥🔥🔥 DKT is now released. Have fun!
Our pretrained models are available on the Hugging Face Hub:
| Version | Hugging Face Model |
|---|---|
| DKT-Depth-1-3B | [Daniellesry/DKT-Depth-1-3B](https://huggingface.co/Daniellesry/DKT-Depth-1-3B) |
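If you prefer to fetch the weights ahead of time (for example, to cache them on a shared drive), here is a minimal sketch using the standard `huggingface_hub` API. Whether `DKTPipeline` can be pointed at a local checkpoint directory is our assumption, so check the pipeline's constructor arguments.

```python
from huggingface_hub import snapshot_download

# Download the DKT-Depth-1-3B weights into the local Hugging Face cache
# and print the resulting directory. Pointing DKTPipeline at this path is
# an assumption; verify against the pipeline's constructor.
local_dir = snapshot_download(repo_id="Daniellesry/DKT-Depth-1-3B")
print(local_dir)
```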
Please run the following commands to set up the package:

```bash
git clone https://github.com/Daniellli/DKT.git
cd DKT
pip install -r requirements.txt
```
- Online demo: DKT
- Local demo:

  ```bash
  python app.py
  ```
- Inference with the Python API:

  ```python
  import os

  from dkt.pipelines.pipelines import DKTPipeline
  from tools.common_utils import save_video

  # Build the DKT pipeline.
  pipe = DKTPipeline()

  # Run depth estimation on an example video.
  demo_path = 'examples/1.mp4'
  prediction = pipe(demo_path)

  # Save the colorized depth video.
  save_dir = 'logs'
  os.makedirs(save_dir, exist_ok=True)
  output_path = os.path.join(save_dir, 'demo.mp4')
  save_video(prediction['colored_depth_map'], output_path, fps=25)
  ```
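To run the model over a whole folder of clips, the sketch below batches the single-video call above. It only reuses the names shown in this README (`DKTPipeline`, `save_video`, and the `'colored_depth_map'` key) and assumes a pipeline instance can be reused across calls.

```python
import os

from dkt.pipelines.pipelines import DKTPipeline
from tools.common_utils import save_video

# Build the pipeline once and reuse it for every clip,
# so the weights are only loaded a single time (assumed safe).
pipe = DKTPipeline()

video_dir = 'examples'
save_dir = 'logs'
os.makedirs(save_dir, exist_ok=True)

for name in sorted(os.listdir(video_dir)):
    if not name.endswith('.mp4'):
        continue
    prediction = pipe(os.path.join(video_dir, name))
    stem = os.path.splitext(name)[0]
    # Save the colorized depth video for this clip.
    save_video(prediction['colored_depth_map'],
               os.path.join(save_dir, f'{stem}_depth.mp4'), fps=25)
```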
Our code builds on recent fantastic works, including MoGe, WAN, and DiffSynth-Studio. We sincerely thank the authors for their excellent contributions!
...