moqingx52/GemDepth

GemDepth: Geometry-Embedded Features for 3D-Consistent Video Depth

Yuecheng Liu1, Junda Cheng1*, Wenjing Liao1,2, Hanrui Cheng1,2, Yuzhou Wang1, Xin Yang1,3

*Corresponding Author
1HUST, 2Carizon, 3Optics Valley Laboratory

If you like our project, please give us a star ⭐ on GitHub for the latest updates!

Model Paper

📢 News

  • [2026.05.14] Added run_video_pointcloud for point cloud reconstruction.
  • [2026.05.09] 🔥 GemDepth is out! It effectively recovers fine-grained details and achieves better 3D temporal consistency.

👋 Introduction

Welcome to the official repository for GemDepth!

GemDepth is a framework built on the insight that an explicit awareness of camera motion and global 3D structure is a prerequisite for 3D consistency. Distinctively, GemDepth introduces a Geometry-Embedding Module (GEM) that predicts inter-frame camera poses to generate implicit geometric embeddings. This injection of motion priors equips the network with intrinsic 3D perception and alignment capabilities. Guided by these geometric cues, our Alternating Spatio-Temporal Transformer (ASTT) captures latent point-level correspondences to simultaneously enhance spatial precision for sharp details and enforce rigorous temporal consistency.

GemDepth achieves state-of-the-art performance across multiple datasets, particularly in complex dynamic scenarios.

network

📝 Benchmark performance

benchmark

Comparisons with state-of-the-art methods across four of the most widely used benchmarks.

⏳ Usage

Preparation

git clone https://github.com/Yuechengliu919/GemDepth
cd GemDepth
conda create -n gemdepth python=3.10
conda activate gemdepth
pip install -r requirements.txt

Use our model

import torch
from model.gemdepth import GemDepth

# `args`, `video_path`, and `read_video_frames` are assumed to be provided by the calling script.
DEVICE = 'cuda' if torch.cuda.is_available() else 'cpu'
model_configs = {
    'vits': {'encoder': 'vits', 'features': 64, 'out_channels': [48, 96, 192, 384]},
    'vitl': {'encoder': 'vitl', 'features': 256, 'out_channels': [256, 512, 1024, 1024]},
}
# Build the model and load the pretrained weights.
gemdepth = GemDepth(**model_configs[args.encoder])
checkpoint = torch.load("./checkpoint/gemdepth.pth", map_location='cpu', weights_only=False)
gemdepth.load_state_dict(checkpoint, strict=True)
gemdepth = gemdepth.to(DEVICE).eval()
# Read the video frames and predict per-frame depth.
frames, target_fps = read_video_frames(video_path, args.max_len, args.target_fps, 1280)
depths, fps = gemdepth.infer_video_depth(frames, target_fps, input_size=args.input_size, device=DEVICE, fp32=args.fp32)
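If you want to visualize the returned depths, one common approach is to normalize them with a single minimum and maximum taken over the whole clip, because per-frame normalization can introduce flicker that hides the temporal consistency of the prediction. The sketch below is only an illustration: `normalize_video_depth` is a hypothetical helper that is not part of this repository, and nested Python lists stand in for whatever array type `infer_video_depth` actually returns.

```python
from typing import List

Frame = List[List[float]]


def normalize_video_depth(depths: List[Frame]) -> List[List[List[int]]]:
    """Map depth values to 0..255 using one min/max shared by all frames.

    Hypothetical helper for illustration; not part of the GemDepth code base.
    """
    flat = [z for frame in depths for row in frame for z in row]
    d_min, d_max = min(flat), max(flat)
    # Guard against a constant clip, where the range would be zero.
    scale = 255.0 / (d_max - d_min) if d_max > d_min else 0.0
    return [
        [[int(round((z - d_min) * scale)) for z in row] for row in frame]
        for frame in depths
    ]


# Two tiny 1x2 "frames" standing in for a video clip.
example_depths = [[[1.0, 2.0]], [[3.0, 5.0]]]
normalized = normalize_video_depth(example_depths)
```

The smallest value in the clip maps to 0 and the largest to 255, so the same depth is rendered with the same intensity in every frame.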

Running script on video

# Only video depth output
python evaluation/inference/run_video.py --input_dir ./assets/example_videos --output_dir ./assets/example_result
# Video depth and point cloud output
python evaluation/inference/run_video_pointcloud.py --input_dir ./assets/example_videos --output_dir ./assets/example_result
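For readers unfamiliar with how a depth map becomes a point cloud, the following is a minimal sketch of standard pinhole back-projection. It is not the implementation in run_video_pointcloud.py: the function `backproject_depth` and the intrinsics `fx`, `fy`, `cx`, `cy` are assumptions for illustration, and the script may obtain intrinsics and poses differently.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def backproject_depth(
    depth: Sequence[Sequence[float]],
    fx: float,
    fy: float,
    cx: float,
    cy: float,
    stride: int = 1,
) -> List[Point]:
    """Lift a depth map to camera-frame 3D points with a pinhole model.

    Hypothetical helper: depth[v][u] is the z value at pixel (u, v).
    """
    points: List[Point] = []
    for v in range(0, len(depth), stride):
        row = depth[v]
        for u in range(0, len(row), stride):
            z = row[u]
            if z <= 0:  # skip invalid or missing depth
                continue
            x = (u - cx) * z / fx
            y = (v - cy) * z / fy
            points.append((x, y, z))
    return points


# A 2x2 depth map with one invalid pixel and made-up intrinsics.
example_depth = [[2.0, 2.0], [0.0, 4.0]]
points = backproject_depth(example_depth, fx=1.0, fy=1.0, cx=0.5, cy=0.5)
```

Points from different frames would still need to be transformed by the per-frame camera pose before they can be merged into a single cloud.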

✏️ Training Data

✈️ Model weights

Model Link
GemDepth Download 🤗

✈️ Evaluation

Prepare Evaluation Datasets

Following VideoDepthAnything, download the datasets from the following links: Sintel, KITTI, Bonn, ScanNet.

pip install natsort
cd dataset/dataset_extract
python dataset_extract${dataset}.py

This script extracts the dataset to the dataset/dataset_extract/dataset folder and generates the JSON file for the dataset.

Run inference

python evaluation/inference/infer/infer.py \
    --infer_path ${out_path} \
    --json_file ${json_path} \
    --datasets ${dataset}

Options:

  • --infer_path: path to save the output results
  • --json_file: path to the JSON file for the dataset, such as sintel_video.json, kitti_video_500.json, or scannet_video_tae.json
  • --datasets: dataset name, choose from sintel, kitti, bonn, scannet
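When running inference on several benchmarks in a row, it can help to build the command programmatically and validate the dataset name first. The sketch below only assembles the argument list from the options listed above; `build_infer_command` is a hypothetical helper and the output path `./outputs/sintel` is a placeholder, not a path defined by this repository.

```python
import shlex
from typing import List

# Dataset names accepted by --datasets, as listed in the options above.
VALID_DATASETS = ("sintel", "kitti", "bonn", "scannet")


def build_infer_command(infer_path: str, json_file: str, dataset: str) -> List[str]:
    """Assemble the inference command as an argument list (hypothetical helper)."""
    if dataset not in VALID_DATASETS:
        raise ValueError(f"unknown dataset {dataset!r}; choose from {VALID_DATASETS}")
    return [
        "python",
        "evaluation/inference/infer/infer.py",
        "--infer_path", infer_path,
        "--json_file", json_file,
        "--datasets", dataset,
    ]


# Placeholder output path; the JSON name is taken from the options above.
cmd = build_infer_command("./outputs/sintel", "sintel_video.json", "sintel")
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` from the repository root to launch the run.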

Run evaluation

## ~500 frames
python evaluation/eval/eval.py \
    --infer_path ${pred_root} \
    --benchmark_path ${benchmark_root} \
    --datasets ${dataset}
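As background on what a depth evaluation typically reports, here is a minimal sketch of two widely used accuracy metrics, absolute relative error (AbsRel) and the δ1 inlier ratio. This is an assumption about the kind of metric involved, not a description of evaluation/eval/eval.py, which may apply scale and shift alignment, other validity masks, or additional temporal metrics. The function names are hypothetical.

```python
from typing import Sequence


def abs_rel(pred: Sequence[float], gt: Sequence[float]) -> float:
    """Mean of |pred - gt| / gt over pixels with valid ground truth (gt > 0)."""
    valid = [(p, g) for p, g in zip(pred, gt) if g > 0]
    if not valid:
        raise ValueError("no valid ground-truth pixels")
    return sum(abs(p - g) / g for p, g in valid) / len(valid)


def delta1(pred: Sequence[float], gt: Sequence[float], threshold: float = 1.25) -> float:
    """Fraction of valid pixels where max(pred/gt, gt/pred) < threshold."""
    valid = [(p, g) for p, g in zip(pred, gt) if g > 0 and p > 0]
    if not valid:
        raise ValueError("no valid pixels")
    hits = sum(1 for p, g in valid if max(p / g, g / p) < threshold)
    return hits / len(valid)


# Flattened toy prediction and ground truth; the last pixel has no ground truth.
example_pred = [1.0, 2.2, 3.0, 8.0]
example_gt = [1.0, 2.0, 4.0, 0.0]
example_abs_rel = abs_rel(example_pred, example_gt)
example_delta1 = delta1(example_pred, example_gt)
```

Lower AbsRel and higher δ1 indicate better agreement with the ground truth.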

✈️ Training

To train GemDepth on mixed datasets, run:

## stage1
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 accelerate launch train.py --config-name stage1
## stage2
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 accelerate launch train.py --config-name stage2

【ICML 2026】GemDepth: Geometry-Embedded Features for 3D-Consistent Video Depth