GitHub - moqingx52/GemDepth: 【ICML 2026】GemDepth: Geometry-Embedded Features for 3D-Consistent Video Depth

GemDepth: Geometry-Embedded Features for 3D-Consistent Video Depth

Yuecheng Liu¹, Junda Cheng¹*, Wenjing Liao¹,², Hanrui Cheng¹,², Yuzhou Wang¹, Xin Yang¹,³

*Corresponding author
¹HUST, ²Carizon, ³Optics Valley Laboratory

If you like our project, please give us a star ⭐ on GitHub for the latest updates!

📢 News

[2026.05.14] Added run_video_pointcloud for point cloud reconstruction.
[2026.05.09] 🔥 GemDepth is out! It effectively recovers fine-grained details and achieves better 3D temporal consistency.

👋 Introduction

Welcome to the official repository for GemDepth!

GemDepth is a framework built on the insight that an explicit awareness of camera motion and global 3D structure is a prerequisite for 3D consistency. Distinctively, GemDepth introduces a Geometry-Embedding Module (GEM) that predicts inter-frame camera poses to generate implicit geometric embeddings. This injection of motion priors equips the network with intrinsic 3D perception and alignment capabilities. Guided by these geometric cues, our Alternating Spatio-Temporal Transformer (ASTT) captures latent point-level correspondences to simultaneously enhance spatial precision for sharp details and enforce rigorous temporal consistency.
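To make "3D consistency" concrete, one standard formulation relates a pixel in frame $i$ to its match in frame $j$ through depth and relative camera pose. This is a general multi-view relation, not necessarily the form GemDepth uses internally, and the symbols $K$ (camera intrinsics), $D_i$ (depth map of frame $i$), $R_{ij}$ and $t_{ij}$ (relative rotation and translation from frame $i$ to frame $j$), and $\tilde{p}$ (homogeneous pixel coordinates) are notation introduced here for illustration.

```latex
\tilde{p}_j \sim K \left( R_{ij}\, D_i(p_i)\, K^{-1} \tilde{p}_i + t_{ij} \right)
```

For a static scene point, two depth maps agree in 3D when the third component of the bracketed vector equals $D_j(p_j)$, which is why knowledge of inter-frame pose helps align depth across frames.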

GemDepth achieves state-of-the-art performance across multiple datasets, particularly in complex dynamic scenarios.

📝 Benchmark performance

Comparisons with state-of-the-art methods across four of the most widely used benchmarks.

⏳ Usage

Preparation

git clone https://github.com/Yuechengliu919/GemDepth
cd GemDepth
conda create -n gemdepth python=3.10
conda activate gemdepth
pip install -r requirements.txt

Use our model

import torch
from model.gemdepth import GemDepth

# `args`, `video_path`, and `read_video_frames` are assumed to be provided by the calling script.
DEVICE = 'cuda' if torch.cuda.is_available() else 'cpu'

# Encoder-specific settings for the small (vits) and large (vitl) backbones.
model_configs = {
    'vits': {'encoder': 'vits', 'features': 64, 'out_channels': [48, 96, 192, 384]},
    'vitl': {'encoder': 'vitl', 'features': 256, 'out_channels': [256, 512, 1024, 1024]},
}

# Build the model and load the released checkpoint on CPU before moving it to DEVICE.
gemdepth = GemDepth(**model_configs[args.encoder])
checkpoint = torch.load("./checkpoint/gemdepth.pth", map_location='cpu', weights_only=False)
gemdepth.load_state_dict(checkpoint, strict=True)
gemdepth = gemdepth.to(DEVICE).eval()

# Read the video frames, then predict a depth map for every frame.
frames, target_fps = read_video_frames(video_path, args.max_len, args.target_fps, 1280)
depths, fps = gemdepth.infer_video_depth(frames, target_fps, input_size=args.input_size, device=DEVICE, fp32=args.fp32)
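The snippet above reads several fields from an `args` object (`args.max_len`, `args.target_fps`, `args.input_size`, `args.fp32`, plus an encoder choice) without showing how that object is built. A minimal sketch of a matching parser follows. The flag names mirror the attribute names in the snippet, but `build_parser`, the defaults, and the help strings are assumptions for illustration and may differ from the repository's own scripts.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical CLI that yields the `args` fields used in the snippet above."""
    parser = argparse.ArgumentParser(description="GemDepth video inference (sketch)")
    # Encoder key into model_configs; the choices match the two configs shown above.
    parser.add_argument("--encoder", choices=["vits", "vitl"], default="vitl")
    # The defaults below are placeholders, not values taken from the repository.
    parser.add_argument("--max_len", type=int, default=-1,
                        help="maximum number of frames to read (assumed: -1 means no limit)")
    parser.add_argument("--target_fps", type=int, default=-1,
                        help="output frame rate (assumed: -1 keeps the source rate)")
    parser.add_argument("--input_size", type=int, default=518,
                        help="network input resolution (assumed default)")
    parser.add_argument("--fp32", action="store_true",
                        help="run inference in float32 (assumed meaning of the flag)")
    return parser


# Example: parse an explicit argument list instead of sys.argv.
args = build_parser().parse_args(["--encoder", "vits", "--fp32"])
print(args.encoder, args.fp32)
```

Passing an explicit list to `parse_args` keeps the example independent of the command line. In a real script you would call `parse_args()` with no arguments.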

Running script on video

# Video depth output only
python evaluation/inference/run_video.py --input_dir ./assets/example_videos --output_dir ./assets/example_result
# Video depth and point cloud output
python evaluation/inference/run_video_pointcloud.py --input_dir ./assets/example_videos --output_dir ./assets/example_result
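For readers unfamiliar with how a depth map becomes a point cloud, the sketch below back-projects depth through a pinhole camera model. It is a generic illustration, not the code in run_video_pointcloud.py. The function name `unproject_depth` and the intrinsics `fx`, `fy`, `cx`, `cy` are assumptions, and how the script obtains intrinsics, scale, and camera poses is not documented here.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def unproject_depth(
    depth: Sequence[Sequence[float]],
    fx: float, fy: float, cx: float, cy: float,
) -> List[Point]:
    """Back-project a depth map (rows of depth values) into camera-space points.

    Uses the pinhole model: X = (u - cx) * Z / fx, Y = (v - cy) * Z / fy, Z = depth.
    Pixels with non-positive depth are skipped as invalid.
    """
    points: List[Point] = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if z <= 0:
                continue
            points.append(((u - cx) * z / fx, (v - cy) * z / fy, z))
    return points


# Tiny 2x2 depth map with made-up intrinsics (illustrative values only).
cloud = unproject_depth([[1.0, 2.0], [0.0, 4.0]], fx=2.0, fy=2.0, cx=0.5, cy=0.5)
print(len(cloud))  # 3 valid points; the zero-depth pixel is dropped
```

A production version would vectorize this over whole frames and transform each frame's points into a shared world frame using the camera poses.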

✏️ Training Data

✈️ Model weights

Model	Link
GemDepth	Download 🤗

✈️ Evaluation

Prepare Evaluation Datasets

Following VideoDepthAnything, download the datasets from the following links: Sintel, KITTI, Bonn, ScanNet.

pip install natsort
cd dataset/dataset_extract
python dataset_extract${dataset}.py

This script extracts the dataset to the dataset/dataset_extract/dataset folder and generates the JSON file for the dataset.

Run inference

python evaluation/inference/infer/infer.py \
    --infer_path ${out_path} \
    --json_file ${json_path} \
    --datasets ${dataset}

Options:

--infer_path: path to save the output results
--json_file: path to the json file for the dataset, like sintel_video.json, kitti_video_500.json, scannet_video_tae.json
--datasets: dataset name, choose from sintel, kitti, bonn, scannet
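When running inference over several datasets, it can help to assemble the command programmatically and reject unsupported dataset names early. The sketch below does this with the script path and flags listed above. The helper `build_infer_command` and the example paths are hypothetical, not part of the repository.

```python
import shlex
from typing import List

# Dataset names listed under --datasets above.
SUPPORTED_DATASETS = ("sintel", "kitti", "bonn", "scannet")


def build_infer_command(out_path: str, json_path: str, dataset: str) -> List[str]:
    """Assemble the inference command shown above as an argument list."""
    if dataset not in SUPPORTED_DATASETS:
        raise ValueError(f"unknown dataset {dataset!r}; expected one of {SUPPORTED_DATASETS}")
    return [
        "python", "evaluation/inference/infer/infer.py",
        "--infer_path", out_path,
        "--json_file", json_path,
        "--datasets", dataset,
    ]


# Paths here are placeholders; pass the list to subprocess.run(cmd, check=True) to execute it.
cmd = build_infer_command("./outputs/sintel", "sintel_video.json", "sintel")
print(shlex.join(cmd))
```

Building an argument list rather than a shell string avoids quoting problems when paths contain spaces.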

Run evaluation

## ~500 frames
python evaluation/eval/eval.py \
    --infer_path ${pred_root} \
    --benchmark_path ${benchmark_root} \
    --datasets ${dataset}
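This README does not state which metrics eval.py reports. As a point of reference, two metrics commonly reported on depth benchmarks are the absolute relative error (AbsRel) and the δ1 accuracy (the fraction of pixels whose ratio to ground truth is below 1.25). The sketch below computes both on flat lists of values. The function name `abs_rel_and_delta1` is hypothetical, and any scale or shift alignment that an evaluation script may apply to predictions beforehand is omitted.

```python
from typing import Sequence, Tuple


def abs_rel_and_delta1(pred: Sequence[float], gt: Sequence[float]) -> Tuple[float, float]:
    """Return (AbsRel, delta1) over pixels where both prediction and ground truth are positive."""
    pairs = [(p, g) for p, g in zip(pred, gt) if g > 0 and p > 0]
    if not pairs:
        raise ValueError("no valid pixels to evaluate")
    # AbsRel: mean of |pred - gt| / gt over valid pixels.
    abs_rel = sum(abs(p - g) / g for p, g in pairs) / len(pairs)
    # delta1: fraction of valid pixels with max(pred/gt, gt/pred) < 1.25.
    delta1 = sum(1 for p, g in pairs if max(p / g, g / p) < 1.25) / len(pairs)
    return abs_rel, delta1


# Made-up values; the third pixel has no valid ground truth and is ignored.
abs_rel, delta1 = abs_rel_and_delta1(pred=[1.0, 2.25, 3.0, 8.0], gt=[1.0, 2.5, 0.0, 4.0])
print(f"AbsRel={abs_rel:.3f}, delta1={delta1:.3f}")
```

Lower AbsRel and higher δ1 indicate better per-frame accuracy. Neither metric measures temporal consistency, which needs a separate measure.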

✈️ Training

To train GemDepth on the mixed datasets, run:

## stage1
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 accelerate launch train.py --config-name stage1
## stage2
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 accelerate launch train.py --config-name stage2
