
zju3dv/PostCam


PostCam


Environment

  • Python 3.10
  • CUDA 12.1

Create the environment with:

conda env create -f environment.yml
conda activate postcam

Model Weights

Download the following model checkpoints into the checkpoints/ directory:

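| Model | Purpose | Source |
| --- | --- | --- |
| PostCam-1.3B | Video-to-video (v2v) inference with the 1.3B model (Step 3) | HuggingFace |
| Wan2.1-T2V-1.3B | Text encoder, VAE, and base model for the 1.3B model (Step 3) | HuggingFace |
| DA3 (Depth-Anything-3) | Depth and camera-pose estimation (Step 2) | HuggingFace |
| Florence-2-large | Video captioning (Step 1) | HuggingFace |

To download them with the provided script, run:

bash scripts/download.sh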

Expected directory structure after downloading:

checkpoints/
├── PostCam/
│   └── postcam.ckpt
├── Wan2.1-T2V-1.3B/
├── DA3/
└── Florence-2-large/
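Before running inference, it can help to confirm that the downloads landed where the pipeline's default paths expect them. The following is a minimal sketch, not part of the repository: check_checkpoints is a hypothetical helper that only tests whether the file and directories in the layout above exist, not whether their contents are complete.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not shipped with PostCam): verify that the checkpoint
# layout shown above exists under a root directory (default: ./checkpoints).
# Prints each missing entry and returns non-zero if anything is missing.
check_checkpoints() {
  local root=${1:-./checkpoints}
  local missing=0
  local entry

  # The PostCam weights are a single checkpoint file.
  if [ ! -f "$root/PostCam/postcam.ckpt" ]; then
    echo "missing file: $root/PostCam/postcam.ckpt"
    missing=1
  fi

  # The other models are stored as directories.
  for entry in Wan2.1-T2V-1.3B DA3 Florence-2-large; do
    if [ ! -d "$root/$entry" ]; then
      echo "missing directory: $root/$entry"
      missing=1
    fi
  done

  return "$missing"
}

# Usage:
#   check_checkpoints ./checkpoints && echo "checkpoints OK"
```

Only existence is checked, so a partially downloaded model directory would still pass.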

Supported Entry Points

Inference

The full pipeline runs in three steps:

  1. Step 1: Generate video captions with Florence-2.
  2. Step 2: Estimate depth and camera poses with DA3, then convert the outputs to the inference format.
  3. Step 3: Run PostCam v2v inference.

All steps are wrapped in a single script:

bash run_pipeline.sh \
  --input_dir ./test \
  --traj_txt_path ./traj/y_left_30.txt

To run all bundled demo trajectories, use:

bash run_example.sh

run_example.sh runs the full pipeline once for the first trajectory, then reuses the generated captions and depth outputs for the remaining trajectories.

Quick Start

# 1. Place your .mp4 video(s) in a folder
mkdir -p my_videos
cp your_video.mp4 my_videos/

# 2. Run the full pipeline
bash run_pipeline.sh \
  --input_dir ./my_videos \
  --traj_txt_path ./traj/y_left_30.txt \
  --step1_gpu 0 \
  --step2_gpu 0 \
  --step3_gpu 0

For multi-GPU task parallelism, pass comma-separated GPU IDs:

bash run_pipeline.sh \
  --input_dir ./my_videos \
  --traj_txt_path ./traj/y_left_30.txt \
  --step1_gpu 0,1,2,3 \
  --step2_gpu 0,1,2,3 \
  --step3_gpu 0,1,2,3
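The GPU flags take a plain comma-separated string. To use every GPU on the machine, one option is to build that string from nvidia-smi. The sketch below assumes nvidia-smi is on your PATH; join_gpu_ids and all_gpu_ids are hypothetical helper names, not part of the repository.

```shell
#!/usr/bin/env bash
# Hypothetical helper: join newline-separated GPU indices (one per line, as
# printed by `nvidia-smi --query-gpu=index --format=csv,noheader`) into the
# comma-separated form accepted by --step1_gpu / --step2_gpu / --step3_gpu.
join_gpu_ids() {
  paste -sd, -
}

# Hypothetical helper: list every GPU that nvidia-smi reports,
# e.g. "0,1,2,3" on a 4-GPU machine.
all_gpu_ids() {
  nvidia-smi --query-gpu=index --format=csv,noheader | join_gpu_ids
}

# Usage:
#   GPUS=$(all_gpu_ids)
#   bash run_pipeline.sh --input_dir ./my_videos --traj_txt_path ./traj/y_left_30.txt \
#     --step1_gpu "$GPUS" --step2_gpu "$GPUS" --step3_gpu "$GPUS"
```

Note that nvidia-smi lists all physical GPUs regardless of CUDA_VISIBLE_DEVICES, so on a shared machine you may prefer to write the list by hand.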

Trajectory Control

The --traj_txt_path argument controls the camera trajectory for novel-view synthesis. Predefined trajectories are provided in the traj/ directory:

| File | Motion |
| --- | --- |
| y_left_30.txt | Arc left 30 degrees |
| y_right_30.txt | Arc right 30 degrees |
| x_up_30.txt | Translate up 30 degrees |
| x_down_30.txt | Translate down 30 degrees |
| zoom_in.txt | Zoom in |
| zoom_out.txt | Zoom out |

Trajectory File Format

A trajectory file is a plain text file with 3 lines, each containing space-separated keyframe values that are automatically interpolated to match the input video length:

<line 1>  pitch (degrees): positive = orbit up, negative = orbit down
<line 2>  yaw (degrees):   positive = orbit left, negative = orbit right
<line 3>  displacement:    relative camera displacement scale
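To build intuition for how a few keyframes expand into per-frame camera parameters, the sketch below resamples one trajectory line to a given frame count. It assumes the keyframes are evenly spaced over the video and linearly interpolated; PostCam's actual interpolation scheme may differ, and interp_keyframes is a hypothetical helper, not part of the repository.

```shell
#!/usr/bin/env bash
# Hypothetical helper (assumes evenly spaced keyframes + linear interpolation;
# PostCam's own resampling may differ). Reads one line of space-separated
# keyframe values on stdin and prints $1 interpolated values on one line.
interp_keyframes() {
  awk -v n="$1" '
  {
    k = NF                                   # number of keyframes on this line
    for (i = 0; i < n; i++) {
      if (k == 1 || n == 1) {
        v = $1                               # a single keyframe is held constant
      } else {
        t = i * (k - 1) / (n - 1)            # frame position in keyframe units
        j = int(t)
        if (j >= k - 1) j = k - 2            # clamp so the last frame uses the final segment
        f = t - j                            # fraction between keyframes j and j+1
        v = $(j + 1) * (1 - f) + $(j + 2) * f
      }
      printf "%s%g", (i ? " " : ""), v
    }
    printf "\n"
  }'
}

# Usage (illustrative yaw line: orbit 30 degrees left and back over 5 frames):
#   echo "0 30 0" | interp_keyframes 5      # -> 0 15 30 15 0
```

Under this assumption, two keyframes such as "0 30" give a constant-speed 30-degree orbit, while adding a middle keyframe lets the camera move out and return within one clip.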

Line 3 (displacement) is a relative scale multiplied by the scene's estimated foreground depth:

  • When pitch/yaw are non-zero, it controls the orbit radius.
  • When both pitch and yaw are zero, the camera performs a dolly zoom instead, and this value sets the size of the displacement.
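Read literally, the orbit case multiplies the line-3 value by the foreground depth. Writing s for the line-3 value and d_fg for the estimated foreground depth (both symbol names are ours, not the repository's), a worked example:

```math
r = s \cdot d_{\mathrm{fg}}, \qquad s = 0.5,\; d_{\mathrm{fg}} = 4 \;\Rightarrow\; r = 0.5 \times 4 = 2
```

The radius is in the same (possibly relative) units as the depth estimate; in the dolly case the same product would presumably give the magnitude of the camera displacement.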

All Arguments

| Argument | Required | Default | Description |
| --- | --- | --- | --- |
| --input_dir | Yes | - | Input folder containing .mp4 files |
| --traj_txt_path | Yes | - | Trajectory file, e.g. ./traj/y_left_30.txt |
| --checkpoint_path | No | ./checkpoints/PostCam/postcam.ckpt | PostCam checkpoint |
| --config_path | No | ./inference.yaml | Inference config file |
| --da3_model_path | No | ./checkpoints/DA3 | DA3 depth model path |
| --florence_model_path | No | ./checkpoints/Florence-2-large | Florence-2 model path |
| --step1_gpu | No | 0 | GPU ID(s) for Step 1; comma-separated for parallel captioning |
| --step2_gpu | No | 0 | GPU ID(s) for Step 2; comma-separated for parallel depth estimation |
| --step3_gpu | No | 0 | GPU ID(s) for Step 3; comma-separated for parallel inference |
| --output_dir | No | ./output | Output root directory |
| --skip_step1 | No | false | Skip caption generation |
| --skip_step2 | No | false | Skip depth estimation and format conversion |
| --skip_step3 | No | false | Skip PostCam inference |
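Because --input_dir and --traj_txt_path are the only required arguments, a quick pre-flight check on both can catch mistakes before any model is loaded. The sketch below is a hypothetical helper, not part of the repository. It checks only what this README states (the folder holds .mp4 files; the trajectory file has three lines of space-separated numbers) and does not replicate whatever validation run_pipeline.sh itself performs.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check (not part of PostCam) for the two required
# arguments of run_pipeline.sh. Prints the first problem found to stderr and
# returns non-zero; returns 0 if both inputs look usable.
preflight() {
  local input_dir=$1 traj=$2 f found=0

  # --input_dir: must be a folder containing at least one .mp4 file.
  if [ ! -d "$input_dir" ]; then
    echo "not a directory: $input_dir" >&2
    return 1
  fi
  for f in "$input_dir"/*.mp4; do
    if [ -e "$f" ]; then found=1; break; fi   # glob may stay literal if nothing matches
  done
  if [ "$found" -ne 1 ]; then
    echo "no .mp4 files in: $input_dir" >&2
    return 1
  fi

  # --traj_txt_path: exactly 3 non-empty lines (pitch, yaw, displacement),
  # each made of space-separated numbers.
  if [ ! -f "$traj" ]; then
    echo "trajectory file not found: $traj" >&2
    return 1
  fi
  if ! awk '
    { sub(/\r$/, "") }                        # tolerate CRLF line endings
    NF == 0 { next }                          # ignore blank lines
    {
      lines++
      for (i = 1; i <= NF; i++)
        if ($i !~ /^[-+]?([0-9]+\.?[0-9]*|\.[0-9]+)([eE][-+]?[0-9]+)?$/) bad = 1
    }
    END { if (lines == 3 && !bad) exit 0; exit 1 }
  ' "$traj"; then
    echo "expected 3 lines of numbers in: $traj" >&2
    return 1
  fi

  return 0
}

# Usage:
#   preflight ./my_videos ./traj/y_left_30.txt && bash run_pipeline.sh \
#     --input_dir ./my_videos --traj_txt_path ./traj/y_left_30.txt
```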

Skip Already-Completed Steps

If Step 1 or Step 2 outputs already exist, you can skip them:

bash run_pipeline.sh \
  --input_dir ./my_videos \
  --traj_txt_path ./traj/y_right_30.txt \
  --skip_step1 --skip_step2

This is useful when generating multiple camera trajectories for the same input videos.
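The pattern described for run_example.sh can be reproduced for your own videos with these flags: run the full pipeline for the first trajectory, then only Step 3 for the rest. The sketch below is hypothetical (it is not the contents of run_example.sh) and assumes all runs share the same --output_dir (the default here), so that later runs find the Step 1 and Step 2 outputs of the first. It prints the commands instead of running them, so you can review them first.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not the contents of run_example.sh): print one
# run_pipeline.sh command per trajectory file. The first trajectory runs all
# three steps; the others reuse the Step 1 (captions) and Step 2 (depth/pose)
# outputs via --skip_step1 --skip_step2, assuming a shared --output_dir.
# Paths are wrapped in single quotes; paths containing a single quote are not handled.
plan_runs() {
  local input_dir=$1 first=1 traj
  shift
  for traj in "$@"; do
    if [ "$first" -eq 1 ]; then
      printf "bash run_pipeline.sh --input_dir '%s' --traj_txt_path '%s'\n" "$input_dir" "$traj"
      first=0
    else
      printf "bash run_pipeline.sh --input_dir '%s' --traj_txt_path '%s' --skip_step1 --skip_step2\n" "$input_dir" "$traj"
    fi
  done
}

# Usage:
#   plan_runs ./my_videos ./traj/*.txt             # review the commands
#   plan_runs ./my_videos ./traj/*.txt | bash -e   # run them, stopping at the first failure
```

Piping into bash -e matters here: if the first (full) run fails, the later runs would otherwise skip Steps 1 and 2 with no outputs to reuse.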

License

This project is licensed under the Apache-2.0 License. Note that this license applies only to the code in this repository; its dependencies and submodules (Depth-Anything-3, Florence-2) are separate and individually licensed.


Acknowledgement

PostCam uses a backbone based on Wan2.1, and its training code references ReCamMaster. We sincerely thank the Wan and ReCamMaster teams for their foundational work and open-source contributions. We also gratefully acknowledge Depth-Anything-3 and Florence-2, whose excellent work inspired and supported this project.
