Junsheng Zhou#, Zhifan Yang#, Liang Han, Wenyuan Zhang, Kanle Shi, Shenkun Xu, Yu-Shen Liu*
Tsinghua University
# Equal contribution * Corresponding author
This paper tackles the challenge of recovering 4D dynamic scenes from videos captured by as few as four portable cameras. Learning to model scene dynamics for temporally consistent novel-view rendering is a foundational task in computer graphics, where previous works often require dense multi-view captures using camera arrays of dozens or even hundreds of views. We propose 4C4D, a novel framework that enables high-fidelity 4D Gaussian Splatting from video captures of extremely sparse cameras. Our key insight is that geometric learning under sparse settings is substantially more difficult than modeling appearance. Driven by this observation, we introduce a Neural Decaying Function on Gaussian opacities for enhancing the geometric modeling capability of 4D Gaussians. This design mitigates the inherent imbalance between geometry and appearance modeling in 4DGS by encouraging the 4DGS gradients to focus more on geometric learning. Extensive experiments across sparse-view datasets with varying camera overlaps show that 4C4D achieves superior performance over prior art.
Figure 1. Overview of the 4C4D framework. We introduce a Neural Decaying Function on Gaussian opacities to enhance the geometric modeling capability of 4D Gaussians under extremely sparse views.
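The exact form of the decaying function is defined in the paper; the following is only an illustrative sketch in our own notation (the symbols $d_\theta$ and $\mathbf{p}_i$ are ours, not the authors'). The idea is that each Gaussian's opacity $\alpha_i$ is modulated by a learned decaying factor,

$$\tilde{\alpha}_i = \alpha_i \cdot d_\theta(\mathbf{p}_i), \qquad d_\theta(\cdot) \in (0, 1],$$

so that photometric gradients are redirected toward the geometric parameters of the Gaussians rather than toward appearance.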
## Installation

Clone and set up the 4C4D environment:

```bash
git clone https://github.com/yangzf-1023/4C4D
cd 4C4D
conda env create --file environment.yml
conda activate 4c4d
```

Set up MASt3R for dense point cloud initialization:
Since COLMAP produces extremely sparse point clouds with few input views, we use MASt3R-based reconstruction instead.
```bash
cd ..
git clone https://github.com/anttwo/MAtCha.git
cd MAtCha
python install.py
python download_checkpoints.py
conda activate matcha
```

## Dataset Structure

Whether you use the provided pre-processed data or prepare your own custom dataset, please organize the data directory as follows:
```
data/
├── N3V/                            # or your custom dataset name
│   ├── flame_steak/                # scene directory
│   │   ├── images/                 # input frames
│   │   │   ├── cam00_0000.png      # format: cam{XX}_{YYYY}.png
│   │   │   ├── cam00_0001.png      # XX = camera index (zero-padded)
│   │   │   ├── cam01_0000.png      # YYYY = frame index (zero-padded)
│   │   │   └── ...
│   │   └── sparse/
│   │       └── 0/
│   │           ├── cameras.bin     # camera intrinsics (COLMAP format)
│   │           ├── images.bin      # camera extrinsics (COLMAP format)
│   │           └── points3D.bin    # reconstructed 3D points
│   ├── cook_spinach/
│   │   ├── images/
│   │   └── sparse/
│   │       └── 0/
│   │           ├── cameras.bin
│   │           ├── images.bin
│   │           └── points3D.bin
│   └── ...                         # additional scenes
```
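Optionally, as a quick sanity check, you can confirm that every camera contributed the same number of frames (using the `flame_steak` scene from the layout above):

```bash
# Count frames per camera prefix; every camera should report the same count.
ls data/N3V/flame_steak/images | cut -d'_' -f1 | sort | uniq -c
```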
**Format note:** Both `.bin` (binary) and `.txt` (text) COLMAP formats are supported for all files under `sparse/0/`.
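If you need to convert between the two, COLMAP's `model_converter` handles it; the paths below follow the `flame_steak` example above:

```bash
# Convert a binary COLMAP model to text format (use BIN for the reverse).
colmap model_converter \
    --input_path data/N3V/flame_steak/sparse/0 \
    --output_path data/N3V/flame_steak/sparse/0 \
    --output_type TXT
```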
**Important: how `sparse/0/` files are generated.**

- `points3D.*` is always reconstructed from the sparse (training) views only, since it serves as the point cloud initialization for training.
- `images.*` and `cameras.*` can be generated from either the sparse views or all (dense) views, depending on whether you need to render/evaluate on held-out test views. If you only train without evaluation, the sparse views are sufficient; if you need test-view evaluation, generate them from all views so that the test camera poses are included.
We provide pre-processed data for all scenes in the Neural 3D Video (N3V) dataset (first 300 frames, using training views 1, 10, 13, 20). You can download it directly and skip to Training:
Download: Google Drive
If you prefer to process the raw data yourself:
- Download the Neural 3D Video dataset and extract each scene to `data/N3V/`.

- Preprocess the raw video:

  ```bash
  cd ../4C4D
  conda activate 4c4d
  python scripts/n3v2blender.py data/N3V/$SCENE --training_view $TRAIN_VIEW
  ```

- (Recommended) Generate dense point clouds with MASt3R for best results:
  ```bash
  # Convert to COLMAP format
  python scripts/n3v2colmap.py data/N3V/$SCENE --training_view $TRAIN_VIEW   # sparse (training) views
  python scripts/n3v2colmap.py data/N3V/$SCENE                               # all (dense) views

  # Run MASt3R reconstruction on the sparse views
  cd ../MAtCha
  conda activate matcha
  python train.py \
      -s ../4C4D/data/N3V/$SCENE/mast3r_${N_SPARSE} \
      -o ../4C4D/data/N3V/$SCENE/mast3r_${N_SPARSE} \
      --sfm_config posed --sfm_only

  # Copy the reconstruction back: cameras/images from the dense model
  # (so test poses are included), points3D from the sparse MASt3R run only.
  cd ../4C4D
  conda activate 4c4d
  cp -r data/N3V/$SCENE/mast3r_${N_DENSE}/sparse data/N3V/$SCENE/
  cp data/N3V/$SCENE/mast3r_${N_SPARSE}/mast3r_sfm/sparse/0/points3D.* \
      data/N3V/$SCENE/sparse/0/
  ```

To use your own data, organize it according to the Dataset Structure above. Ensure that:
- `images/` contains the extracted video frames named as `cam{XX}_{YYYY}.png`, where `XX` is the zero-padded camera index and `YYYY` is the zero-padded frame index.
- `sparse/0/` contains valid COLMAP-format camera parameters and point cloud files. You may obtain these via COLMAP, MASt3R, or any other SfM pipeline. Refer to the generation notes above for guidance on which views to use.
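As an illustration only (none of this ships with the repository), the sketch below produces that layout from per-camera videos; the names `cam00.mp4`, `cam01.mp4`, ... and `database.db` are assumptions. Note that with only a handful of views, COLMAP's reconstruction can be very sparse, which is why the MASt3R route above is recommended for `points3D.*`:

```bash
# Extract frames as cam{XX}_{YYYY}.png (hypothetical cam*.mp4 inputs).
mkdir -p images
for v in cam*.mp4; do
    ffmpeg -i "$v" -start_number 0 "images/${v%.mp4}_%04d.png"
done

# Sparse COLMAP reconstruction for cameras/images/points3D.
mkdir -p sparse
colmap feature_extractor --database_path database.db --image_path images
colmap exhaustive_matcher --database_path database.db
colmap mapper --database_path database.db --image_path images --output_path sparse
```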
## Variable Reference

| Variable | Description | Example |
|---|---|---|
| `$SCENE` | Scene name from the N3V dataset | `flame_steak` |
| `$TRAIN_VIEW` | Training view indices (comma-separated) | `1,10,13,20` |
| `$N_SPARSE` | Number of sparse views, equal to `len($TRAIN_VIEW)` | `4` |
| `$N_DENSE` | Total number of views in the scene | `21` |
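For convenience, you can export these once in your shell so the commands in this README can be pasted as-is; the values match the pre-processed N3V setup:

```bash
export SCENE=flame_steak
export TRAIN_VIEW=1,10,13,20
export N_SPARSE=4
export N_DENSE=21
```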
## Training

```bash
python train.py \
    --config $CONFIG_PATH \
    --training_view $TRAIN_VIEW \
    --output_dir $OUTPUT_DIR
```

Render a novel-view trajectory after training:
```bash
python render.py \
    --config $CONFIG_PATH \
    --training_view $TRAIN_VIEW \
    --output_dir $OUTPUT_DIR \
    --traj arc \
    --start_checkpoint output/N3V/$SCENE/chkpnt30000.pth
```

Evaluate on held-out test views:
```bash
python render.py \
    --config $CONFIG_PATH \
    --training_view $TRAIN_VIEW \
    --output_dir $OUTPUT_DIR \
    --test \
    --start_checkpoint output/N3V/$SCENE/chkpnt30000.pth
```

## Citation

If you find this work useful, please consider citing:
```bibtex
@inproceedings{zhou20264c4d,
    title     = {4C4D: 4 Camera 4D Gaussian Splatting},
    author    = {Zhou, Junsheng and Yang, Zhifan and Han, Liang and Zhang, Wenyuan and Shi, Kanle and Xu, Shenkun and Liu, Yu-Shen},
    booktitle = {Conference on Computer Vision and Pattern Recognition (CVPR)},
    year      = {2026}
}
```

## Acknowledgments

Our codebase builds upon 4DGS and MASt3R. We thank the authors for their excellent work.