A PyTorch implementation of F4Splat (Kim et al., 2026), which performs spatially adaptive Gaussian allocation for feed-forward 3D Gaussian Splatting from sparse, uncalibrated images.
- Predictive Densification: Learns a densification score that predicts where additional Gaussians should be allocated, without iterative optimization.
- Budget-Controllable: Adjust the total number of Gaussians at inference time via a single threshold — no retraining required.
- Spatially Adaptive Allocation: Concentrates Gaussians on geometrically complex regions while avoiding redundancy in simple or overlapping areas.
- Uncalibrated Input: Works directly from uncalibrated multi-view images with joint camera parameter estimation.
Context Images
|
v
[DINOv2 Encoder] (frozen)
|
v
[VGGT-style Backbone] (frame-wise + global self-attention)
|
+---> Camera Head ---> Intrinsics K, Extrinsics T
|
v
[DPT Decoder] (multi-scale feature maps at L=3 levels)
|
+---> Gaussian Center Head ---> Depth maps ---> 3D centers
|
+---> Gaussian Param Head ---> Opacity, Rotation, Scale, SH, Densification Scores
|
v
[Spatially Adaptive Allocation] (threshold tau + budget matching)
|
v
[gsplat Renderer] ---> Novel View Synthesis
f4splat/
├── config.py # All configuration dataclasses
├── model/
│ ├── backbone.py # DINOv2 + VGGT-style geometry backbone
│ ├── decoder.py # Multi-scale DPT decoder (L=3 levels)
│ ├── heads.py # Gaussian Center Head + Parameter Head
│ └── f4splat_model.py # Unified model (train & inference pipelines)
├── gaussian/
│ ├── allocation.py # Adaptive allocation masks + budget matching (Alg. 1 & 2)
│ └── renderer.py # gsplat differentiable rendering wrapper
├── loss/
│ └── losses.py # Rendering, score, camera, and scene-scale losses
├── data/
│ └── dataset.py # RealEstate10K & ACID dataset loaders
├── utils/
│ ├── geometry.py # Sim(3) alignment, camera ops, quaternion helpers
│ └── metrics.py # PSNR, SSIM
├── train.py # Distributed training (DDP + AMP)
└── eval.py # Evaluation script
# Clone
git clone https://github.com/<your-username>/f4splat.git
cd f4splat
# Install dependencies
pip install -r requirements.txt

- Python >= 3.10
- PyTorch >= 2.1
- gsplat >= 1.0
- NVIDIA GPU (training was validated on H200)
Download RealEstate10K and/or ACID and organize as:
data/
├── re10k/
│ ├── train/
│ │ └── <scene_id>/
│ │ ├── images/
│ │ │ ├── 000000.png
│ │ │ └── ...
│ │ └── cameras.npz # intrinsics (N,3,3), extrinsics (N,4,4)
│ └── test/
│ └── ...
└── acid/
└── ...
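Before launching a long training run, it can help to confirm that every scene folder follows the layout above. The following is a minimal sketch using only the standard library; `find_incomplete_scenes` is a hypothetical helper, not part of this repository, and it only checks that files exist, not that `cameras.npz` holds arrays of the expected shapes.

```python
from pathlib import Path


def find_incomplete_scenes(split_dir):
    """Return names of scene folders missing PNG frames or cameras.npz.

    Hypothetical helper: assumes the layout <split_dir>/<scene_id>/images/*.png
    plus <split_dir>/<scene_id>/cameras.npz described above.
    """
    incomplete = []
    for scene in sorted(Path(split_dir).iterdir()):
        if not scene.is_dir():
            continue  # ignore stray files next to the scene folders
        images_dir = scene / "images"
        has_images = images_dir.is_dir() and any(images_dir.glob("*.png"))
        has_cameras = (scene / "cameras.npz").is_file()
        if not (has_images and has_cameras):
            incomplete.append(scene.name)
    return incomplete
```

For example, `find_incomplete_scenes("data/re10k/train")` would list the scene IDs that still need attention; an empty list means every scene has both parts.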
# 8 GPUs, ~15 hours
NUM_GPUS=8 bash scripts/train_multiview.sh

bash scripts/train_twoview.sh re10k

python -m f4splat.train \
--dataset re10k \
--data-root ./data \
--output-dir ./outputs \
--max-iterations 15000 \
--image-size 256

Key arguments:
| Argument | Default | Description |
|---|---|---|
| `--batch-images` | 24 | Total images per iteration (batch size adapts to view count) |
| `--lr` | 2e-4 | Learning rate |
| `--two-view` | off | Two-view training mode |
| `--no-amp` | off | Disable mixed precision |
python -m f4splat.eval \
--checkpoint outputs/checkpoint_final.pt \
--dataset re10k \
--data-root ./data \
--n-views 8 16 24

To evaluate with a specific Gaussian budget:
python -m f4splat.eval \
--checkpoint outputs/checkpoint_final.pt \
--target-gaussians 500000

During training, the network learns to predict a per-region densification score from input images alone. The ground-truth signal is derived from the rendering loss gradient:
d_g = log(1 + 1e4 * ||v_g||_2)
where v_g is the accumulated view-space positional gradient of each Gaussian.
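As a quick illustration of the formula above, here is a minimal sketch for a single Gaussian. It assumes `v_g` is a 2D screen-space gradient, as in standard 3D Gaussian Splatting densification; the function name and signature are illustrative and not taken from `losses.py`.

```python
import math


def densification_target(grad_x: float, grad_y: float, scale: float = 1e4) -> float:
    """Compute d_g = log(1 + scale * ||v_g||_2) for one Gaussian.

    Assumption: v_g = (grad_x, grad_y) is the accumulated 2D view-space
    positional gradient. `scale` defaults to the 1e4 factor in the formula.
    """
    norm = math.hypot(grad_x, grad_y)  # Euclidean norm of the gradient
    return math.log1p(scale * norm)    # log1p stays accurate for tiny norms
```

The log compresses the wide dynamic range of gradient magnitudes: a Gaussian with zero gradient gets a target of 0, and the target grows slowly as the gradient norm increases.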
Given multi-scale Gaussian maps at L=3 levels and a threshold tau:
- Coarsest level: Allocate where score < tau (simple regions)
- Intermediate levels: Allocate where score < tau AND not already covered by coarser levels
- Finest level: Allocate everything remaining
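The three rules above can be sketched in plain Python. This is an illustration rather than the code in `allocation.py`: it assumes score maps are nested lists ordered coarse to fine and that each level has exactly twice the resolution of the previous one, so a coarse cell covers a 2x2 block at the next level.

```python
def upsample2x(grid):
    """Nearest-neighbour 2x upsampling of a 2D boolean grid."""
    out = []
    for row in grid:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def allocation_masks(scores, tau):
    """Return one boolean mask per level (coarse to fine).

    A cell is allocated if it is not already covered by a coarser level and
    either its score is below tau or it lies on the finest level.
    """
    masks = []
    covered = [[False] * len(scores[0][0]) for _ in scores[0]]
    for level, grid in enumerate(scores):
        if level > 0:
            covered = upsample2x(covered)  # carry coverage to finer resolution
        finest = level == len(scores) - 1
        mask = [
            [not covered[i][j] and (finest or grid[i][j] < tau)
             for j in range(len(row))]
            for i, row in enumerate(grid)
        ]
        covered = [
            [c or m for c, m in zip(c_row, m_row)]
            for c_row, m_row in zip(covered, mask)
        ]
        masks.append(mask)
    return masks


# Toy example with 1x1, 2x2 and 4x4 score maps.
scores = [
    [[0.9]],
    [[0.1, 0.9], [0.9, 0.2]],
    [[0.0] * 4 for _ in range(4)],
]
masks = allocation_masks(scores, tau=0.5)
print([sum(map(sum, m)) for m in masks])  # [0, 2, 8]
```

In the toy example the coarsest cell scores above the threshold, so nothing is allocated there; two intermediate cells fall below it, and the finest level fills the eight pixels those two cells do not cover. Every finest-resolution pixel ends up covered exactly once.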
At inference, given a target Gaussian count, binary search finds the threshold tau that produces the closest match — no retraining needed.
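Budget matching can be sketched as a bisection over tau. The sketch below assumes the Gaussian count never increases as tau grows, which follows from the allocation rule when finer levels have more cells: a higher threshold lets more regions be covered at coarse levels. `find_threshold`, `count_fn`, and the search bounds are hypothetical names for illustration, not the repository's API.

```python
def find_threshold(count_fn, target, lo, hi, iters=32):
    """Bisect tau in [lo, hi] so that count_fn(tau) is as close to target as possible.

    Assumption: count_fn is non-increasing in tau.
    """
    # Start from whichever endpoint is closer to the target.
    best_tau = min((lo, hi), key=lambda t: abs(count_fn(t) - target))
    best_err = abs(count_fn(best_tau) - target)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        n = count_fn(mid)
        if abs(n - target) < best_err:
            best_tau, best_err = mid, abs(n - target)
        if n > target:
            lo = mid  # too many Gaussians: raise the threshold
        else:
            hi = mid  # at or under budget: try a lower threshold
    return best_tau


# Toy count: each coarse cell yields 1 Gaussian if its score is below tau,
# otherwise it is split into 4 finer Gaussians.
cell_scores = [0.1, 0.3, 0.5, 0.7, 0.9]


def toy_count(tau):
    return sum(1 if s < tau else 4 for s in cell_scores)


tau = find_threshold(toy_count, target=11, lo=0.0, hi=1.0)
print(toy_count(tau))  # 11
```

Because the count is a step function of tau, an exact match is not always possible; tracking the best candidate seen so far returns the closest achievable budget instead of failing.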
L_total = L_render + 1e-4 * L_score + 10 * L_camera + 1e-2 * L_scene
| Loss | Description |
|---|---|
| `L_render` | MSE + 0.05 * LPIPS between rendered and target novel views |
| `L_score` | L1 between predicted and gradient-based densification scores |
| `L_camera` | Geodesic rotation + L2 translation error |
| `L_scene` | Regularizes average Gaussian center distance to 1 |
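For reference, the weighting in `L_total` can be written as a small helper. This is only a sketch of the weighted sum with plain floats; the function name is illustrative, and the individual terms are assumed to be computed elsewhere (for example in `losses.py`).

```python
def total_loss(l_render, l_score, l_camera, l_scene):
    """Combine the four terms with the weights from L_total above."""
    return l_render + 1e-4 * l_score + 10.0 * l_camera + 1e-2 * l_scene
```

The weights differ by several orders of magnitude, so comparing raw loss terms in training logs can be misleading; the weighted contributions are what the optimizer actually sees.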
@article{kim2026f4splat,
title={F4Splat: Feed-Forward Predictive Densification for Feed-Forward 3D Gaussian Splatting},
author={Kim, Injae and Kim, Chaehyeon and Bae, Minseong and Joo, Minseok and Kim, Hyunwoo J.},
journal={arXiv preprint arXiv:2603.21304},
year={2026}
}

This implementation builds upon:
- VGGT — Geometry backbone architecture
- DINOv2 — Image encoder
- gsplat — Differentiable Gaussian rasterization
- NoPoSplat — RGB shortcut and training strategy
This project is released for research purposes. Please refer to the original paper and upstream licenses for usage terms.