nktkt/f4splat

F4Splat: Feed-Forward Predictive Densification for Feed-Forward 3D Gaussian Splatting

A PyTorch implementation of F4Splat (Kim et al., 2026), which performs spatially adaptive Gaussian allocation for feed-forward 3D Gaussian Splatting from sparse, uncalibrated images.

Key Features

  • Predictive Densification: Learns a densification score that predicts where additional Gaussians should be allocated, without iterative optimization.
  • Budget-Controllable: Adjust the total number of Gaussians at inference time via a single threshold — no retraining required.
  • Spatially Adaptive Allocation: Concentrates Gaussians on geometrically complex regions while avoiding redundancy in simple or overlapping areas.
  • Uncalibrated Input: Works directly from uncalibrated multi-view images with joint camera parameter estimation.

Architecture

Context Images
      |
      v
 [DINOv2 Encoder]  (frozen)
      |
      v
 [VGGT-style Backbone]  (frame-wise + global self-attention)
      |
      +---> Camera Head ---> Intrinsics K, Extrinsics T
      |
      v
 [DPT Decoder]  (multi-scale feature maps at L=3 levels)
      |
      +---> Gaussian Center Head ---> Depth maps ---> 3D centers
      |
      +---> Gaussian Param Head  ---> Opacity, Rotation, Scale, SH, Densification Scores
      |
      v
 [Spatially Adaptive Allocation]  (threshold tau + budget matching)
      |
      v
 [gsplat Renderer] ---> Novel View Synthesis
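
The "Depth maps ---> 3D centers" step in the diagram is, in the usual pinhole formulation, an unprojection of each pixel by its predicted depth. The sketch below shows that standard operation in plain Python so it runs without dependencies. The function name, the 0.5 pixel-center offset, and the camera-to-world convention for the extrinsics are assumptions made for illustration and are not taken from this repository.

```python
def unproject_pixel(u, v, depth, K, cam_to_world=None):
    """Lift pixel (u, v) with a z-depth to a 3D point (hypothetical helper).

    K is a 3x3 pinhole intrinsics matrix given as nested lists.
    cam_to_world is an optional 4x4 camera-to-world matrix (assumed convention).
    """
    fx, fy = K[0][0], K[1][1]
    cx, cy = K[0][2], K[1][2]
    # Assumption: rays pass through pixel centers, hence the +0.5 offset.
    x = (u + 0.5 - cx) / fx * depth
    y = (v + 0.5 - cy) / fy * depth
    point = [x, y, depth, 1.0]  # homogeneous camera-space point
    if cam_to_world is None:
        return point[:3]
    # Apply the top three rows of the 4x4 transform.
    return [sum(cam_to_world[r][c] * point[c] for c in range(4)) for r in range(3)]
```

In a tensor implementation the same arithmetic would typically be applied to a whole depth map at once; the per-pixel form is used here only to keep the geometry visible.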

Project Structure

f4splat/
├── config.py                 # All configuration dataclasses
├── model/
│   ├── backbone.py           # DINOv2 + VGGT-style geometry backbone
│   ├── decoder.py            # Multi-scale DPT decoder (L=3 levels)
│   ├── heads.py              # Gaussian Center Head + Parameter Head
│   └── f4splat_model.py      # Unified model (train & inference pipelines)
├── gaussian/
│   ├── allocation.py         # Adaptive allocation masks + budget matching (Alg. 1 & 2)
│   └── renderer.py           # gsplat differentiable rendering wrapper
├── loss/
│   └── losses.py             # Rendering, score, camera, and scene-scale losses
├── data/
│   └── dataset.py            # RealEstate10K & ACID dataset loaders
├── utils/
│   ├── geometry.py           # Sim(3) alignment, camera ops, quaternion helpers
│   └── metrics.py            # PSNR, SSIM
├── train.py                  # Distributed training (DDP + AMP)
└── eval.py                   # Evaluation script

Installation

# Clone
git clone https://github.com/<your-username>/f4splat.git
cd f4splat

# Install dependencies
pip install -r requirements.txt

Requirements

  • Python >= 3.10
  • PyTorch >= 2.1
  • gsplat >= 1.0
  • NVIDIA GPU (training was validated on H200)

Data Preparation

Download RealEstate10K and/or ACID and organize as:

data/
├── re10k/
│   ├── train/
│   │   └── <scene_id>/
│   │       ├── images/
│   │       │   ├── 000000.png
│   │       │   └── ...
│   │       └── cameras.npz    # intrinsics (N,3,3), extrinsics (N,4,4)
│   └── test/
│       └── ...
└── acid/
    └── ...
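
Before starting a long run it can help to confirm that each scene folder matches the layout above. The following is a minimal sketch using only the standard library; `check_scene_dir` is a hypothetical helper, not part of this repository, and it checks only that the expected files exist, not the array shapes inside cameras.npz.

```python
from pathlib import Path


def check_scene_dir(scene_dir):
    """Return a list of layout problems for one scene folder (empty if none)."""
    scene = Path(scene_dir)
    problems = []
    images = scene / "images"
    if not images.is_dir():
        problems.append("missing images/ directory")
    elif not any(images.glob("*.png")):
        problems.append("images/ contains no .png files")
    if not (scene / "cameras.npz").is_file():
        problems.append("missing cameras.npz")
    return problems
```

Calling it on every `<scene_id>` folder under data/re10k/train and printing the non-empty results gives a quick report of incomplete scenes.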

Training

Multi-view (default)

# 8 GPUs, ~15 hours
NUM_GPUS=8 bash scripts/train_multiview.sh

Two-view

bash scripts/train_twoview.sh re10k

Single-GPU (debug)

python -m f4splat.train \
    --dataset re10k \
    --data-root ./data \
    --output-dir ./outputs \
    --max-iterations 15000 \
    --image-size 256

Key arguments:

Argument          Default   Description
--batch-images    24        Total images per iteration (batch size adapts to view count)
--lr              2e-4      Learning rate
--two-view        off       Two-view training mode
--no-amp          off       Disable mixed precision

Evaluation

python -m f4splat.eval \
    --checkpoint outputs/checkpoint_final.pt \
    --dataset re10k \
    --data-root ./data \
    --n-views 8 16 24

To evaluate with a specific Gaussian budget:

python -m f4splat.eval \
    --checkpoint outputs/checkpoint_final.pt \
    --target-gaussians 500000

Method Overview

Densification Score

During training, the network learns to predict a per-region densification score from input images alone. The ground-truth signal is derived from the rendering loss gradient:

d_g = log(1 + 1e4 * ||v_g||_2)

where v_g is the accumulated view-space positional gradient of each Gaussian.
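
As a worked example of the formula, the sketch below evaluates d_g for a single Gaussian in plain Python. The function name and the list-based gradient are assumptions for illustration; in training this would presumably be computed on tensors for all Gaussians at once.

```python
import math


def densification_target(view_grad, scale=1e4):
    """d_g = log(1 + scale * ||v_g||_2) for one accumulated gradient vector."""
    norm = math.sqrt(sum(g * g for g in view_grad))
    # log1p(x) computes log(1 + x) accurately for small x.
    return math.log1p(scale * norm)
```

A Gaussian with zero gradient gets a target of 0, and a gradient of norm 5e-4 gives log(1 + 5) = log 6. The logarithm compresses the wide dynamic range of gradient norms while preserving their ordering.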

Spatially Adaptive Allocation

Given multi-scale Gaussian maps at L=3 levels and a threshold tau:

  1. Coarsest level: Allocate a Gaussian wherever score < tau (simple regions)
  2. Intermediate levels: Allocate wherever score < tau AND the location is not already covered by a coarser level
  3. Finest level: Allocate at every location that is still uncovered
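
The three rules can be sketched as a coarse-to-fine pass over the score maps. This is an illustrative reading of the steps above, not the code in gaussian/allocation.py: it assumes each level has exactly twice the resolution of the previous one, that a coarse cell covers the 2x2 block beneath it, and it uses nested lists instead of tensors. The name `allocation_masks` is hypothetical.

```python
def allocation_masks(scores, tau):
    """Return one boolean mask per level marking where a Gaussian is kept.

    scores: list of 2D score maps (nested lists), coarsest level first.
    Assumption: level l+1 has twice the height and width of level l.
    """
    masks = []
    covered = None  # cells already explained by a coarser level
    last = len(scores) - 1
    for level, score_map in enumerate(scores):
        h, w = len(score_map), len(score_map[0])
        if covered is None:
            covered = [[False] * w for _ in range(h)]
        else:
            # Nearest-neighbour 2x upsampling of the coverage map.
            covered = [[covered[y // 2][x // 2] for x in range(w)] for y in range(h)]
        mask = [[False] * w for _ in range(h)]
        for y in range(h):
            for x in range(w):
                if covered[y][x]:
                    continue
                # The finest level takes everything that is still uncovered.
                if level == last or score_map[y][x] < tau:
                    mask[y][x] = True
                    covered[y][x] = True
        masks.append(mask)
    return masks
```

Under these assumptions every location ends up covered by exactly one level, and raising tau moves allocation toward coarser levels, which lowers the total Gaussian count.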

Budget Matching (Algorithms 1 & 2)

At inference, given a target Gaussian count, binary search finds the threshold tau that produces the closest match — no retraining needed.
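
A minimal sketch of such a search is shown below. It is a generic bisection, not a reproduction of Algorithms 1 and 2 from the paper: `match_budget`, `toy_count`, and the search interval are hypothetical, and the code assumes the Gaussian count is non-increasing in tau (a higher threshold keeps more coarse Gaussians and therefore fewer overall).

```python
def match_budget(count_fn, target, tau_lo, tau_hi, iters=32):
    """Bisect on tau and return the tried value whose count is closest to target.

    Assumption: count_fn(tau) is non-increasing in tau.
    """
    best_tau, best_gap = None, None
    for _ in range(iters):
        tau = 0.5 * (tau_lo + tau_hi)
        count = count_fn(tau)
        gap = abs(count - target)
        if best_gap is None or gap < best_gap:
            best_tau, best_gap = tau, gap
        if count > target:
            tau_lo = tau  # too many Gaussians: raise the threshold
        elif count < target:
            tau_hi = tau  # too few Gaussians: lower the threshold
        else:
            break  # exact match
    return best_tau


# Toy two-level model: a coarse cell below tau yields 1 Gaussian,
# otherwise its 4 finer children are used instead.
coarse_scores = [0.1, 0.2, 0.4, 0.7, 0.9]


def toy_count(tau):
    return sum(1 if s < tau else 4 for s in coarse_scores)


tau = match_budget(toy_count, target=11, tau_lo=0.0, tau_hi=1.0)
```

Because the count changes in discrete steps, an exact match may not exist; the sketch then returns the threshold whose count is nearest to the target among the values it tried.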

Loss Functions

L_total = L_render + 1e-4 * L_score + 10 * L_camera + 1e-2 * L_scene

Loss        Description
L_render    MSE + 0.05 * LPIPS between rendered and target novel views
L_score     L1 between predicted and gradient-based densification scores
L_camera    Geodesic rotation + L2 translation error
L_scene     Regularizes average Gaussian center distance to 1
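
To make the relative weights concrete, the sketch below combines already-computed scalar loss terms using the coefficients stated above. The function names are hypothetical and the inputs are plain floats; how each term is computed in loss/losses.py is not shown here.

```python
def render_loss(mse, lpips, lpips_weight=0.05):
    """L_render = MSE + 0.05 * LPIPS."""
    return mse + lpips_weight * lpips


def total_loss(l_render, l_score, l_camera, l_scene):
    """L_total = L_render + 1e-4 * L_score + 10 * L_camera + 1e-2 * L_scene."""
    return l_render + 1e-4 * l_score + 10.0 * l_camera + 1e-2 * l_scene
```

With these coefficients a camera error of 0.01 contributes 0.1 to the total, the same as a score loss of 1000, so the score term acts as a light auxiliary signal next to the rendering and camera terms.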

Citation

@article{kim2026f4splat,
  title={F4Splat: Feed-Forward Predictive Densification for Feed-Forward 3D Gaussian Splatting},
  author={Kim, Injae and Kim, Chaehyeon and Bae, Minseong and Joo, Minseok and Kim, Hyunwoo J.},
  journal={arXiv preprint arXiv:2603.21304},
  year={2026}
}

Acknowledgments

This implementation builds upon:

  • VGGT — Geometry backbone architecture
  • DINOv2 — Image encoder
  • gsplat — Differentiable Gaussian rasterization
  • NoPoSplat — RGB shortcut and training strategy

License

This project is released for research purposes. Please refer to the original paper and upstream licenses for usage terms.
