Jianyun Xu†, Song Wang†, Ziqian Ni†, Chunyong Hu, Sheng Yang*, Jianke Zhu, Qiang Li
This is the official implementation of SAM4D: Segment Anything in Camera and LiDAR Streams (ICCV2025) [Paper] [Project Page].
We present SAM4D, a multi-modal and temporal foundation model designed for promptable segmentation across camera and LiDAR streams. Unified Multi-modal Positional Encoding (UMPE) is introduced to align camera and LiDAR features in a shared 3D space, enabling seamless cross-modal prompting and interaction. Additionally, we propose Motion-aware Cross-modal Memory Attention (MCMA), which leverages ego-motion compensation to enhance temporal consistency and long-horizon feature retrieval, ensuring robust segmentation across dynamically changing autonomous driving scenes. To avoid annotation bottlenecks, we develop a multi-modal automated data engine that synergizes VFM-driven video masklets, spatiotemporal 4D reconstruction, and cross-modal masklet fusion. This framework generates camera-LiDAR aligned pseudo-labels at a speed orders of magnitude faster than human annotation while preserving VFM-derived semantic fidelity in point cloud representations. We conduct extensive experiments on the constructed Waymo-4DSeg dataset, which demonstrate the powerful cross-modal segmentation ability of the proposed SAM4D and its great potential for data annotation.
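To make the ego-motion compensation that MCMA builds on more concrete, below is a minimal NumPy sketch (illustrative only, not the model code) that warps a previous frame's LiDAR points into the current ego frame using the per-frame lidar2world poses provided with the dataset; the function name and array shapes are assumptions.

# Illustrative sketch of ego-motion compensation (not the SAM4D model code):
# warp points observed in a previous LiDAR frame into the current LiDAR frame
# using the 4x4 lidar2world poses stored in meta_infos.
import numpy as np

def warp_to_current_frame(prev_points, prev_lidar2world, cur_lidar2world):
    # Relative pose mapping the previous LiDAR frame into the current one
    rel = np.linalg.inv(cur_lidar2world) @ prev_lidar2world            # 4x4
    pts_h = np.concatenate([prev_points, np.ones((len(prev_points), 1))], axis=1)
    return (pts_h @ rel.T)[:, :3]                                      # Nx3 in the current frame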
Please ensure that your Linux system has CUDA 12.1 or above installed, along with nvcc.
# If you don't have it, you may install CUDA via:
wget https://developer.download.nvidia.com/compute/cuda/12.3.2/local_installers/cuda_12.3.2_545.23.08_linux.run
sudo sh cuda_12.3.2_545.23.08_linux.run \
--silent \
--toolkit \
--toolkitpath=/usr/local/cuda-12.3 \
--no-opengl-libs \
--override
- Create a conda environment and install torch
conda create -n sam4d python=3.8
conda activate sam4d
pip install torch==2.3.1 torchvision==0.18.1 torchaudio==2.3.1 --index-url https://download.pytorch.org/whl/cu121
- Clone the repo and install dependencies
git clone https://github.com/CN-ADLab/SAM4D.git && cd SAM4D
pip install -r requirements.txt
python setup.py develop
Note: If you encounter a google/dense_hash_map issue when installing torchsparse, please install sparsehash first via:
sudo apt update && sudo apt install libsparsehash-dev
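After installation, a quick sanity check (a minimal sketch, assuming the steps above completed without errors) confirms that PyTorch was built against CUDA 12.1 and can see a GPU:

# Verify the PyTorch + CUDA 12.1 installation.
import torch

print(torch.__version__)          # expected: 2.3.1+cu121
print(torch.version.cuda)         # expected: 12.1
print(torch.cuda.is_available())  # should be True on a CUDA-capable machine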
Download the Waymo-4DSeg dataset from here: modelscope. You can put the extracted example data from Waymo-4DSeg/samples.zip into ./data.
The data structure is as follows:
${dataset}
├── meta_infos
│   └── ${sequence_name}.pkl
├── pcds
│   └── ${sequence_name}
│       ├── {timestamp1}.npz
│       ├── {timestamp2}.npz
│       └── ...
├── sam4d_labels (optional)
│   └── ${sequence_name}
│       ├── {timestamp1}.json
│       ├── {timestamp2}.json
│       └── ...
└── undistort_images
    └── ${sequence_name}
        ├── ${timestamp1}
        │   ├── ${cam_name}.jpg
        │   └── ...
        ├── ${timestamp2}
        │   ├── ${cam_name}.jpg
        │   └── ...
        └── ...

meta_infos: a pickle file containing meta information of the sequence
# meta_infos/${sequence_name}.pkl structure:
from typing import Dict, List, Tuple, Union

MetaInfoType = Dict[str, Union[
    str,
    List[Dict[str, Union[
        Dict[str, Dict[str, Union[
            str,
            List[List[float]],
            None
        ]]],
        Dict[str, str],
        List[List[float]]
    ]]]
]]

example_meta_info: MetaInfoType = {
    'seq_name': 'your_sequence_name',
    'frames': [
        {
            'cams_info': {
                'your_cam_name': {
                    'data_path': 'undistort_images/your_sequence_name/your_timestamp/your_cam_name.jpg',  # path to image
                    'camera_intrinsics': [[fx, 0, cx], [0, fy, cy], [0, 0, 1]],  # 3x3 matrix
                    'camera2lidar': [[...], [...], [...], [...]]  # 4x4 matrix
                },
                'your_cam_name2': {...}
            },
            'path': {
                'pcd': 'pcds/your_sequence_name/your_timestamp.npz',  # path to point cloud
            },
            'lidar2world': [[...], [...], [...], [...]]  # 4x4 matrix
        }
    ]
}

You can follow data/visualize_sam4d_labels.ipynb to explore the dataset. Make sure you have extracted samples.zip to ./data.
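To see how these entries fit together, here is a minimal sketch (not the official loader used in the notebook) that projects the first frame's LiDAR points into one camera using camera2lidar and camera_intrinsics; the 'points' key inside the .npz archive and the ./data root are assumptions, so check the actual array names first.

# Minimal sketch: project LiDAR points of the first frame into one camera image.
# Assumptions: data extracted under ./data, and the point cloud stored under a
# 'points' key in the .npz archive (inspect arc.files to confirm).
import pickle
import numpy as np

with open('data/meta_infos/your_sequence_name.pkl', 'rb') as f:
    meta = pickle.load(f)

frame = meta['frames'][0]
cam_name, cam = next(iter(frame['cams_info'].items()))
K = np.array(cam['camera_intrinsics'])                     # 3x3 intrinsics
lidar2cam = np.linalg.inv(np.array(cam['camera2lidar']))   # invert camera->LiDAR extrinsics

arc = np.load('data/' + frame['path']['pcd'])
pts = arc['points'][:, :3]                                 # assumed key; see arc.files
pts_h = np.concatenate([pts, np.ones((len(pts), 1))], axis=1)
pts_cam = (pts_h @ lidar2cam.T)[:, :3]                     # LiDAR frame -> camera frame
keep = pts_cam[:, 2] > 0                                   # keep points in front of the camera
uv = pts_cam[keep] @ K.T
uv = uv[:, :2] / uv[:, 2:3]                                # pixel coordinates in cam_name's image
print(cam_name, uv.shape)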
Download the model checkpoints from: modelscope.
Please follow notebooks/sam4d_predictor_example.ipynb step by step to proceed. Make sure you have extracted samples.zip to ./data.
We gratefully acknowledge the developers of the following open-source projects and datasets, whose foundational tools enabled our research: SAM2, GroundingDINO, Grounded-SAM-2, Waymo Open Dataset, VDBFusion, among others.
@article{xu2025sam4d,
  title={SAM4D: Segment Anything in Camera and LiDAR Streams},
  author={Xu, Jianyun and Wang, Song and Ni, Ziqian and Hu, Chunyong and Yang, Sheng and Zhu, Jianke and Li, Qiang},
  journal={arXiv preprint arXiv:2506.21547},
  year={2025}
}