SAM4D

Jianyun Xu†, Song Wang†, Ziqian Ni†, Chunyong Hu, Sheng Yang*, Jianke Zhu, Qiang Li

This is the official implementation of SAM4D: Segment Anything in Camera and LiDAR Streams (ICCV 2025) [Paper] [Project Page].

Abstract

We present SAM4D, a multi-modal and temporal foundation model designed for promptable segmentation across camera and LiDAR streams. Unified Multi-modal Positional Encoding (UMPE) is introduced to align camera and LiDAR features in a shared 3D space, enabling seamless cross-modal prompting and interaction. Additionally, we propose Motion-aware Cross-modal Memory Attention (MCMA), which leverages ego-motion compensation to enhance temporal consistency and long-horizon feature retrieval, ensuring robust segmentation across dynamically changing autonomous driving scenes. To avoid annotation bottlenecks, we develop a multi-modal automated data engine that synergizes VFM-driven video masklets, spatiotemporal 4D reconstruction, and cross-modal masklet fusion. This framework generates camera-LiDAR aligned pseudo-labels at a speed orders of magnitude faster than human annotation while preserving VFM-derived semantic fidelity in point cloud representations. We conduct extensive experiments on the constructed Waymo-4DSeg dataset, which demonstrate the powerful cross-modal segmentation ability of the proposed SAM4D and its great potential for data annotation.

Installation

Please ensure that your Linux system has CUDA 12.1 or above installed, along with nvcc.
# If you don't have it, you may install CUDA via:
wget https://developer.download.nvidia.com/compute/cuda/12.3.2/local_installers/cuda_12.3.2_545.23.08_linux.run
sudo sh cuda_12.3.2_545.23.08_linux.run \
     --silent \
     --toolkit \
     --toolkitpath=/usr/local/cuda-12.3 \
     --no-opengl-libs \
     --override
1. Create a conda environment and install torch
conda create -n sam4d python=3.8
conda activate sam4d
pip install torch==2.3.1 torchvision==0.18.1 torchaudio==2.3.1 --index-url https://download.pytorch.org/whl/cu121
2. Clone the repo and install dependencies
git clone https://github.com/CN-ADLab/SAM4D.git && cd SAM4D
pip install -r requirements.txt
python setup.py develop

Note: If you encounter a google/dense_hash_map issue when installing torchsparse, please install sparsehash first via sudo apt update && sudo apt install libsparsehash-dev.
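
After installing the dependencies, a quick way to confirm that the CUDA-enabled PyTorch build is working (a minimal sanity check, not part of the official instructions):

# Verify that the cu121 PyTorch build can see the GPU.
import torch

print(torch.__version__)                  # expected: 2.3.1+cu121
print(torch.cuda.is_available())          # should print True on a CUDA 12.1+ machine
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))  # name of the first visible GPU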

Quick Start

Prepare Data

Download the Waymo-4DSeg dataset from modelscope. Put the extracted example data from Waymo-4DSeg/samples.zip into ./data. The data structure is as follows:

${dataset}
├── meta_infos
│   └── ${sequence_name}.pkl
├── pcds
│   └── ${sequence_name}
│       ├── ${timestamp1}.npz
│       ├── ${timestamp2}.npz
│       └── ...
├── sam4d_labels (optional)
│   └── ${sequence_name}
│       ├── ${timestamp1}.json
│       ├── ${timestamp2}.json
│       └── ...
└── undistort_images
    └── ${sequence_name}
        ├── ${timestamp1}
        │   ├── ${cam_name}.jpg
        │   └── ...
        ├── ${timestamp2}
        │   ├── ${cam_name}.jpg
        │   └── ...
        └── ...

meta_infos: one pickle file per sequence containing its meta information

# meta_infos/${sequence_name}.pkl structure:
from typing import Dict, List, Tuple, Union

MetaInfoType = Dict[str, Union[
    str,
    List[Dict[str, Union[
        Dict[str, Dict[str, Union[
            str,
            List[List[float]],
            None
        ]]],
        Dict[str, str],
        List[List[float]]
    ]]]
]]

example_meta_info: MetaInfoType = {
    'seq_name': 'your_sequence_name',
    'frames': [
        {
            'cams_info': {
                'your_cam_name': {
                    'data_path': 'undistort_images/your_sequence_name/your_timestamp/your_cam_name.jpg', # path to image
                    'camera_intrinsics': [[fx, 0, cx], [0, fy, cy], [0, 0, 1]],  # 3x3 matrix
                    'camera2lidar': [[...], [...], [...], [...]]  # 4x4 matrix
                },
                'your_cam_name2': {...}
            },
            'path': {
                'pcd': 'pcds/your_sequence_name/your_timestamp.npz', # path to point cloud
            },
            'lidar2world': [[...], [...], [...], [...]]  # 4x4 matrix
        }
    ]
}
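
For orientation, here is a minimal sketch of loading one of these meta files with plain pickle/NumPy. The field names follow the structure above; the data/ path prefix and the contents of the .npz point cloud files are assumptions, so the sketch only lists what it finds:

# Sketch: load one sequence's meta info and inspect its first frame.
# The 'data/' prefix and the .npz keys are assumptions, not documented here.
import pickle
import numpy as np

with open('data/meta_infos/your_sequence_name.pkl', 'rb') as f:  # adjust to your extraction path
    meta = pickle.load(f)

print(meta['seq_name'], len(meta['frames']))

frame0 = meta['frames'][0]
print(list(frame0['cams_info'].keys()))       # available camera names
print(np.array(frame0['lidar2world']).shape)  # (4, 4) lidar-to-world pose

pcd = np.load('data/' + frame0['path']['pcd'])
print(pcd.files)                              # arrays stored in this .npz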

You can follow data/visualize_sam4d_labels.ipynb to explore the dataset. Make sure you have extracted samples.zip to ./data.
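
If you only want to peek at a label file outside the notebook, a minimal sketch is below; the JSON schema is not documented in this README, so it only prints the top-level structure, and the file path is a placeholder:

# Sketch: peek at one label file without assuming its schema; see
# data/visualize_sam4d_labels.ipynb for the actual parsing and visualization.
import json

with open('data/sam4d_labels/your_sequence_name/your_timestamp.json') as f:
    labels = json.load(f)

print(type(labels))
print(list(labels)[:10] if isinstance(labels, (dict, list)) else labels)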

Run Demo

Download the model checkpoints from modelscope.

Please follow notebooks/sam4d_predictor_example.ipynb step by step. Make sure you have extracted samples.zip into ./data.

Acknowledgement

We gratefully acknowledge the developers of the following open-source projects and datasets, whose foundational tools enabled our research: SAM2, GroundingDINO, Grounded-SAM-2, Waymo Open Dataset, VDBFusion, among others.

Citing SAM4D

@article{xu2025sam4d,
  title={SAM4D: Segment Anything in Camera and LiDAR Streams},
  author={Xu, Jianyun and Wang, Song and Ni, Ziqian and Hu, Chunyong and Yang, Sheng and Zhu, Jianke and Li, Qiang},
  journal={arXiv preprint arXiv:2506.21547},
  year={2025}
}
