3D-MOOD: Lifting 2D to 3D for Monocular Open-Set Object Detection
Yung-Hsu Yang, Luigi Piccinelli, Mattia Segu, Siyuan Li, Rui Huang, Yuqian Fu, Marc Pollefeys, Hermann Blum, Zuria Bauer
ICCV 2025, Paper at arXiv 2507.23567
- 27.08.2025: Add scripts/demo.py and Hugging Face demo!
- 25.08.2025: Release code and models.
- 25.06.2025: 3D-MOOD is accepted at ICCV 2025!
We use Vis4D as the framework to implement 3D-MOOD. Please check its documentation for more details.
We support Python 3.11+ and PyTorch 2.4.0+. Please install the PyTorch version that matches your hardware.
conda create -n opendet3d python=3.11 -y
conda activate opendet3d
# Install Vis4D
# This should also install PyTorch with CUDA support, but please verify.
pip install vis4d==1.0.0
# Install CUDA ops
pip install git+https://github.com/SysCV/vis4d_cuda_ops.git --no-build-isolation --no-cache-dir
# Install 3D-MOOD
pip install -v -e .
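After installation, you can do a quick sanity check that PyTorch was installed with CUDA support and that the packages import. This is a minimal sketch, not part of the repository:

# quick installation check (hypothetical snippet, not shipped with the repo)
import torch
import vis4d  # the import succeeding confirms Vis4D is installed

print("PyTorch:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())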
We provide scripts/demo.py to test whether the installation is complete.
python scripts/demo.py
It will save the prediction to assets/demo/output.png.
You can also try the live demo here!
We provide the HDF5 files and annotations for ScanNet v2 and Argoverse 2, as well as the depth GT for the Omni3D datasets, here.
For training and testing with Omni3D, please refer to DATA to set up the Omni3D data.
We also illustrate the coordinate system we use here.
The final data folder should look like:
REPO_ROOT
├── data
│ ├── omni3d
│ │ └── annotations
│ ├── KITTI_object
│ ├── KITTI_object_depth
│ ├── nuscenes
│ ├── nuscenes_depth
│ ├── objectron
│ ├── objectron_depth
│ ├── SUNRGBD
│ ├── ARKitScenes
│ ├── ARKitScenes_depth
│ ├── hypersim
│ ├── hypersim_depth
│ ├── argoverse2
│ │ ├── annotations
│ │ └── val.hdf5
│ └── scannet
│ ├── annotations
│ └── val.hdf5
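To quickly verify the layout above, you can run a small check such as the following. This is a minimal sketch (the helper name and the expected folders are illustrative; adjust them to what you actually downloaded):

# check_data_layout.py (hypothetical helper, not part of the repo)
from pathlib import Path

REPO_ROOT = Path(".")  # assumption: run from the repository root
expected = [
    "data/omni3d/annotations",
    "data/KITTI_object",
    "data/nuscenes",
    "data/objectron",
    "data/SUNRGBD",
    "data/ARKitScenes",
    "data/hypersim",
    "data/argoverse2/annotations",
    "data/scannet/annotations",
]
for rel in expected:
    status = "OK     " if (REPO_ROOT / rel).exists() else "MISSING"
    print(f"{status} {rel}")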
By default, our provided configs use HDF5 as the data backend.
You can convert each data folder with the vis4d.data.io.to_hdf5 script to generate the HDF5 files.
Note that if you download the provided .hdf5 files from here, you only need to convert the Omni3D datasets to HDF5.
To be more specific:
cd data
python -m vis4d.data.io.to_hdf5 -p KITTI_object
python -m vis4d.data.io.to_hdf5 -p KITTI_object_depth # Only needed if you generate depth on your own
...
python -m vis4d.data.io.to_hdf5 -p hypersim
python -m vis4d.data.io.to_hdf5 -p hypersim_depth # Only needed if you generate depth on your own
Then you will have all datasets in .hdf5 format.
Alternatively, you can change the data_backend in the configs to FileBackend.
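If you want to sanity-check a converted (or downloaded) HDF5 file, you can open it with h5py. A minimal sketch, assuming h5py is installed and using one of the provided files as an example path:

# verify_hdf5.py (hypothetical helper, not part of the repo)
import h5py

path = "data/scannet/val.hdf5"  # e.g. one of the provided files, or your converted output

with h5py.File(path, "r") as f:
    print("Top-level keys:", list(f.keys())[:10])
    counter = {"datasets": 0}

    def _count(name, obj):
        # Count every stored dataset (i.e., encoded file) in the archive.
        if isinstance(obj, h5py.Dataset):
            counter["datasets"] += 1

    f.visititems(_count)
    print("Stored items:", counter["datasets"])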
Note that the scores for Argoverse 2 and ScanNet are the proposed open detection score (ODS), while the score for the Omni3D test set is AP.
| Backbone | Config | Omni3D | Argoverse 2 | ScanNet |
| --- | --- | --- | --- | --- |
| Swin-T | config | 28.4 | 22.4 | 30.2 |
| Swin-B | config | 30.0 | 23.8 | 31.5 |
For per-dataset results on Omni3D, please refer to Table 3 of the paper.
# Swin-T
vis4d test --config opendet3d/zoo/gdino3d/gdino3d_swin_t_omni3d.py --gpus 1 --ckpt https://huggingface.co/RoyYang0714/3D-MOOD/resolve/main/gdino3d_swin-t_120e_omni3d_699f69.pt
# Swin-B
vis4d test --config opendet3d/zoo/gdino3d/gdino3d_swin_b_omni3d.py --gpus 1 --ckpt https://huggingface.co/RoyYang0714/3D-MOOD/resolve/main/gdino3d_swin-b_120e_omni3d_834c97.pt
We use a batch size of 128 to train our models.
The settings below assume training on a cluster with RTX 4090 GPUs.
# Swin-T
vis4d fit --config opendet3d/zoo/gdino3d/gdino3d_swin_t_omni3d.py --gpus 8 --nodes 4 --config.params.samples_per_gpu=4
# Swin-B
vis4d fit --config opendet3d/zoo/gdino3d/gdino3d_swin_b_omni3d.py --gpus 8 --nodes 8
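For reference, the effective batch size is GPUs per node × nodes × samples per GPU. A minimal sketch of the arithmetic (the Swin-B per-GPU sample count of 2 is an assumption about the default config, since the command above does not override it):

# effective batch size arithmetic (illustrative only)
def effective_batch_size(gpus_per_node: int, nodes: int, samples_per_gpu: int) -> int:
    return gpus_per_node * nodes * samples_per_gpu

print(effective_batch_size(8, 4, 4))  # Swin-T command above -> 128
print(effective_batch_size(8, 8, 2))  # Swin-B command above -> 128 (samples_per_gpu assumed)

If you train with fewer GPUs, adjust --config.params.samples_per_gpu so the product stays at 128 (memory permitting).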
We also provide the code to reproduce our ScanNet200 results in the supplementary material. Note that it will take longer since we need to chunk the classes.
vis4d test --config opendet3d/zoo/gdino3d/gdino3d_swin_b_scannet200.py --gpus 1 --ckpt https://huggingface.co/RoyYang0714/3D-MOOD/resolve/main/gdino3d_swin-b_120e_omni3d_834c97.pt
The following command will dump all the visualization results under vis4d-workspace/gdino3d_swin-b_omni3d/${VERSION}/vis/test/:
vis4d test --config opendet3d/zoo/gdino3d/gdino3d_swin_b_omni3d.py --gpus 1 --ckpt https://huggingface.co/RoyYang0714/3D-MOOD/resolve/main/gdino3d_swin-b_120e_omni3d_834c97.pt --vis --config.params.nms=True --config.params.score_threshold=0.1
If you find our work useful in your research, please consider citing our publication:
@article{yang20253d,
title={3D-MOOD: Lifting 2D to 3D for Monocular Open-Set Object Detection},
author={Yang, Yung-Hsu and Piccinelli, Luigi and Segu, Mattia and Li, Siyuan and Huang, Rui and Fu, Yuqian and Pollefeys, Marc and Blum, Hermann and Bauer, Zuria},
journal={arXiv preprint arXiv:2507.23567},
year={2025}
}