Open-Ended 3D Point Cloud Instance Segmentation ICCV'25 - OpenSUN3D
by Phuc Nguyen, Minh Luu, Anh Tran, Cuong Pham and Khoi Nguyen
- Jul 2025: We release the source code for OE-3DIS
- Jul 2025: OE-3DIS is accepted to the ICCV 2025 OpenSUN3D workshop
- Aug 2024: OE-3DIS released on arXiv
Abstract: Open-vocabulary 3D Instance Segmentation methods (OV-3DIS) have recently demonstrated their generalization ability to unseen objects. However, these methods still depend on predefined class names during inference, restricting agents' autonomy. To mitigate this constraint, we propose a novel problem termed Open-Ended 3D Instance Segmentation (OE-3DIS), which eliminates the necessity for predefined class names during testing. We present a comprehensive set of strong baselines inspired by OV-3DIS methodologies, utilizing 2D Multimodal Large Language Models. In addition, we introduce a novel token aggregation strategy that effectively fuses information from multiview images. To evaluate the performance of our OE-3DIS system, we benchmark both the proposed baselines and our method on two widely used indoor datasets: ScanNet200 and ScanNet++. Our approach achieves substantial performance gains over the baselines on both datasets. Notably, even without access to ground-truth object class names during inference, our method outperforms Open3DIS, the current state-of-the-art in OV-3DIS.
Details of the model architecture and experimental results can be found in our paper:
@article{nguyen2024open,
  title={Open-ended 3d point cloud instance segmentation},
  author={Nguyen, Phuc DA and Luu, Minh and Tran, Anh and Pham, Cuong and Nguyen, Khoi},
  journal={arXiv preprint arXiv:2408.11747},
  year={2024}
}
Please CITE our paper whenever this repository is used to help produce published results or incorporated into other software.
- Optimized source code for 2D-3D VLM inference
- Reproducibility code for the ScanNet200 and ScanNet++ datasets!
- 2D segmenters: SAM and Detic supported!
Environment:
pip install torch==2.0.1 torchvision==0.15.2
pip install -r requirements.txt
VLM weight: weights
Save it under:
../weights/osm_final.pt
At this moment, due to the ScanNet license, we provide an example processed set of Scannet200 (1 scene) + Scannetpp (50 validation scenes) here: Scannet200, Scannetpp
Please follow the ScanNet and ScanNet++ licenses when using our preprocessed dataset.
For Scannet200, we construct the data tree as follows, considering only the validation set:
data
├── Scannet200
############## 2D root folder with default image sampling factor: 5 ##############
│   ├── Scannet200_2D_5interval
│   │   ├── val <- validation set
│   │   │   ├── scene0011_00
│   │   │   │   ├── color <- folder with RGB images
│   │   │   │   │   ├── 00000.jpg
│   │   │   │   │   └── ...
│   │   │   │   ├── depth <- folder with depth images
│   │   │   │   │   ├── 00000.png
│   │   │   │   │   └── ...
│   │   │   │   ├── pose <- folder with camera poses
│   │   │   │   │   ├── 00000.txt
│   │   │   │   │   └── ...
│   │   │   │   └── intrinsic.txt (image intrinsics)
│   │   │   ├── ...
│   │   │   └── intrinsic_depth.txt (depth intrinsics) <- ScanNet intrinsics match the depth images
│   │   ├── train
│   │   └── test
############## 3D root folder with point cloud and annotation ##############
│   └── Scannet200_3D
│       ├── val <- validation set
│       │   ├── original_ply_files <- .ply point cloud files from the ScanNet raw data
│       │   │   ├── scene0011_00.ply
│       │   │   └── ...
│       │   ├── groundtruth <- normalized point cloud, color from PLY + annotations (for the 3D backbone)
│       │   │   ├── scene0011_00.pth
│       │   │   └── ...
│       │   ├── superpoints <- superpoints directory
│       │   │   ├── scene0011_00.pth
│       │   │   └── ...
│       │   ├── isbnet_clsagnostic_scannet200 <- class-agnostic 3D proposals
│       │   │   ├── scene0011_00.pth
│       │   │   └── ...
│       │   └── dc_feat_scannet200 <- deep 3D features from the proposal network
│       │       ├── scene0011_00.pth
│       │       └── ...
│       ├── train
│       └── test
####################################################################################
1) Generating RGB-D images, camera poses, original PLY, superpoints and inst_nostuff files
- Download the ScannetV2 dataset
- Please refer to RGB-D images, camera poses and original PLY
- Please refer to Superpoints and inst_nostuff
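The 2D folders are built with a default image sampling factor of 5, i.e., every 5th RGB-D frame is kept. A minimal sketch of that subsampling step, assuming frames are named so that lexicographic sort equals temporal order (the `sample_frames` helper is illustrative, not part of the released scripts):

```python
import os
import shutil

def sample_frames(src_dir: str, dst_dir: str, interval: int = 5) -> list:
    """Copy every `interval`-th frame (sorted by filename) from src_dir to dst_dir."""
    os.makedirs(dst_dir, exist_ok=True)
    frames = sorted(os.listdir(src_dir))
    kept = frames[::interval]  # keep frames 0, interval, 2*interval, ...
    for name in kept:
        shutil.copy(os.path.join(src_dir, name), os.path.join(dst_dir, name))
    return kept
```

The same subsampling would be applied consistently to the color, depth, and pose folders so the per-frame files stay aligned.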
For Scannetpp, we construct the data tree as follows, considering only the validation set:
data
├── Scannetpp
############## 2D root folder with default image sampling factor: 5 ##############
│   ├── Scannetpp_2D_5interval
│   │   ├── val <- validation set
│   │   │   ├── 0d2ee665be
│   │   │   │   ├── color <- folder with RGB images
│   │   │   │   │   ├── 00000.jpg
│   │   │   │   │   └── ...
│   │   │   │   ├── depth <- folder with depth images
│   │   │   │   │   ├── 00000.png
│   │   │   │   │   └── ...
│   │   │   │   ├── pose <- folder with camera poses
│   │   │   │   │   ├── 00000.txt
│   │   │   │   │   └── ...
│   │   │   │   ├── intrinsic <- folder with per-view intrinsics (in Scannet200, the intrinsics are the same across all views)
│   │   │   │   │   ├── 00000.txt
│   │   │   │   │   └── ...
│   │   │   │   └── intrinsic.txt (image intrinsics)
│   │   │   ├── ...
│   │   │   └── intrinsic_depth.txt (depth intrinsics) <- ScanNet intrinsics match the depth images
│   │   ├── train
│   │   └── test
############## 3D root folder with point cloud and annotation ##############
│   └── Scannetpp_3D
│       ├── val
│       │   ├── original_ply_files <- .ply point cloud files from the ScanNet++ raw data
│       │   │   ├── 0d2ee665be.ply
│       │   │   └── ...
│       │   ├── groundtruth <- point cloud, color from PLY + annotations
│       │   │   ├── 0d2ee665be.pth
│       │   │   └── ...
│       │   ├── superpoints <- superpoints directory
│       │   │   ├── 0d2ee665be.pth
│       │   │   └── ...
│       │   ├── isbnet_clsagnostic_scannet200 <- class-agnostic 3D proposals
│       │   │   ├── 0d2ee665be.pth
│       │   │   └── ...
│       │   └── dc_feat_scannetpp <- deep 3D features from the proposal network
│       │       ├── 0d2ee665be.pth
│       │       └── ...
│       ├── train
│       └── test
####################################################################################
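Before running inference, it can help to verify that each scene follows the layout above. A small sanity-check sketch (the function name and the fixed subfolder list are assumptions based on the tree, not part of the repo):

```python
import os

# Subfolders expected for each scene in the 2D root, per the trees above.
EXPECTED_2D = ("color", "depth", "pose")

def check_scene_2d(root_2d: str, split: str, scene_id: str) -> list:
    """Return the expected 2D subfolders that are missing for a scene."""
    scene_dir = os.path.join(root_2d, split, scene_id)
    return [sub for sub in EXPECTED_2D
            if not os.path.isdir(os.path.join(scene_dir, sub))]
```

For Scannetpp one could also add the per-view `intrinsic` folder to the expected list; it is omitted here because ScanNet200 scenes use a single `intrinsic.txt` instead.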
*NOTE: The transformers version might affect the final results.
Install transformers and fetch the InstructBLIP weights:
pip install --upgrade transformers
python3 -c "from transformers import InstructBlipProcessor, InstructBlipForConditionalGeneration"
1) Top-1 Score Mask Open-Ended 3D Instance Segmentation
python run/freevocab_1n_average.py
2) Maskwise Open-Ended 3D Instance Segmentation
python run/freevocab_1n_average_multiview.py
3) Pointwise Open-Ended 3D Instance Segmentation
python run/freevocab_1n_pcfeature.py
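These modes differ in how per-view VLM features are pooled into a prediction: top-1 keeps the best single view, maskwise fuses features across views per 3D mask, and pointwise operates on per-point features. A toy sketch of the maskwise idea, reduced to plain averaging of one feature vector per view (the actual scripts operate on MLLM token embeddings; the function name and list-based features are illustrative only):

```python
def aggregate_maskwise(view_feats):
    """Average per-view feature vectors into one mask-level vector.

    view_feats: list of equal-length feature lists, one per view in
    which the 3D mask is visible.
    """
    n_views = len(view_feats)
    dim = len(view_feats[0])
    # Mean over views, dimension by dimension.
    return [sum(f[d] for f in view_feats) / n_views for d in range(dim)]
```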
4) Evaluation protocol: We design an approach that uses the Hungarian matching algorithm to match predicted proposals with the corresponding ground truths via a language model, in order to assess OE-3DIS performance
python evaluation/eval3d.py
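The core of that protocol is an optimal one-to-one assignment between predicted proposals and ground truths under a similarity score. A self-contained sketch, using exhaustive search over assignments for clarity (the repo's evaluation would use a proper Hungarian solver such as scipy.optimize.linear_sum_assignment; the score matrix here is a stand-in for mask IoU combined with language-model name similarity):

```python
from itertools import permutations

def match_proposals(score):
    """Optimal one-to-one matching maximizing total score (brute force).

    score[i][j]: similarity between predicted proposal i and ground truth j.
    Returns (assignment, total) where assignment is a list of (pred, gt) pairs.
    Only suitable for small instance counts; a real Hungarian solver is
    O(n^3) instead of factorial.
    """
    n_pred, n_gt = len(score), len(score[0])
    best_total, best_assign = float("-inf"), None
    for perm in permutations(range(n_gt), min(n_pred, n_gt)):
        total = sum(score[i][j] for i, j in enumerate(perm))
        if total > best_total:
            best_total, best_assign = total, list(enumerate(perm))
    return best_assign, best_total
```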
This repo is built upon Open3DIS and OSM.
If you have any questions or suggestions about this repo, please feel free to contact me (phucnda@gmail.com).