Open-Ended 3D Point Cloud Instance Segmentation ICCV'25 - OpenSUN3D
by Phuc Nguyen, Minh Luu, Anh Tran, Cuong Pham and Khoi Nguyen
- Jul 2025: We release the source code for OE-3DIS
- Jul 2025: OE-3DIS is accepted to the ICCV 2025 OpenSUN3D workshop
- Aug 2024: OE-3DIS released on arXiv
Abstract: Open-vocabulary 3D Instance Segmentation methods (OV-3DIS) have recently demonstrated their generalization ability to unseen objects. However, these methods still depend on predefined class names during inference, restricting agents' autonomy. To mitigate this constraint, we propose a novel problem termed Open-Ended 3D Instance Segmentation (OE-3DIS), which eliminates the necessity for predefined class names during testing. We present a comprehensive set of strong baselines inspired by OV-3DIS methodologies, utilizing 2D Multimodal Large Language Models. In addition, we introduce a novel token aggregation strategy that effectively fuses information from multiview images. To evaluate the performance of our OE-3DIS system, we benchmark both the proposed baselines and our method on two widely used indoor datasets: ScanNet200 and ScanNet++. Our approach achieves substantial performance gains over the baselines on both datasets. Notably, even without access to ground-truth object class names during inference, our method outperforms Open3DIS, the current state-of-the-art in OV-3DIS.
Details of the model architecture and experimental results can be found in our paper:
@article{nguyen2024open,
  title={Open-ended 3d point cloud instance segmentation},
  author={Nguyen, Phuc DA and Luu, Minh and Tran, Anh and Pham, Cuong and Nguyen, Khoi},
  journal={arXiv preprint arXiv:2408.11747},
  year={2024}
}
Please CITE our paper whenever this repository is used to help produce published results or incorporated into other software.
- Optimized source code for 2D-3D VLM inference
- Reproducibility code for the ScanNet200 and ScanNet++ datasets!
- 2D segmenters: SAM and Detic supported!
Environment:
pip install torch==2.0.1 torchvision==0.15.2
pip install -r requirements.txt
VLM weight: weights
Save it under:
../weights/osm_final.pt
At this moment, due to the ScanNet license, we provide an example processed set of Scannet200 (1 scene) + Scannetpp (50 validation scenes) here: Scannet200, Scannetpp
Please follow the ScanNet and ScanNet++ licenses when using our preprocessed dataset.
For Scannet200, we construct the data tree as follows, considering only the validation set:
data
├── Scannet200
############## 2D root folder with default image sampling factor: 5 ##############
│   ├── Scannet200_2D_5interval
│   │   ├── val <- validation set
│   │   │   ├── scene0011_00
│   │   │   │   ├── color <- folder with RGB images
│   │   │   │   │   ├── 00000.jpg
│   │   │   │   │   └── ...
│   │   │   │   ├── depth <- folder with depth images
│   │   │   │   │   ├── 00000.png
│   │   │   │   │   └── ...
│   │   │   │   ├── pose <- folder with camera poses
│   │   │   │   │   ├── 00000.txt
│   │   │   │   │   └── ...
│   │   │   │   └── intrinsic.txt (image intrinsics)
│   │   │   ├── ...
│   │   │   └── intrinsic_depth.txt (depth intrinsics) <- ScanNet intrinsics match the depth images
│   │   ├── train
│   │   └── test
############## 3D root folder with point cloud and annotation ##############
│   └── Scannet200_3D
│       ├── val <- validation set
│       │   ├── original_ply_files <- .ply point cloud files from the ScanNet raw data
│       │   │   ├── scene0011_00.ply
│       │   │   └── ...
│       │   ├── groundtruth <- normalized point cloud, color from PLY + annotations (for the 3D backbone)
│       │   │   ├── scene0011_00.pth
│       │   │   └── ...
│       │   ├── superpoints <- superpoints directory
│       │   │   ├── scene0011_00.pth
│       │   │   └── ...
│       │   ├── isbnet_clsagnostic_scannet200 <- class-agnostic 3D proposals
│       │   │   ├── scene0011_00.pth
│       │   │   └── ...
│       │   └── dc_feat_scannet200 <- deep 3D features from the proposal network
│       │       ├── scene0011_00.pth
│       │       └── ...
│       ├── train
│       └── test
####################################################################################
1) Generating RGB-D images, camera poses, original PLY, superpoints and inst_nostuff files
- Download the ScannetV2 dataset
- Please refer to RGB-D images, camera poses and original PLY
- Please refer to Superpoints and inst_nostuff
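The 2D folders are built with a default image sampling factor of 5, i.e., every 5th RGB-D frame is kept. A minimal sketch of that subsampling step, assuming frames are named so that lexicographic sort equals temporal order (the `sample_frames` helper is illustrative, not part of the released scripts):

```python
import os
import shutil

def sample_frames(src_dir: str, dst_dir: str, interval: int = 5) -> list:
    """Copy every `interval`-th frame (sorted by filename) from src_dir to dst_dir."""
    os.makedirs(dst_dir, exist_ok=True)
    frames = sorted(os.listdir(src_dir))
    kept = frames[::interval]  # keep frames 0, interval, 2*interval, ...
    for name in kept:
        shutil.copy(os.path.join(src_dir, name), os.path.join(dst_dir, name))
    return kept
```

The same subsampling would be applied consistently to the color, depth, and pose folders so the per-frame files stay aligned.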
For Scannetpp, we construct the data tree as follows, considering only the validation set:
data
├── Scannetpp
############## 2D root folder with default image sampling factor: 5 ##############
│   ├── Scannetpp_2D_5interval
│   │   ├── val <- validation set
│   │   │   ├── 0d2ee665be
│   │   │   │   ├── color <- folder with RGB images
│   │   │   │   │   ├── 00000.jpg
│   │   │   │   │   └── ...
│   │   │   │   ├── depth <- folder with depth images
│   │   │   │   │   ├── 00000.png
│   │   │   │   │   └── ...
│   │   │   │   ├── pose <- folder with camera poses
│   │   │   │   │   ├── 00000.txt
│   │   │   │   │   └── ...
│   │   │   │   ├── intrinsic <- folder with per-view intrinsics (in Scannet200, the intrinsics are the same across all views)
│   │   │   │   │   ├── 00000.txt
│   │   │   │   │   └── ...
│   │   │   │   └── intrinsic.txt (image intrinsics)
│   │   │   ├── ...
│   │   │   └── intrinsic_depth.txt (depth intrinsics) <- ScanNet intrinsics match the depth images
│   │   ├── train
│   │   └── test
############## 3D root folder with point cloud and annotation ##############
│   └── Scannetpp_3D
│       ├── val
│       │   ├── original_ply_files <- .ply point cloud files from the ScanNet++ raw data
│       │   │   ├── 0d2ee665be.ply
│       │   │   └── ...
│       │   ├── groundtruth <- point cloud, color from PLY + annotations
│       │   │   ├── 0d2ee665be.pth
│       │   │   └── ...
│       │   ├── superpoints <- superpoints directory
│       │   │   ├── 0d2ee665be.pth
│       │   │   └── ...
│       │   ├── isbnet_clsagnostic_scannet200 <- class-agnostic 3D proposals
│       │   │   ├── 0d2ee665be.pth
│       │   │   └── ...
│       │   └── dc_feat_scannetpp <- deep 3D features from the proposal network
│       │       ├── 0d2ee665be.pth
│       │       └── ...
│       ├── train
│       └── test
####################################################################################
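Before running inference, it can help to verify that each scene follows the layout above. A small sanity-check sketch (the function name and the fixed subfolder list are assumptions based on the tree, not part of the repo):

```python
import os

# Subfolders expected for each scene in the 2D root, per the trees above.
EXPECTED_2D = ("color", "depth", "pose")

def check_scene_2d(root_2d: str, split: str, scene_id: str) -> list:
    """Return the expected 2D subfolders that are missing for a scene."""
    scene_dir = os.path.join(root_2d, split, scene_id)
    return [sub for sub in EXPECTED_2D
            if not os.path.isdir(os.path.join(scene_dir, sub))]
```

For Scannetpp one could also add the per-view `intrinsic` folder to the expected list; it is omitted here because ScanNet200 scenes use a single `intrinsic.txt` instead.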
*NOTE: The transformers version might affect the final results.
Install transformers and fetch the InstructBLIP weights:
pip install --upgrade transformers
python3 -c "from transformers import InstructBlipProcessor, InstructBlipForConditionalGeneration"
1) Top-1 Score Mask Open-Ended 3D Instance Segmentation
python run/freevocab_1n_average.py
2) Maskwise Open-Ended 3D Instance Segmentation
python run/freevocab_1n_average_multiview.py
3) Pointwise Open-Ended 3D Instance Segmentation
python run/freevocab_1n_pcfeature.py
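These modes differ in how per-view VLM features are pooled into a prediction: top-1 keeps the best single view, maskwise fuses features across views per 3D mask, and pointwise operates on per-point features. A toy sketch of the maskwise idea, reduced to plain averaging of one feature vector per view (the actual scripts operate on MLLM token embeddings; the function name and list-based features are illustrative only):

```python
def aggregate_maskwise(view_feats):
    """Average per-view feature vectors into one mask-level vector.

    view_feats: list of equal-length feature lists, one per view in
    which the 3D mask is visible.
    """
    n_views = len(view_feats)
    dim = len(view_feats[0])
    # Mean over views, dimension by dimension.
    return [sum(f[d] for f in view_feats) / n_views for d in range(dim)]
```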
4) Evaluation protocol: We design an approach that uses the Hungarian matching algorithm to match predicted proposals with the corresponding ground truths via a language model, in order to assess OE-3DIS performance
python evaluation/eval3d.py
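The core of that protocol is an optimal one-to-one assignment between predicted proposals and ground truths under a similarity score. A self-contained sketch, using exhaustive search over assignments for clarity (the repo's evaluation would use a proper Hungarian solver such as scipy.optimize.linear_sum_assignment; the score matrix here is a stand-in for mask IoU combined with language-model name similarity):

```python
from itertools import permutations

def match_proposals(score):
    """Optimal one-to-one matching maximizing total score (brute force).

    score[i][j]: similarity between predicted proposal i and ground truth j.
    Returns (assignment, total) where assignment is a list of (pred, gt) pairs.
    Only suitable for small instance counts; a real Hungarian solver is
    O(n^3) instead of factorial.
    """
    n_pred, n_gt = len(score), len(score[0])
    best_total, best_assign = float("-inf"), None
    for perm in permutations(range(n_gt), min(n_pred, n_gt)):
        total = sum(score[i][j] for i, j in enumerate(perm))
        if total > best_total:
            best_total, best_assign = total, list(enumerate(perm))
    return best_assign, best_total
```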
This repo is built upon Open3DIS and OSM.
If you have any questions or suggestions about this repo, please feel free to contact me (phucnda@gmail.com).