
Table of contents
  1. Installation guide
  2. Data Preparation
  3. Run the code
  4. Acknowledgments

Open-Ended 3D Point Cloud Instance Segmentation (ICCV'25 - OpenSUN3D)

by Phuc Nguyen, Minh Luu, Anh Tran, Cuong Pham and Khoi Nguyen

News ⚡

  • Jul 2025: We release the source code for OE-3DIS

  • Jul 2025: OE-3DIS is accepted to the ICCV 2025 OpenSUN3D workshop

  • Aug 2024: OE-3DIS is released on arXiv

Abstract: Open-vocabulary 3D Instance Segmentation methods (OV-3DIS) have recently demonstrated their generalization ability to unseen objects. However, these methods still depend on predefined class names during inference, restricting agents' autonomy. To mitigate this constraint, we propose a novel problem termed Open-Ended 3D Instance Segmentation (OE-3DIS), which eliminates the necessity for predefined class names during testing. We present a comprehensive set of strong baselines inspired by OV-3DIS methodologies, utilizing 2D Multimodal Large Language Models. In addition, we introduce a novel token aggregation strategy that effectively fuses information from multiview images. To evaluate the performance of our OE-3DIS system, we benchmark both the proposed baselines and our method on two widely used indoor datasets: ScanNet200 and ScanNet++. Our approach achieves substantial performance gains over the baselines on both datasets. Notably, even without access to ground-truth object class names during inference, our method outperforms Open3DIS, the current state-of-the-art in OV-3DIS.

(Overview figure)

Details of the model architecture and experimental results can be found in our paper:

@article{nguyen2024open,
  title={Open-ended 3d point cloud instance segmentation},
  author={Nguyen, Phuc DA and Luu, Minh and Tran, Anh and Pham, Cuong and Nguyen, Khoi},
  journal={arXiv preprint arXiv:2408.11747},
  year={2024}
}

Please CITE our paper whenever this repository is used to help produce published results or is incorporated into other software.

Features 📣

  • Optimized source code for 2D-3D VLM inference
  • Reproducibility code for the ScanNet200 and ScanNet++ datasets!
  • 2D segmenters: SAM and Detic supported!

Installation guide 🔨

Environment:

pip install torch==2.0.1 torchvision==0.15.2
pip install -r requirements.txt
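
A quick check that the pinned environment resolved correctly:

python3 -c "import torch, torchvision; print(torch.__version__, torchvision.__version__, torch.cuda.is_available())"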

VLM weights: weights

Save them under:

../weights/osm_final.pt
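
A quick sanity check that the download is intact; this is a minimal sketch assuming the file is a standard torch.save checkpoint:

import torch

# Load the OSM checkpoint on CPU just to verify it deserializes.
# (Assumes a regular torch.save'd file; the exact contents depend on the release.)
ckpt = torch.load("../weights/osm_final.pt", map_location="cpu")
print(type(ckpt))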

Data Preparation 📂

At the moment, due to the ScanNet license, we provide an example processed set of ScanNet200 (1 scene) + ScanNet++ (50 validation scenes) here: Scannet200, Scannetpp

Please follow the ScanNet and ScanNet++ licenses when using our preprocessed dataset.

For ScanNet200, we construct the data directory tree as follows, considering only the validation set:

data
├── Scannet200
############## 2D root folder with default image sampling factor: 5 ##############
│    ├── Scannet200_2D_5interval
│    │    ├── val                                       <- validation set
│    │    │    ├── scene0011_00
│    │    │    │    ├── color                           <- folder with RGB images
│    │    │    │    │    00000.jpg
│    │    │    │    │    ...
│    │    │    │    ├── depth                           <- folder with depth images
│    │    │    │    │    00000.png
│    │    │    │    │    ...
│    │    │    │    ├── pose                            <- folder with camera poses
│    │    │    │    │    00000.txt
│    │    │    │    │    ...
│    │    │    │    intrinsic.txt (image intrinsics)
│    │    │    ...
│    │    │    intrinsic_depth.txt (depth intrinsics)   <- ScanNet intrinsics match the depth images
│    │    ├── train
│    │    ├── test
############## 3D root folder with point cloud and annotation ##############
│    ├── Scannet200_3D
│    │    ├── val                                       <- validation set
│    │    │    ├── original_ply_files                   <- .ply point cloud files from ScanNet raw data
│    │    │    │     scene0011_00.ply
│    │    │    │     ...
│    │    │    ├── groundtruth                          <- normalized point cloud, color from PLY + annotation (for 3D backbone)
│    │    │    │     scene0011_00.pth
│    │    │    │     ...
│    │    │    ├── superpoints                          <- superpoints directory
│    │    │    │     scene0011_00.pth
│    │    │    │     ...
│    │    │    ├── isbnet_clsagnostic_scannet200        <- class-agnostic 3D proposals
│    │    │    │     scene0011_00.pth
│    │    │    │     ...
│    │    │    ├── dc_feat_scannet200                   <- 3D deep features from the 3D proposal network
│    │    │    │     scene0011_00.pth
│    │    │    │     ...
│    │    ├── train
│    │    ├── test
####################################################################################
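
To confirm the 3D inputs are in place before running inference, a minimal inspection sketch; the tuple layout of the groundtruth file is an assumption carried over from Open3DIS/ISBNet-style preprocessing, so verify it against your own dump:

import torch

scene = "scene0011_00"
root = "data/Scannet200/Scannet200_3D/val"

gt = torch.load(f"{root}/groundtruth/{scene}.pth")     # assumed (coords, colors, sem labels, inst labels)
spp = torch.load(f"{root}/superpoints/{scene}.pth")    # per-point superpoint ids
props = torch.load(f"{root}/isbnet_clsagnostic_scannet200/{scene}.pth")  # class-agnostic proposals
print(type(gt), type(spp), type(props))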

1) Generating RGB-D images, camera poses, original PLY, superpoints and inst_nostuff files

For ScanNet++, we construct the data directory tree as follows, considering only the validation set:

data
├── Scannetpp
############## 2D root folder with default image sampling factor: 5 ##############
│    ├── Scannetpp_2D_5interval
│    │    ├── val                                            <- validation set
│    │    │    ├── 0d2ee665be
│    │    │    │    ├── color                                <- folder with RGB images
│    │    │    │    │    00000.jpg
│    │    │    │    │    ...
│    │    │    │    ├── depth                                <- folder with depth images
│    │    │    │    │    00000.png
│    │    │    │    │    ...
│    │    │    │    ├── pose                                 <- folder with camera poses
│    │    │    │    │    00000.txt
│    │    │    │    │    ...
│    │    │    │    ├── intrinsic                            <- folder with per-view intrinsics (in ScanNet200 the intrinsics are the same across all views)
│    │    │    │    │    00000.txt
│    │    │    │    │    ...
│    │    │    │    intrinsic.txt (image intrinsics)
│    │    │    ...
│    │    │    intrinsic_depth.txt (depth intrinsics)        <- ScanNet intrinsics match the depth images
│    │    ├── train
│    │    ├── test
############## 3D root folder with point cloud and annotation ##############
│    ├── Scannetpp_3D
│    │    ├── val                                            <- validation set
│    │    │    ├── original_ply_files                        <- .ply point cloud files from ScanNet++ raw data
│    │    │    │     0d2ee665be.ply
│    │    │    │     ...
│    │    │    ├── groundtruth                               <- point cloud, color from PLY + annotation
│    │    │    │     0d2ee665be.pth
│    │    │    │     ...
│    │    │    ├── superpoints                               <- superpoints directory
│    │    │    │     0d2ee665be.pth
│    │    │    │     ...
│    │    │    ├── isbnet_clsagnostic_scannet200             <- class-agnostic 3D proposals
│    │    │    │     0d2ee665be.pth
│    │    │    │     ...
│    │    │    ├── dc_feat_scannetpp                         <- 3D deep features from the 3D proposal network
│    │    │    │     0d2ee665be.pth
│    │    │    │     ...
│    │    ├── train
│    │    ├── test
####################################################################################
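
The pose and intrinsic files are plain-text 4x4 matrices; assuming the ScanNet convention (camera-to-world poses, K in the top-left 3x3 of the intrinsic matrix), a minimal sketch for projecting 3D points into a view:

import numpy as np

scene_dir = "data/Scannetpp/Scannetpp_2D_5interval/val/0d2ee665be"

pose = np.loadtxt(f"{scene_dir}/pose/00000.txt")            # 4x4 camera-to-world (assumed)
K = np.loadtxt(f"{scene_dir}/intrinsic/00000.txt")[:3, :3]  # 3x3 camera intrinsics

def project(points_world):
    """Project Nx3 world-space points into pixel coordinates of this view."""
    world_to_cam = np.linalg.inv(pose)
    pts_h = np.concatenate([points_world, np.ones((len(points_world), 1))], axis=1)
    cam = (world_to_cam @ pts_h.T)[:3]   # camera-space points, shape (3, N)
    uv = K @ cam
    return (uv[:2] / uv[2]).T            # pixel coordinates, shape (N, 2)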

Run the code 🏃

NOTE: the transformers version might affect the final results.

Install the InstructBLIP weights:

pip install --upgrade transformers
python3 -c "from transformers import InstructBlipProcessor, InstructBlipForConditionalGeneration"
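
Importing the classes does not fetch any weights by itself; calling from_pretrained does. A minimal sketch for caching the model locally (the checkpoint id is an assumption; any InstructBLIP checkpoint on the Hugging Face Hub works the same way):

from transformers import InstructBlipProcessor, InstructBlipForConditionalGeneration

# The first call downloads and caches the weights; later runs load from the cache.
processor = InstructBlipProcessor.from_pretrained("Salesforce/instructblip-vicuna-7b")
model = InstructBlipForConditionalGeneration.from_pretrained("Salesforce/instructblip-vicuna-7b")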

1) Top-1 Score Mask Open-Ended 3D Instance Segmentation

python run/freevocab_1n_average.py

2) Maskwise Open-Ended 3D Instance Segmentation

python run/freevocab_1n_average_multiview.py

3) Pointwise Open-Ended 3D Instance Segmentation

python run/freevocab_1n_pcfeature.py

4) Evaluation protocol: we use the Hungarian matching algorithm to match the predicted proposals with their corresponding ground truths via a language model, assessing the performance of OE-3DIS

python evaluation/eval3d.py
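
For intuition, a minimal sketch of the matching step on mask IoU alone; the actual protocol in evaluation/eval3d.py additionally compares predicted and ground-truth class names via a language model, so treat the cost below as a simplification:

import numpy as np
from scipy.optimize import linear_sum_assignment

def match_proposals(pred_masks, gt_masks):
    """Hungarian matching of predicted to ground-truth instance masks.

    pred_masks: (P, N) bool, gt_masks: (G, N) bool over N scene points.
    Returns index pairs (pred_idx, gt_idx) maximizing total mask IoU.
    """
    inter = pred_masks.astype(np.float64) @ gt_masks.T.astype(np.float64)  # (P, G)
    union = pred_masks.sum(1, keepdims=True) + gt_masks.sum(1) - inter
    iou = inter / np.maximum(union, 1e-9)
    return linear_sum_assignment(-iou)  # negate: the solver minimizes cost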

Acknowledgments

This repo is built upon Open3DIS and OSM.

Contacts

If you have any questions or suggestions about this repo, please feel free to contact me (phucnda@gmail.com).
