A simple object retrieval tool designed as a component for InteractVLM and PICO. Given a single image, it looks up and retrieves matching 3D objects from either the large Objaverse database or a local collection of 3D objects.
Key Features:
- Precomputed Objaverse embeddings for fast retrieval from 800K+ objects
- Label-based filtering using semantic similarity to improve accuracy for occluded/complex scenes
- Local object collection support for custom 3D mesh databases
- Automatic object download and conversion from Objaverse
This tool is built using OpenShape (NeurIPS 2023).
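OpenShape aligns 3D shape embeddings with CLIP image and text embeddings in a shared space, so once shape embeddings are precomputed, retrieval reduces to a nearest-neighbour search. The sketch below shows that idea with cosine similarity and top-k selection in NumPy. It is an illustration under assumptions, not this tool's code: the random arrays and the 512-d size are placeholders for real image and shape embeddings.

```python
import numpy as np


def top_k_by_cosine(query: np.ndarray, gallery: np.ndarray, k: int = 5):
    """Return (indices, scores) of the k gallery rows most similar to `query`.

    query:   (D,) embedding of the input image.
    gallery: (N, D) precomputed shape embeddings, one row per 3D object.
    """
    # L2-normalise so that a dot product equals cosine similarity.
    q = query / np.linalg.norm(query)
    g = gallery / np.linalg.norm(gallery, axis=1, keepdims=True)
    scores = g @ q  # (N,) cosine similarities
    k = min(k, len(scores))
    # argpartition selects the k best in O(N); only those k are then sorted.
    idx = np.argpartition(-scores, k - 1)[:k]
    idx = idx[np.argsort(-scores[idx])]
    return idx, scores[idx]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Placeholder "precomputed" shape embeddings (hypothetical 512-d vectors).
    shape_embeddings = rng.normal(size=(1000, 512))
    # A query that is a slightly perturbed copy of object 42.
    image_embedding = shape_embeddings[42] + 0.01 * rng.normal(size=512)
    idx, scores = top_k_by_cosine(image_embedding, shape_embeddings, k=3)
    print(idx[0])  # expected: 42, the near-duplicate ranks first
```

Precomputing and normalising the gallery once is what makes lookups over hundreds of thousands of objects fast: each query costs a single matrix-vector product.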
conda create -n object_lookup python=3.9
conda activate object_lookup
conda install pytorch==1.12.1 torchvision==0.13.1 torchaudio==0.12.1 cudatoolkit=11.3 -c pytorch
pip install huggingface_hub transformers trimesh open3d tqdm einops
If you plan to use ShapeNet-trained models, install MinkowskiEngine:
pip install -U git+https://github.com/NVIDIA/MinkowskiEngine
conda install -c dglteam/label/cu113 dgl
pip install -r requirements.txt
Note: For label-based object filtering, the system automatically downloads the FastText word embeddings (fasttext-wiki-news-subwords-300) through Gensim on first use. This requires an internet connection, and the initial download may take a few minutes.
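To confirm the environment before running retrieval, a small stdlib-only check can report which of the packages installed above are importable. This is a convenience sketch, not part of the tool. It assumes `gensim` is installed via requirements.txt (it provides the embedding download mentioned in the note) and treats MinkowskiEngine and dgl as optional.

```python
import importlib.util

# Import names of the packages installed by the commands above.
REQUIRED = ["torch", "torchvision", "torchaudio", "huggingface_hub", "transformers",
            "trimesh", "open3d", "tqdm", "einops",
            "gensim"]  # assumption: installed via requirements.txt
# Only needed for the ShapeNet-trained models / dgl-based components.
OPTIONAL = ["MinkowskiEngine", "dgl"]


def check_packages(names):
    """Map each top-level import name to True if Python can locate it (without importing it)."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    for group, names in (("required", REQUIRED), ("optional", OPTIONAL)):
        for name, ok in check_packages(names).items():
            print(f"[{group}] {name:<16} {'ok' if ok else 'MISSING'}")
```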
Retrieve objects from the Objaverse database:
# Download the top results from the entire Objaverse database (800K+ objects)
python obj_retrieval.py --use_precomputed \
--img_folder ./assets/imgs \
--download_objects \
--output_dir ./retrieved_objects
# Download the top results from the Objaverse-LVIS subset (55K objects)
# Objaverse-LVIS is a manually annotated subset of Objaverse; however, some samples contain multiple objects or entire scenes.
python obj_retrieval.py --use_precomputed \
--use_label_filtering \
--img_folder ./assets/imgs \
--download_objects \
--output_dir ./retrieved_objects
For improved accuracy, especially with occluded objects or complex scenes, you can encode the object category label in each image filename. When a label is detected in the filename and the --use_label_filtering flag is set, the system restricts the search to semantically similar categories using FastText word embeddings.
Note: When the image contains only the target object with minimal background, running the lookup with the entire Objaverse database typically yields good results without requiring label filtering.
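The label filtering described above can be illustrated as ranking candidate categories by the cosine similarity between their word vectors and the label's vector, then keeping the closest ones. This is a minimal sketch, not the tool's implementation: the 0.5 threshold, the `max_categories` cap, and the toy 3-d vectors are hypothetical. With Gensim, `gensim.downloader.load("fasttext-wiki-news-subwords-300")` returns a `KeyedVectors` object that supports `word in kv` and `kv[word]`, so it can be passed directly as `vectors`.

```python
import numpy as np


def similar_categories(label, categories, vectors, threshold=0.5, max_categories=10):
    """Rank candidate categories by cosine similarity to `label`.

    vectors: mapping from word to embedding (a dict here; a Gensim KeyedVectors also works).
    Returns up to `max_categories` (category, score) pairs with score >= threshold,
    best first. Categories without a vector are skipped.
    """
    if label not in vectors:
        return []  # unknown label: caller should fall back to unfiltered search
    q = vectors[label] / np.linalg.norm(vectors[label])
    scored = []
    for cat in categories:
        if cat not in vectors:
            continue
        v = vectors[cat] / np.linalg.norm(vectors[cat])
        scored.append((cat, float(q @ v)))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [(c, s) for c, s in scored if s >= threshold][:max_categories]


if __name__ == "__main__":
    # Hypothetical toy vectors standing in for 300-d FastText embeddings.
    toy = {
        "mug": np.array([1.0, 0.1, 0.0]),
        "cup": np.array([0.9, 0.2, 0.0]),
        "teapot": np.array([0.7, 0.3, 0.1]),
        "bicycle": np.array([0.0, 0.1, 1.0]),
    }
    # Keeps "cup" and "teapot", drops "bicycle".
    print(similar_categories("mug", ["cup", "teapot", "bicycle"], toy))
```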
Filename Convention: Use double underscores (__) to separate the object category from the rest of the filename.
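A minimal sketch of reading a label from such a filename, assuming the category is the segment before the first `__` (so the hypothetical `chair__kitchen_01.png` yields `chair`). The tool's actual parsing rule may differ, so check obj_retrieval.py before relying on this.

```python
from pathlib import Path
from typing import Optional


def label_from_filename(path) -> Optional[str]:
    """Return the category label encoded in an image filename, or None.

    ASSUMPTION: the category is the part of the file stem before the first
    double underscore, e.g. "chair__kitchen_01.png" -> "chair".
    """
    stem = Path(path).stem
    if "__" not in stem:
        return None  # no label present: use unfiltered retrieval
    label = stem.split("__", 1)[0].strip().lower()
    return label or None  # an empty prefix (e.g. "__x.png") also means no label
```

Single underscores inside the category (for example a hypothetical `coffee_table__01.png`) are preserved, which is one reason a double underscore makes a convenient separator.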
Image Quality Recommendations:
- Cropped and segmented images of the target object are strongly encouraged for best results
- Full-scene images also work but may give less accurate results, even with label-based filtering, because many Objaverse samples are noisy.
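For scripted or batch runs, the Objaverse commands above can be assembled from Python. This hypothetical helper uses only the flags shown in those commands; it assumes it is run from the directory containing obj_retrieval.py, and `run_retrieval` is an illustrative wrapper, not part of the tool.

```python
import subprocess
import sys


def build_retrieval_cmd(img_folder, output_dir=None, use_precomputed=False,
                        use_label_filtering=False, download_objects=False):
    """Assemble an obj_retrieval.py command line from the flags shown in this README."""
    cmd = [sys.executable, "obj_retrieval.py"]
    if use_precomputed:
        cmd.append("--use_precomputed")
    if use_label_filtering:
        cmd.append("--use_label_filtering")
    cmd += ["--img_folder", str(img_folder)]
    if download_objects:
        cmd.append("--download_objects")
    if output_dir is not None:
        cmd += ["--output_dir", str(output_dir)]
    return cmd


def run_retrieval(**kwargs):
    """Launch the retrieval; raises CalledProcessError if the script exits non-zero."""
    subprocess.run(build_retrieval_cmd(**kwargs), check=True)


if __name__ == "__main__":
    # Mirrors the Objaverse-LVIS command above; prints it instead of running it.
    print(" ".join(build_retrieval_cmd("./assets/imgs", output_dir="./retrieved_objects",
                                       use_precomputed=True, use_label_filtering=True,
                                       download_objects=True)))
```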
Retrieve similar objects from your local mesh collection:
# Basic usage with local objects
python obj_retrieval.py --obj_folder ./assets/objs \
--img_folder ./assets/imgs \
--top_k 5
If you use this utility in your research, please cite the original OpenShape paper and the relevant work(s):
@misc{liu2023openshape,
title={OpenShape: Scaling Up 3D Shape Representation Towards Open-World Understanding},
author={Minghua Liu and Ruoxi Shi and Kaiming Kuang and Yinhao Zhu and Xuanlin Li and Shizhong Han and Hong Cai and Fatih Porikli and Hao Su},
year={2023},
eprint={2305.10764},
archivePrefix={arXiv},
primaryClass={cs.CV}
}
@inproceedings{dwivedi_interactvlm_2025,
title = {{InteractVLM}: {3D} Interaction Reasoning from {2D} Foundational Models},
author = {Dwivedi, Sai Kumar and Antić, Dimitrije and Tripathi, Shashank and Taheri, Omid and Schmid, Cordelia and Black, Michael J. and Tzionas, Dimitrios},
booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2025},
}
@inproceedings{cseke_tripathi_2025_pico,
title = {{PICO}: Reconstructing {3D} People In Contact with Objects},
author = {Cseke, Alp\'{a}r and Tripathi, Shashank and Dwivedi, Sai Kumar and Lakshmipathy, Arjun and Chatterjee, Agniv and Black, Michael J. and Tzionas, Dimitrios},
booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2025},
pages = {1783--1794},
}
This project builds upon OpenShape and is intended for research and educational purposes. Please refer to the original OpenShape repository for licensing details.