TR-GDRN

Official implementation and reproducibility material for the manuscript:

Robust Monocular 6D Object Pose Estimation with ConvNeXt and Hybrid Attention for Occlusion-Prone Scenes

The manuscript has been submitted to The Visual Computer. This repository accompanies that submission and is intended to help readers reproduce the reported experiments on LINEMOD, LM-O, and YCB-V. If you use this code or the released checkpoints, please cite the manuscript listed in the Citation section.

Public repository: https://github.com/jxs0123/TR-GDRN

Zenodo archive: https://doi.org/10.5281/zenodo.20003076

Overview

TR-GDRN is a transformer-enhanced geometry-guided direct regression network for monocular RGB-based 6D object pose estimation. It follows the GDR-Net pipeline and improves the feature extraction and pose regression stages with:

  • a ConvNeXt-Tiny backbone for stronger structural representation and stable small-batch training;
  • a lightweight TransformerBlock2D module for long-range spatial dependency modeling;
  • a Hybrid Transformer Decoder (HTD) for multi-scale memory fusion before geometry-aware pose regression.

Requirements

The codebase is intended for Linux-based CUDA environments.

  • Ubuntu 22.04
  • CUDA 11.3 or 11.8
  • Python >= 3.6
  • PyTorch >= 1.6 and torchvision
  • detectron2 installed from source

Install Python dependencies:

pip install -r requirements.txt

Install detectron2 from source following the official instructions:

git clone https://github.com/facebookresearch/detectron2.git
python -m pip install -e detectron2

Install additional project dependencies and compile the C++ extension for farthest point sampling:

sh scripts/install_deps.sh
sh core/csrc/compile.sh

Dataset Preparation

Download the 6D pose datasets from the BOP website:

  • LINEMOD (LM)
  • Occlusion LINEMOD (LM-O)
  • YCB-Video (YCB-V)

Download VOC 2012 for background images used in augmentation.

The repository also expects the image_sets and test_bboxes files used by the configs. These files can be downloaded from:

The dataset directory should follow this layout:

datasets/
|-- BOP_DATASETS/
|   |-- lm/
|   |-- lmo/
|   `-- ycbv/
|-- lm_imgn/              # OpenGL rendered LM images, 1k images per object
|-- lm_renders_blender/   # Blender rendered LM images, 10k images per object
`-- VOCdevkit/

The recommended setup uses symbolic links:

ln -sf /path/to/BOP_DATASETS datasets/BOP_DATASETS
ln -sf /path/to/VOCdevkit datasets/VOCdevkit
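
Before training, it can help to confirm that the layout above is in place. The following is a minimal sketch, not part of the repository: the helper name missing_dataset_dirs and the EXPECTED_DIRS list are hypothetical, and the list only mirrors the directory tree shown in this section.

```python
from pathlib import Path

# Directories named in the layout above, relative to the repository root.
# This list is an assumption derived from the README tree, not from the configs.
EXPECTED_DIRS = [
    "datasets/BOP_DATASETS/lm",
    "datasets/BOP_DATASETS/lmo",
    "datasets/BOP_DATASETS/ycbv",
    "datasets/lm_imgn",
    "datasets/lm_renders_blender",
    "datasets/VOCdevkit",
]


def missing_dataset_dirs(repo_root="."):
    """Return the expected dataset directories that do not exist.

    Symbolic links are followed, so a link to a missing target counts as missing.
    """
    root = Path(repo_root)
    return [d for d in EXPECTED_DIRS if not (root / d).is_dir()]


if __name__ == "__main__":
    missing = missing_dataset_dirs()
    if missing:
        print("Missing dataset directories:")
        for d in missing:
            print("  " + d)
    else:
        print("Dataset layout looks complete.")
```

Run the script from the repository root. It only checks that the directories exist; it does not validate their contents.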

Additional LM rendering resources:

Training

General command:

./core/gdrn_modeling/train_gdrn.sh <config_path> <gpu_ids> [other args]

Train on LINEMOD:

./core/gdrn_modeling/train_gdrn.sh configs/gdrn/lm/a6_cPnP_lm13.py 0

Train on LM-O:

./core/gdrn_modeling/train_gdrn.sh configs/gdrn/lmo/a6_cPnP_AugAAETrunc_BG0.5_lmo_real_pbr0.1_40e.py 0

Train on YCB-V:

./core/gdrn_modeling/train_gdrn.sh configs/gdrn/ycbv/a6_cPnP_AugAAETrunc_BG0.5_Rsym_ycbv_real_pbr_visib20_10e.py 0

For multi-GPU training, pass comma-separated GPU ids, for example 0,1,2,3. Add --resume to continue an interrupted experiment.
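
When launching several experiments from a script, the command line described above can be assembled programmatically. The sketch below is illustrative only: build_train_command is a hypothetical helper, and it assumes that --resume is passed in the [other args] position after the GPU ids, as the general command suggests.

```python
import shlex

TRAIN_SCRIPT = "./core/gdrn_modeling/train_gdrn.sh"


def build_train_command(config_path, gpu_ids, resume=False, extra_args=()):
    """Return the argument list for one training run.

    gpu_ids is an iterable of integer GPU indices; they are joined with
    commas, matching the "0,1,2,3" form described above.
    """
    gpu_ids = list(gpu_ids)
    if not gpu_ids:
        raise ValueError("at least one GPU id is required")
    cmd = [TRAIN_SCRIPT, config_path, ",".join(str(g) for g in gpu_ids)]
    if resume:
        # Assumption: --resume belongs with the trailing "other args".
        cmd.append("--resume")
    cmd.extend(extra_args)
    return cmd


if __name__ == "__main__":
    cmd = build_train_command(
        "configs/gdrn/lm/a6_cPnP_lm13.py", [0, 1, 2, 3], resume=True
    )
    # Print the command for inspection. To launch it, pass `cmd` to
    # subprocess.run(cmd, check=True) from the repository root.
    print(shlex.join(cmd))
```

Returning a list rather than a single string avoids shell quoting problems if a config path ever contains spaces.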

Evaluation

General command:

./core/gdrn_modeling/test_gdrn.sh <config_path> <gpu_ids> <ckpt_path> [other args]

The manuscript reports ADD(-S) for LINEMOD and LM-O, and AUC of ADD-S / ADD(-S) for YCB-V.
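
For readers unfamiliar with these metrics, the sketch below implements the standard definitions of ADD and ADD-S in pure Python. It is not the evaluation code used by this repository. ADD averages the distance between corresponding model points under the ground-truth and predicted poses; ADD-S, used for symmetric objects, averages the distance from each ground-truth point to its nearest predicted point. The 10% diameter threshold in is_correct is the common LINEMOD convention and is an assumption here, since this README does not state the threshold. The AUC computation reported for YCB-V is not covered.

```python
import math


def transform(points, R, t):
    """Apply x -> R x + t to each 3D point (R is 3x3 nested lists, t has length 3)."""
    return [
        tuple(sum(R[i][k] * p[k] for k in range(3)) + t[i] for i in range(3))
        for p in points
    ]


def add_error(points, R_gt, t_gt, R_pred, t_pred):
    """ADD: mean distance between corresponding transformed model points."""
    gt = transform(points, R_gt, t_gt)
    pred = transform(points, R_pred, t_pred)
    return sum(math.dist(a, b) for a, b in zip(gt, pred)) / len(points)


def adds_error(points, R_gt, t_gt, R_pred, t_pred):
    """ADD-S: mean distance from each ground-truth point to its nearest predicted point."""
    gt = transform(points, R_gt, t_gt)
    pred = transform(points, R_pred, t_pred)
    return sum(min(math.dist(a, b) for b in pred) for a in gt) / len(points)


def is_correct(error, diameter, ratio=0.1):
    """Assumed convention: a pose counts as correct if error < ratio * diameter."""
    return error < ratio * diameter


if __name__ == "__main__":
    # Four points with 90-degree rotational symmetry about the z axis.
    square = [(1, 0, 0), (0, 1, 0), (-1, 0, 0), (0, -1, 0)]
    identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
    rot_z_90 = [[0, -1, 0], [1, 0, 0], [0, 0, 1]]
    zero = [0, 0, 0]
    print("ADD:  ", add_error(square, identity, zero, rot_z_90, zero))
    print("ADD-S:", adds_error(square, identity, zero, rot_z_90, zero))
```

In the example, the predicted pose differs from the ground truth by a rotation that maps the point set onto itself, so ADD penalizes it while ADD-S does not. This is why ADD-S is used for symmetric objects. The nearest-point search is quadratic in the number of points; practical implementations typically use a KD-tree.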

Evaluate LINEMOD:

./core/gdrn_modeling/test_gdrn.sh configs/gdrn/lm/a6_cPnP_lm13.py 0 checkpoints/LM_model_final.pth

Evaluate LM-O:

./core/gdrn_modeling/test_gdrn.sh configs/gdrn/lmo/a6_cPnP_AugAAETrunc_BG0.5_lmo_real_pbr0.1_40e.py 0 checkpoints/LM-O_model_final.pth

Evaluate YCB-V:

./core/gdrn_modeling/test_gdrn.sh configs/gdrn/ycbv/a6_cPnP_AugAAETrunc_BG0.5_Rsym_ycbv_real_pbr_visib20_10e.py 0 checkpoints/YCB-V_model_final.pth

Outputs are written under the output/gdrn/ directory specified by each config.

Checkpoints

The reproducibility release is archived on Zenodo:

https://doi.org/10.5281/zenodo.20003076

The archive contains the source code and the following trained checkpoint files:

checkpoints/
|-- LM_model_final.pth
|-- LM-O_model_final.pth
`-- YCB-V_model_final.pth

Place the downloaded checkpoint files under a checkpoints/ directory at the repository root before running the evaluation commands above. If the release artifacts are unavailable, the same checkpoints can be regenerated with the training commands in this README.
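
As a convenience, the pairing of configs and checkpoints from the evaluation commands can be collected in one place and checked before anything is launched. This is a hedged sketch: build_eval_commands and the dictionary keys "lm", "lmo", and "ycbv" are hypothetical names chosen for illustration; the paths themselves are copied from this README.

```python
from pathlib import Path

TEST_SCRIPT = "./core/gdrn_modeling/test_gdrn.sh"

# Config and checkpoint pairs taken from the evaluation commands in this README.
# The dictionary keys are illustrative labels only.
EVAL_RUNS = {
    "lm": (
        "configs/gdrn/lm/a6_cPnP_lm13.py",
        "checkpoints/LM_model_final.pth",
    ),
    "lmo": (
        "configs/gdrn/lmo/a6_cPnP_AugAAETrunc_BG0.5_lmo_real_pbr0.1_40e.py",
        "checkpoints/LM-O_model_final.pth",
    ),
    "ycbv": (
        "configs/gdrn/ycbv/a6_cPnP_AugAAETrunc_BG0.5_Rsym_ycbv_real_pbr_visib20_10e.py",
        "checkpoints/YCB-V_model_final.pth",
    ),
}


def build_eval_commands(repo_root=".", gpu_ids="0"):
    """Return (commands, missing).

    commands maps each label to an argument list for a run whose checkpoint
    file exists; missing lists the checkpoint paths that were not found.
    """
    root = Path(repo_root)
    commands, missing = {}, []
    for name, (config, ckpt) in EVAL_RUNS.items():
        if (root / ckpt).is_file():
            commands[name] = [TEST_SCRIPT, config, gpu_ids, ckpt]
        else:
            missing.append(ckpt)
    return commands, missing


if __name__ == "__main__":
    commands, missing = build_eval_commands()
    for ckpt in missing:
        print("Checkpoint not found:", ckpt)
    for name, cmd in commands.items():
        print(name + ":", " ".join(cmd))
```

Each returned argument list can be passed to subprocess.run from the repository root.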

Key Implementation Files

  • core/gdrn_modeling/models/convnext_backbone.py: ConvNeXt-Tiny backbone construction and feature extraction.
  • core/gdrn_modeling/models/GDRN.py: integration of the TR-GDRN model components and the TransformerBlock2D feature enhancement.
  • core/gdrn_modeling/models/cdpn_rot_head_region.py: geometry-aware rotation head and Hybrid Transformer Decoder for multi-scale memory fusion.

The main experiment configs are:

  • configs/gdrn/lm/a6_cPnP_lm13.py
  • configs/gdrn/lmo/a6_cPnP_AugAAETrunc_BG0.5_lmo_real_pbr0.1_40e.py
  • configs/gdrn/ycbv/a6_cPnP_AugAAETrunc_BG0.5_Rsym_ycbv_real_pbr_visib20_10e.py

DOI and Archival

The public GitHub repository is the primary code location for the resubmission:

https://github.com/jxs0123/TR-GDRN

The archived Zenodo record is available at:

https://doi.org/10.5281/zenodo.20003076

Citation

If this repository is useful for your research, please cite the related manuscript:

@article{ji2026trgdrn,
  title   = {Robust Monocular 6D Object Pose Estimation with ConvNeXt and Hybrid Attention for Occlusion-Prone Scenes},
  author  = {Ji, Xiaosheng and Xu, Zhen and Zhang, Chunyan and Chen, Yibo},
  journal = {The Visual Computer},
  year    = {2026},
  note    = {Manuscript submitted}
}

License

This project is released under the Apache License 2.0. See LICENSE for details.

Acknowledgements

This codebase builds on the GDR-Net-style 6D pose estimation pipeline and uses public benchmark datasets from the BOP ecosystem. We thank the authors of the related open-source projects and datasets.
