TR-GDRN

Official implementation and reproducibility material for the manuscript:

Robust Monocular 6D Object Pose Estimation with ConvNeXt and Hybrid Attention for Occlusion-Prone Scenes

The manuscript has been submitted to The Visual Computer. This repository accompanies that submission and is intended to help readers reproduce the reported experiments on LINEMOD, LM-O, and YCB-V. If you use this code or the released checkpoints, please cite the manuscript listed in the Citation section.

Public repository: https://github.com/jxs0123/TR-GDRN

Zenodo archive: https://doi.org/10.5281/zenodo.20003076

Overview

TR-GDRN is a transformer-enhanced geometry-guided direct regression network for monocular RGB-based 6D object pose estimation. It follows the GDR-Net pipeline and improves the feature extraction and pose regression stages with:

a ConvNeXt-Tiny backbone for stronger structural representation and stable small-batch training;
a lightweight TransformerBlock2D module for long-range spatial dependency modeling;
a Hybrid Transformer Decoder (HTD) for multi-scale memory fusion before geometry-aware pose regression.
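To illustrate what long-range spatial dependency modeling means here, the NumPy sketch below flattens a (C, H, W) feature map into H*W tokens and applies one pre-norm self-attention layer and one MLP, each with a residual connection, so every spatial location can aggregate features from every other location. It is a hypothetical single-head stand-in with random weights (the class name TransformerBlock2DSketch and its parameters are assumptions). It is not the TransformerBlock2D in core/gdrn_modeling/models/GDRN.py, which is written in PyTorch and may differ in head count, normalization, and positional encoding.

```python
import numpy as np


def layer_norm(x, eps=1e-6):
    # Normalize each token over its channel dimension (no learned affine here).
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


class TransformerBlock2DSketch:
    """Hypothetical single-head, pre-norm transformer block over a (C, H, W) map."""

    def __init__(self, channels, mlp_ratio=2, seed=0):
        rng = np.random.default_rng(seed)
        scale = 1.0 / np.sqrt(channels)
        self.wq = rng.normal(0, scale, (channels, channels))
        self.wk = rng.normal(0, scale, (channels, channels))
        self.wv = rng.normal(0, scale, (channels, channels))
        self.wo = rng.normal(0, scale, (channels, channels))
        hidden = channels * mlp_ratio
        self.w1 = rng.normal(0, scale, (channels, hidden))
        self.w2 = rng.normal(0, 1.0 / np.sqrt(hidden), (hidden, channels))

    def __call__(self, feat):
        c, h, w = feat.shape
        tokens = feat.reshape(c, h * w).T  # (H*W, C): one token per spatial location
        # Self-attention: each location attends to all H*W locations.
        x = layer_norm(tokens)
        q, k, v = x @ self.wq, x @ self.wk, x @ self.wv
        attn = softmax(q @ k.T / np.sqrt(c))  # (H*W, H*W) attention weights
        tokens = tokens + (attn @ v) @ self.wo  # residual connection
        # Position-wise MLP with ReLU, also residual.
        x = layer_norm(tokens)
        tokens = tokens + np.maximum(x @ self.w1, 0) @ self.w2
        return tokens.T.reshape(c, h, w)  # back to (C, H, W)


feat = np.random.default_rng(1).normal(size=(32, 8, 8))
out = TransformerBlock2DSketch(32)(feat)
print(out.shape)  # (32, 8, 8)
```

The attention matrix has (H*W) x (H*W) entries, so the cost grows quadratically with the number of spatial locations; this is one reason such blocks are typically applied to low-resolution backbone features rather than full-resolution maps.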

Requirements

The codebase is intended for Linux-based CUDA environments.

Ubuntu 22.04
CUDA 11.3 or 11.8
Python >= 3.6
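PyTorch >= 1.6 with a matching torchvision (prebuilt CUDA 11.3 wheels require PyTorch 1.10 or later, and CUDA 11.8 wheels require PyTorch 2.0 or later)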
detectron2 installed from source
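Before installing the remaining dependencies, it can help to confirm what the current interpreter actually sees. The script below is a small, hypothetical helper (not part of the repository) that reports the Python version, whether torch, torchvision, and detectron2 are importable, and, if PyTorch is present, the CUDA version it was built with and whether a GPU is visible. It relies only on importlib and the standard PyTorch attributes torch.__version__, torch.version.cuda, and torch.cuda.is_available().

```python
import importlib
import importlib.util
import sys


def module_version(name):
    """Return the module's __version__, or None if it is not importable."""
    if importlib.util.find_spec(name) is None:
        return None
    try:
        module = importlib.import_module(name)
    except Exception:  # installed but broken, e.g. a CUDA/ABI mismatch
        return None
    return getattr(module, "__version__", "unknown")


def check_environment():
    """Collect a small report about the interpreter and key dependencies."""
    report = {
        "python": ".".join(str(v) for v in sys.version_info[:3]),
        "python_ok": sys.version_info >= (3, 6),
        "torch": module_version("torch"),
        "torchvision": module_version("torchvision"),
        "detectron2": module_version("detectron2"),
        "torch_cuda": None,
        "cuda_available": False,
    }
    if report["torch"] is not None:
        import torch  # safe: module_version already imported it successfully

        report["torch_cuda"] = torch.version.cuda  # None for CPU-only builds
        report["cuda_available"] = torch.cuda.is_available()
    return report


if __name__ == "__main__":
    for key, value in check_environment().items():
        print(f"{key:>14}: {value}")
```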

Install Python dependencies:

pip install -r requirements.txt

Install detectron2 from source following the official instructions:

git clone https://github.com/facebookresearch/detectron2.git
python -m pip install -e detectron2

Install additional project dependencies and compile the C++ extension for farthest point sampling:

sh scripts/install_deps.sh
sh core/csrc/compile.sh
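The compiled extension provides farthest point sampling (FPS). For reference, the greedy algorithm itself fits in a few lines of NumPy: start from one point, then repeatedly add the point whose distance to the already selected set is largest, which spreads the samples evenly over a shape such as an object model. This is an illustrative sketch only; the function name and arguments are hypothetical and do not mirror the interface of the C++ extension built by core/csrc/compile.sh.

```python
import numpy as np


def farthest_point_sampling(points, num_samples, start_index=0):
    """Greedy FPS: repeatedly pick the point farthest from the chosen set.

    points: (N, 3) array, e.g. vertices of an object model.
    Returns `num_samples` indices into `points` (distinct if the points are distinct).
    """
    points = np.asarray(points, dtype=np.float64)
    n = points.shape[0]
    if not 0 < num_samples <= n:
        raise ValueError("num_samples must be in [1, N]")
    selected = np.empty(num_samples, dtype=np.int64)
    selected[0] = start_index
    # Squared distance from every point to its nearest selected point so far.
    min_dist = np.sum((points - points[start_index]) ** 2, axis=1)
    for i in range(1, num_samples):
        selected[i] = int(np.argmax(min_dist))
        new_dist = np.sum((points - points[selected[i]]) ** 2, axis=1)
        min_dist = np.minimum(min_dist, new_dist)  # already-selected points drop to 0
    return selected
```

Each iteration costs O(N), so sampling K points costs O(N*K); that loop is the part a compiled extension speeds up on large models.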

Dataset Preparation

Download the 6D pose datasets from the BOP website:

LINEMOD (LM)
Occlusion LINEMOD (LM-O)
YCB-Video (YCB-V)

Download PASCAL VOC 2012; its images are used as random backgrounds during data augmentation.

The repository also expects the image_sets and test_bboxes files used by the configs. These files can be downloaded from:

BaiduNetDisk, password: qjfk
Cloud.THU, password: fMNOASFHW0E8R72357T6mn9

The dataset directory should follow this layout:

datasets/
|-- BOP_DATASETS/
|   |-- lm/
|   |-- lmo/
|   `-- ycbv/
|-- lm_imgn/              # OpenGL rendered LM images, 1k images per object
|-- lm_renders_blender/   # Blender rendered LM images, 10k images per object
`-- VOCdevkit/
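A quick way to catch path mistakes before launching a long run is to check the tree above programmatically. This hypothetical script only tests that the listed directories exist relative to the repository root. It does not validate their contents, and it does not check image_sets or test_bboxes because their target location is not shown in the layout. Not every experiment needs every directory (for example, the YCB-V configs presumably do not read lm_imgn), so treat a missing entry as a warning rather than an error.

```python
from pathlib import Path

# Directories from the layout above, relative to the repository root.
EXPECTED_DIRS = [
    "datasets/BOP_DATASETS/lm",
    "datasets/BOP_DATASETS/lmo",
    "datasets/BOP_DATASETS/ycbv",
    "datasets/lm_imgn",
    "datasets/lm_renders_blender",
    "datasets/VOCdevkit",
]


def missing_dataset_dirs(repo_root="."):
    """Return the expected dataset directories that do not exist under repo_root."""
    root = Path(repo_root)
    return [d for d in EXPECTED_DIRS if not (root / d).is_dir()]


if __name__ == "__main__":
    missing = missing_dataset_dirs()
    if missing:
        print("Missing dataset directories:")
        for d in missing:
            print("  " + d)
    else:
        print("Dataset layout looks complete.")
```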

Training

General command:

./core/gdrn_modeling/train_gdrn.sh <config_path> <gpu_ids> [other args]

Train on LINEMOD:

./core/gdrn_modeling/train_gdrn.sh configs/gdrn/lm/a6_cPnP_lm13.py 0

Train on LM-O:

./core/gdrn_modeling/train_gdrn.sh configs/gdrn/lmo/a6_cPnP_AugAAETrunc_BG0.5_lmo_real_pbr0.1_40e.py 0

Train on YCB-V:

./core/gdrn_modeling/train_gdrn.sh configs/gdrn/ycbv/a6_cPnP_AugAAETrunc_BG0.5_Rsym_ycbv_real_pbr_visib20_10e.py 0

For multi-GPU training, pass comma-separated GPU IDs (for example, 0,1,2,3). To continue an interrupted experiment, append --resume.
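When launching many runs from a script, it is convenient to assemble the command as an argument list. The helper below is a hypothetical sketch (not shipped with the repository) that follows the documented argument order of train_gdrn.sh: config path, then GPU IDs, then any other arguments. The only extra flag it knows about is --resume, as described above; everything else is passed through unchanged.

```python
TRAIN_SCRIPT = "./core/gdrn_modeling/train_gdrn.sh"


def build_train_cmd(config_path, gpu_ids, resume=False, extra_args=()):
    """Assemble the argument list for train_gdrn.sh.

    gpu_ids may be an int (0), a comma-separated string ("0,1"), or a list ([0, 1]).
    """
    if isinstance(gpu_ids, int):
        gpu_ids = [gpu_ids]
    if not isinstance(gpu_ids, str):
        gpu_ids = ",".join(str(g) for g in gpu_ids)
    cmd = [TRAIN_SCRIPT, config_path, gpu_ids]
    if resume:
        cmd.append("--resume")  # continue an interrupted experiment
    cmd.extend(extra_args)  # other args are passed through unchanged
    return cmd


if __name__ == "__main__":
    cmd = build_train_cmd(
        "configs/gdrn/lmo/a6_cPnP_AugAAETrunc_BG0.5_lmo_real_pbr0.1_40e.py",
        [0, 1, 2, 3],
        resume=True,
    )
    print(" ".join(cmd))
    # To launch: import subprocess; subprocess.run(cmd, check=True)
```

Running it prints the same LM-O training command shown earlier, with 0,1,2,3 as the GPU list and --resume appended.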

Evaluation

General command:

./core/gdrn_modeling/test_gdrn.sh <config_path> <gpu_ids> <ckpt_path> [other args]

The manuscript reports ADD(-S) for LINEMOD and LM-O, and AUC of ADD-S / ADD(-S) for YCB-V.
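For readers unfamiliar with these metrics, the NumPy sketch below implements their standard definitions. ADD averages the distance between corresponding model points under the ground-truth and estimated poses; ADD-S instead measures the distance to the closest estimated point, which makes it invariant to object symmetries; ADD(-S) uses ADD-S for symmetric objects and ADD for the others. On LINEMOD and LM-O a pose is conventionally counted as correct when the error is below 10% of the object diameter, and the YCB-V AUC integrates accuracy over thresholds up to 10 cm. The function names are hypothetical, this is not the evaluation code that test_gdrn.sh runs, and official implementations may discretize the AUC curve differently.

```python
import numpy as np


def transform(points, R, t):
    """Apply a rigid transform (R: 3x3, t: 3) to (N, 3) model points."""
    return points @ R.T + t


def add_error(points, R_gt, t_gt, R_est, t_est):
    """ADD: mean distance between corresponding transformed model points."""
    diff = transform(points, R_gt, t_gt) - transform(points, R_est, t_est)
    return float(np.linalg.norm(diff, axis=1).mean())


def adds_error(points, R_gt, t_gt, R_est, t_est):
    """ADD-S: mean distance from each GT point to its closest estimated point."""
    gt = transform(points, R_gt, t_gt)
    est = transform(points, R_est, t_est)
    # (N, N) pairwise distances; fine for a few thousand points.
    dists = np.linalg.norm(gt[:, None, :] - est[None, :, :], axis=2)
    return float(dists.min(axis=1).mean())


def is_correct(error, diameter, ratio=0.1):
    """LINEMOD-style criterion: error below 10% of the object diameter."""
    return error < ratio * diameter


def auc(errors, max_threshold=0.1, num_steps=1000):
    """Area under the accuracy-vs-threshold curve on [0, max_threshold], in [0, 1].

    max_threshold=0.1 assumes errors in metres (10 cm); use 100 for millimetres.
    """
    errors = np.asarray(errors, dtype=np.float64)
    thresholds = np.linspace(0.0, max_threshold, num_steps)
    accuracy = [(errors < th).mean() for th in thresholds]
    return float(np.mean(accuracy))
```

Because the closest estimated point is never farther than the corresponding one, ADD-S is always less than or equal to ADD for the same pose pair.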

Evaluate LINEMOD:

./core/gdrn_modeling/test_gdrn.sh configs/gdrn/lm/a6_cPnP_lm13.py 0 checkpoints/LM_model_final.pth

Evaluate LM-O:

./core/gdrn_modeling/test_gdrn.sh configs/gdrn/lmo/a6_cPnP_AugAAETrunc_BG0.5_lmo_real_pbr0.1_40e.py 0 checkpoints/LM-O_model_final.pth

Evaluate YCB-V:

./core/gdrn_modeling/test_gdrn.sh configs/gdrn/ycbv/a6_cPnP_AugAAETrunc_BG0.5_Rsym_ycbv_real_pbr_visib20_10e.py 0 checkpoints/YCB-V_model_final.pth

Outputs are written under the output/gdrn/ directory specified by each config.

Checkpoints

The reproducibility release is archived on Zenodo:

https://doi.org/10.5281/zenodo.20003076

The archive contains the source code and the following trained checkpoint files:

checkpoints/
|-- LM_model_final.pth
|-- LM-O_model_final.pth
`-- YCB-V_model_final.pth

Place the downloaded checkpoint files in a checkpoints/ directory at the repository root before running the evaluation commands above. If the release artifacts are unavailable, comparable checkpoints can be retrained with the training commands in this README, although the resulting numbers may differ slightly from those of the released weights.
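Large checkpoint downloads are occasionally truncated. Zenodo lists an MD5 checksum for each file in a record, so a hypothetical helper like the one below can compare local files against those values before evaluation. The expected hashes are left as None placeholders because they are not reproduced in this README; in that case the script simply prints the computed hash for manual comparison. If the checkpoints come bundled inside a larger archive on Zenodo, hash the downloaded archive instead, since the listed checksums apply to the files as uploaded.

```python
import hashlib
from pathlib import Path


def md5sum(path, chunk_size=1 << 20):
    """Stream a (possibly multi-GB) file through MD5 without loading it fully."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Placeholders: copy the MD5 values from the Zenodo record page.
EXPECTED_MD5 = {
    "checkpoints/LM_model_final.pth": None,
    "checkpoints/LM-O_model_final.pth": None,
    "checkpoints/YCB-V_model_final.pth": None,
}


def verify_checkpoints(expected=EXPECTED_MD5, root="."):
    """Map each file to 'missing', 'ok', 'mismatch', or its hash if none is expected."""
    results = {}
    for rel, want in expected.items():
        path = Path(root) / rel
        if not path.is_file():
            results[rel] = "missing"
            continue
        got = md5sum(path)
        if want is None:
            results[rel] = got  # no reference hash: report it for manual comparison
        else:
            results[rel] = "ok" if got == want.lower() else "mismatch"
    return results


if __name__ == "__main__":
    for name, status in verify_checkpoints().items():
        print(f"{name}: {status}")
```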

Key Implementation Files

core/gdrn_modeling/models/convnext_backbone.py: ConvNeXt-Tiny backbone construction and feature extraction.
core/gdrn_modeling/models/GDRN.py: integration of the TR-GDRN model components and the TransformerBlock2D feature enhancement.
core/gdrn_modeling/models/cdpn_rot_head_region.py: geometry-aware rotation head and Hybrid Transformer Decoder for multi-scale memory fusion.
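GDR-Net regresses rotation with an allocentric 6D representation (two 3D vectors orthonormalized by Gram-Schmidt, following Zhou et al.), and the a6 prefix in the config names suggests TR-GDRN keeps that choice; this is an assumption, not something stated above. The NumPy sketch below shows only the 6D-to-matrix step, using the column convention; it is not the function used in cdpn_rot_head_region.py, and it omits the allocentric-to-egocentric correction, which additionally rotates the result according to the viewing ray through the predicted object center.

```python
import numpy as np


def rot6d_to_matrix(r6d):
    """Map a 6D rotation representation to a 3x3 rotation matrix via Gram-Schmidt.

    r6d holds two (not necessarily orthonormal) vectors a1 = r6d[:3], a2 = r6d[3:].
    They become the first two columns b1, b2 of R; the third column is b1 x b2.
    """
    r6d = np.asarray(r6d, dtype=np.float64)
    a1, a2 = r6d[:3], r6d[3:]
    b1 = a1 / np.linalg.norm(a1)
    # Remove the b1 component from a2, then normalize.
    b2 = a2 - np.dot(b1, a2) * b1
    b2 = b2 / np.linalg.norm(b2)
    b3 = np.cross(b1, b2)  # right-handed: guarantees det(R) = +1
    return np.stack([b1, b2, b3], axis=1)  # vectors as columns
```

Because the third column is built with a cross product, the output is always a proper rotation, even when the network emits arbitrary, unnormalized vectors; implementations differ on whether the two vectors are treated as rows or columns.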

The main experiment configs are:

configs/gdrn/lm/a6_cPnP_lm13.py
configs/gdrn/lmo/a6_cPnP_AugAAETrunc_BG0.5_lmo_real_pbr0.1_40e.py
configs/gdrn/ycbv/a6_cPnP_AugAAETrunc_BG0.5_Rsym_ycbv_real_pbr_visib20_10e.py

DOI and Archival

The public GitHub repository is the primary location of the code for the resubmission:

https://github.com/jxs0123/TR-GDRN

The archived Zenodo record is available at:

https://doi.org/10.5281/zenodo.20003076

Citation

If this repository is useful for your research, please cite the related manuscript:

@article{ji2026trgdrn,
  title   = {Robust Monocular 6D Object Pose Estimation with ConvNeXt and Hybrid Attention for Occlusion-Prone Scenes},
  author  = {Ji, Xiaosheng and Xu, Zhen and Zhang, Chunyan and Chen, Yibo},
  journal = {The Visual Computer},
  year    = {2026},
  note    = {Manuscript submitted}
}

License

This project is released under the Apache License 2.0. See LICENSE for details.

Acknowledgements

This codebase builds on the GDR-Net-style 6D pose estimation pipeline and uses public benchmark datasets from the BOP ecosystem. We thank the authors of the related open-source projects and datasets.
