zgspose/TwinPose

TwinPose: Person-Specific Subspaces for Multi-View 3D Pose Estimation

News

  • 2026-06-08: 🔗 Code is now available at https://github.com/HYPER-THEORY/TwinPose.

  • 2026-05-06: 🎉 TwinPose has been accepted to SIGGRAPH 2026 Journal Track (ACM Transactions on Graphics)!

  • 2024-11-01: 🚀 TwinPose was developed and integrated into our in-house real-time multi-view motion capture system.

Introduction

Following the success of deep neural networks in 2D pose estimation, reconstruction-based approaches have significantly advanced multi-person 3D pose estimation from sparse multi-view images. These methods typically detect 2D poses independently in each view and then associate them for 3D reconstruction. However, despite strong progress, recent state-of-the-art methods still face critical limitations: 1) They often depend on global optimization over a large and complex set of multi-view 2D joints to jointly infer 3D poses for all individuals, making the process highly complex and prone to suboptimal solutions; 2) Their tight coupling with the bottom-up detector OpenPose hinders the use of more advanced top-down or single-stage 2D pose estimators and restricts the integration of richer instance-level cues learned by these models.

To address these limitations, we propose TwinPose, a novel framework that alleviates the complexity of global pose inference by optimizing within person-specific 3D pose subspaces, while fully supporting diverse 2D pose detectors and effectively leveraging pose-instance cues. The key idea is to introduce a twin pose — a 3D counterpart of each 2D pose — that inherits its instance representation and aggregates geometrically consistent 2D joints from other views. All twin poses are unified in a common 3D space, where those belonging to the same individual naturally share a number of bones. This structural property enables association by counting shared bones, forming person-specific subspaces from which each individual’s 3D pose can be inferred independently in an efficient and robust manner.
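
The association step described above can be illustrated with a small sketch. This is not the authors' implementation: the skeleton in `BONES`, the distance tolerance `tol`, the threshold `min_shared`, and the rule that a bone counts as shared when both of its endpoints nearly coincide are all assumptions made for illustration. Twin poses are represented as hypothetical dictionaries mapping joint names to 3D points, and poses that share enough bones are merged with union-find.

```python
import math
from itertools import combinations

# Hypothetical skeleton: each bone is a pair of joint names (an assumption,
# not the joint layout used in the paper).
BONES = [("neck", "l_shoulder"), ("neck", "r_shoulder"),
         ("l_shoulder", "l_elbow"), ("r_shoulder", "r_elbow")]


def shared_bones(pose_a, pose_b, tol=0.05):
    """Count bones whose two endpoints nearly coincide in both twin poses.

    Each pose maps joint name -> (x, y, z); joints that were not
    reconstructed are simply absent. `tol` is an assumed tolerance
    in the same unit as the coordinates.
    """
    count = 0
    for j1, j2 in BONES:
        if all(j in pose_a and j in pose_b for j in (j1, j2)):
            if (math.dist(pose_a[j1], pose_b[j1]) <= tol
                    and math.dist(pose_a[j2], pose_b[j2]) <= tol):
                count += 1
    return count


def group_twin_poses(poses, min_shared=2, tol=0.05):
    """Group pose indices so that poses sharing >= min_shared bones end up together."""
    parent = list(range(len(poses)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for a, b in combinations(range(len(poses)), 2):
        if shared_bones(poses[a], poses[b], tol) >= min_shared:
            parent[find(a)] = find(b)

    groups = {}
    for i in range(len(poses)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


def shifted(pose, dx):
    """Return a copy of the pose translated by dx along x."""
    return {name: (x + dx, y, z) for name, (x, y, z) in pose.items()}


base = {"neck": (0.0, 0.0, 1.5),
        "l_shoulder": (0.2, 0.0, 1.45), "r_shoulder": (-0.2, 0.0, 1.45),
        "l_elbow": (0.25, 0.0, 1.2), "r_elbow": (-0.25, 0.0, 1.2)}

# Two twin poses of the same person (1 cm apart) and one of a second person (1 m away).
twin_poses = [base, shifted(base, 0.01), shifted(base, 1.0)]
person_groups = group_twin_poses(twin_poses)
print(person_groups)
```

Each resulting group plays the role of a person-specific subspace in this toy setting: a per-person 3D pose would then be inferred from the joints in that group only, rather than from all detections at once.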

Extensive experiments demonstrate that TwinPose achieves state-of-the-art performance in both accuracy and efficiency across multiple public and proprietary datasets. Importantly, it is fully detector-agnostic, allowing seamless integration with current and future advances in 2D pose estimation while remaining highly robust to noisy or imperfect 2D predictions.

Perspective and Broader Impact

TwinPose reflects our observation-first view of multi-view 3D motion capture: the quality of 2D observations from each camera determines the upper bound of 3D pose estimation. The goal of TwinPose is to make this upper bound easier to approach in practice. By building person-specific 3D pose subspaces, TwinPose avoids heavy global optimization, supports arbitrary 2D human pose detectors, and provides a scalable framework for future improvements driven by stronger 2D pose estimation models.

This framework also connects naturally with our broader research on video-based 2D pose estimation, including DSTA, PAVE-Net, and TAR-ViTPose. These works systematically explore how temporal information can be used to improve 2D pose estimation, with the hope of moving beyond the dominant single-frame paradigm toward a more robust video-based paradigm.

Quantitative Performance

TwinPose runs with the fastest per-frame time among the compared methods (e.g., 0.92 ms on Shelf) and works with any 2D pose detector (e.g., HRNet, RTMO, and OpenPose).

Quantitative comparison on the Shelf dataset.

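| Method | A1 | A2 | A3 | Avg | Time (ms) |
|---|---|---|---|---|---|
| Tanke and Gall [2019] | 99.8 | 90.0 | 98.0 | 96.0 | N/A |
| Bridgeman et al. [2019] | 99.3 | 91.6 | 97.6 | 96.2 | 9.1 |
| Dong et al. [2019] | 98.8 | 94.1 | 97.8 | 96.9 | 90 |
| Chen et al. [2020a] | 99.6 | 93.2 | 97.5 | 96.8 | 3.08 |
| Tu et al. [2020] | 99.3 | 94.1 | 97.6 | 97.0 | 333 |
| Huang et al. [2020] | 98.8 | 96.2 | 97.2 | 97.4 | 640 |
| Zhang et al. [2020] | 99.0 | 96.2 | 97.6 | 97.6 | 31.9 |
| Dong et al. [2021] | 99.1 | 93.5 | 98.1 | 96.9 | N/A |
| Wang et al. [2021] | 99.3 | 95.1 | 97.8 | 97.4 | ~170 |
| Wu et al. [2021] | 99.3 | 96.5 | 97.3 | 97.7 | ~48.8 |
| Reddy et al. [2021] | 99.1 | 96.3 | 98.3 | 97.9 | >333 |
| Lin and Lee [2021] | 99.3 | 96.5 | 98.0 | 97.9 | 23.4 |
| Zhang et al. [2021] | 99.5 | 97.0 | 97.8 | 98.1 | >31.9 |
| Zhou et al. [2022] | 99.5 | 96.7 | 98.2 | 98.1 | 2.94 |
| Choudhury et al. [2023] | 99.0 | 96.3 | 98.2 | 97.8 | N/A |
| Liao et al. [2024] | 99.5 | 96.8 | 97.8 | 98.0 | 210 |
| TwinPose (Ours) | 99.8 | 96.2 | 98.5 | 98.2 | 0.92 |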
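
As a quick sanity check on the Shelf numbers, the sketch below recomputes the Avg column and converts per-frame time into throughput. It assumes that Avg is the unweighted mean of A1, A2, and A3 rounded to one decimal (which matches the reported values) and that the time column is milliseconds per frame; the helper names are illustrative, and what the timing includes (for example, 2D detection) is not stated here.

```python
# Rows copied from the Shelf table: (A1, A2, A3, reported Avg, time in ms).
rows = {
    "Zhou et al. [2022]": (99.5, 96.7, 98.2, 98.1, 2.94),
    "TwinPose (Ours)": (99.8, 96.2, 98.5, 98.2, 0.92),
}


def mean_score(a1, a2, a3):
    """Unweighted mean of the three per-actor scores, rounded to one decimal."""
    return round((a1 + a2 + a3) / 3, 1)


def frames_per_second(ms_per_frame):
    """Convert a per-frame time in milliseconds into frames per second."""
    return 1000.0 / ms_per_frame


for name, (a1, a2, a3, avg, ms) in rows.items():
    print(f"{name}: recomputed avg {mean_score(a1, a2, a3)} "
          f"(reported {avg}), about {frames_per_second(ms):.0f} frames/s")

# Ratio of the two per-frame times listed above.
speedup = rows["Zhou et al. [2022]"][4] / rows["TwinPose (Ours)"][4]
print(f"per-frame time ratio: {speedup:.2f}x")
```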

Quantitative comparison on the 4DA dataset.

| Method | 2D Detector | Precision (%) | Recall (%) |
|---|---|---|---|
| Methods tightly coupled to the bottom-up detector OpenPose | | | |
| Zhang et al. [2020] | OpenPose | 88.5 | 90.2 |
| Dong et al. [2021] | OpenPose | 90.1 | 89.0 |
| Zhou et al. [2022] | OpenPose | 92.0 | 91.2 |
| Detector-agnostic methods (any 2D pose detector) | | | |
| Dong et al. [2019] | OpenPose | 78.5 | 77.1 |
| Dong et al. [2019] | HRNet | 84.9 | 84.9 |
| Dong et al. [2019] | RTMO | 85.4 | 85.5 |
| TwinPose (Ours) | OpenPose | 91.4 | 90.4 |
| TwinPose (Ours) | HRNet | 94.3 | 93.2 |
| TwinPose (Ours) | RTMO | 94.8 | 95.0 |

Quantitative comparison on the Hi4D dataset.

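| Method | MPJPE ↓ | PCP ↑ | AP₅₀ ↑ | AP₁₀₀ ↑ | Recall ↑ |
|---|---|---|---|---|---|
| Dong et al. [2019] | 53.05 | 87.57 | 67.97 | 80.28 | 93.80 |
| Zhang et al. [2020] | 41.29 | 88.62 | 80.87 | 97.27 | 98.78 |
| Lu et al. [2024a] | 32.10 | 96.90 | 91.48 | 97.33 | 98.78 |
| TwinPose (Ours) | 22.00 | 99.71 | 90.80 | 99.34 | 99.86 |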
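
For readers unfamiliar with the metrics, here is a minimal sketch of MPJPE and PCP as they are commonly defined. MPJPE is the mean Euclidean distance between predicted and ground-truth joints; PCP counts a limb as correct when the average error of its two endpoints is at most half the ground-truth limb length. The exact evaluation protocol behind the tables (joint set, matching of predictions to people, units) is not given here, so the function signatures, the `alpha=0.5` threshold, and the toy data are assumptions.

```python
import math


def mpjpe(pred, gt):
    """Mean per-joint position error: average Euclidean distance over joints.

    The result is in the same unit as the input coordinates.
    """
    return sum(math.dist(p, g) for p, g in zip(pred, gt)) / len(gt)


def pcp(pred, gt, limbs, alpha=0.5):
    """Percentage of correct parts, in percent.

    A limb (a, b) is correct when the mean endpoint error is at most
    alpha times the ground-truth limb length.
    """
    correct = 0
    for a, b in limbs:
        limb_len = math.dist(gt[a], gt[b])
        err = (math.dist(pred[a], gt[a]) + math.dist(pred[b], gt[b])) / 2
        if err <= alpha * limb_len:
            correct += 1
    return 100.0 * correct / len(limbs)


# Toy three-joint chain; coordinates are made up for illustration.
gt_joints = [(0.0, 0.0, 0.0), (0.0, 0.0, 100.0), (0.0, 0.0, 200.0)]
pred_joints = [(0.0, 0.0, 0.0), (0.0, 30.0, 100.0), (0.0, 0.0, 280.0)]
limbs = [(0, 1), (1, 2)]

error = mpjpe(pred_joints, gt_joints)      # (0 + 30 + 80) / 3
score = pcp(pred_joints, gt_joints, limbs)  # first limb correct, second not
print(f"MPJPE = {error:.2f}, PCP = {score:.1f}%")
```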

Qualitative Results

Comparison with skeleton-level association method [Dong et al. 2019]. Traditional skeleton-level association approaches indiscriminately use all joints and bones, leading to incorrect associations (red boxes). TwinPose preserves only cross-view geometrically consistent joints, substantially improving robustness.

Comparison with the state-of-the-art 4DA method [Zhang et al. 2020]. Global optimization in 4DA causes incorrect cross-person associations (red boxes). TwinPose performs person-specific inference in pose subspaces, enhancing both robustness and efficiency.

Whole-body 3D pose estimation results of our method on the Panoptic dataset. Results from eight camera views demonstrate consistent multi-view reconstruction of body, hand, foot, and facial keypoints.

Video Demo

For a complete video demonstration of our method, please see this YouTube video.

Citations

If you find our paper useful in your research, please consider citing:

@article{yang2026twinpose,
  title         = {TwinPose: Person-Specific Subspaces for Multi-View 3D Pose Estimation},
  author        = {Yang, Wenwu and He, Tianyi and Ding, Jiwei and Wang, Xun and Zhang, Rong and Zhou, Kun},
  journal       = {ACM Transactions on Graphics},
  volume        = {45},
  number        = {4},
  articleno     = {61},
  year          = {2026},
  note          = {SIGGRAPH 2026 Journal Track}
}

@article{yang2023lightweight,
  title         = {Lightweight Multi-Person Motion Capture System in the Wild},
  author        = {Yang, Wenwu and Li, Yue and Xing, Shuai and Cai, Jiahang and Wang, Xun},
  journal       = {SCIENTIA SINICA Informationis},
  volume        = {53},
  number        = {11},
  pages         = {2230--2249},
  year          = {2023},
  note          = {In Chinese}
}

Acknowledgement

We thank Tianyi He for implementing the TwinPose algorithm; Jiwei Ding for his assistance with the quantitative and qualitative experiments; Yihui Sun and Bin Zhou for their assistance with the experiments on whole-body 3D pose estimation and learning-based methods; Siying Chen for video editing and homepage development; Xiongbin Lin for video editing; and all participants who contributed to the motion capture data collection.

About

[SIGGRAPH 2026] This is the official homepage of our paper "TwinPose: Person-Specific Subspaces for Multi-View 3D Pose Estimation".
