6DoF pose estimation from a monocular RGB image is a challenging but fundamental task. Methods based on the unit direction vector-field representation and a Hough voting strategy have achieved state-of-the-art performance. Nevertheless, they apply the smooth l1 loss to learn the two elements of the unit vector separately, so the prior distance between the pixel and the keypoint is not taken into account, even though the positioning error is significantly affected by this prior distance. In this work, we propose a Prior Distance Augmented Loss (PDAL) that exploits the prior distance for a more accurate vector-field representation. Furthermore, we propose a lightweight channel-level attention module for adaptive feature fusion. Embedding this Adaptive Fusion Attention Module (AFAM) into the U-Net, we build an Attention Voting Network to further improve the performance of our method. We conduct extensive experiments on the LINEMOD, OCCLUSION and YCB-Video datasets to demonstrate the effectiveness and performance improvement of our methods. Our experiments show that the proposed methods bring significant performance gains and outperform state-of-the-art RGB-based methods without any post-refinement.
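As a rough illustration of the idea motivating PDAL (the paper's actual formulation is not reproduced on this page), one plausible reading of "exploiting the prior distance" is to weight the per-pixel smooth-l1 term of the unit-direction-field loss by the ground-truth pixel-to-keypoint distance. The PyTorch sketch below is only an assumption-based example; the function and variable names (prior_distance_weighted_loss, pred_dirs, etc.) are hypothetical and not taken from the paper.

```python
# Hypothetical sketch of a prior-distance-weighted vector-field loss (PyTorch).
# Not the paper's PDAL definition; it only illustrates weighting the smooth-l1
# direction loss by the prior pixel-to-keypoint distance.
import torch
import torch.nn.functional as F

def prior_distance_weighted_loss(pred_dirs, gt_keypoints, pixel_coords, mask):
    """
    pred_dirs:    (N, 2) predicted unit direction vectors for object pixels
    gt_keypoints: (N, 2) ground-truth keypoint location per pixel
    pixel_coords: (N, 2) pixel coordinates
    mask:         (N,)   1.0 for object pixels, 0.0 for background
    """
    offsets = gt_keypoints - pixel_coords          # vectors from pixel to keypoint
    dist = offsets.norm(dim=1, keepdim=True)       # prior distance ||k - p||
    gt_dirs = offsets / dist.clamp(min=1e-6)       # ground-truth unit directions

    # Plain smooth-l1 treats the two components and all pixels equally.
    per_pixel = F.smooth_l1_loss(pred_dirs, gt_dirs, reduction='none').sum(dim=1)

    # Assumed augmentation: pixels farther from the keypoint get larger weight,
    # since the same angular error there displaces the voted keypoint more.
    weight = dist.squeeze(1) / dist.max().clamp(min=1e-6)
    loss = (weight * per_pixel * mask).sum() / mask.sum().clamp(min=1)
    return loss
```

The weighting mirrors the observation in the abstract that the positioning error grows with the prior distance for the same directional error; how PDAL actually incorporates this term is described in the full paper.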
Yong HE
Chongqing University
Ji LI
Chongqing University
Xuanhong ZHOU
Chongqing University
Zewei CHEN
Chongqing University
Xin LIU
Chongqing University
The copyright of the original papers published on this site belongs to IEICE. Unauthorized use of the original or translated papers is prohibited. See IEICE Provisions on Copyright for details.
Yong HE, Ji LI, Xuanhong ZHOU, Zewei CHEN, Xin LIU, "Attention Voting Network with Prior Distance Augmented Loss for 6DoF Pose Estimation" in IEICE TRANSACTIONS on Information,
vol. E104-D, no. 7, pp. 1039-1048, July 2021, doi: 10.1587/transinf.2020EDP7235.
Abstract: 6DoF pose estimation from a monocular RGB image is a challenging but fundamental task. The methods based on unit direction vector-field representation and Hough voting strategy achieved state-of-the-art performance. Nevertheless, they apply the smooth l1 loss to learn the two elements of the unit vector separately, so the prior distance between the pixel and the keypoint is not taken into account, even though the positioning error is significantly affected by this prior distance. In this work, we propose a Prior Distance Augmented Loss (PDAL) to exploit the prior distance for more accurate vector-field representation. Furthermore, we propose a lightweight channel-level attention module for adaptive feature fusion. Embedding this Adaptive Fusion Attention Module (AFAM) into the U-Net, we build an Attention Voting Network to further improve the performance of our method. We conduct extensive experiments to demonstrate the effectiveness and performance improvement of our methods on the LINEMOD, OCCLUSION and YCB-Video datasets. Our experiments show that the proposed methods bring significant performance gains and outperform state-of-the-art RGB-based methods without any post-refinement.
URL: https://global.ieice.org/en_transactions/information/10.1587/transinf.2020EDP7235/_p
@ARTICLE{e104-d_7_1039,
author={Yong HE and Ji LI and Xuanhong ZHOU and Zewei CHEN and Xin LIU},
journal={IEICE TRANSACTIONS on Information},
title={Attention Voting Network with Prior Distance Augmented Loss for 6DoF Pose Estimation},
year={2021},
volume={E104-D},
number={7},
pages={1039-1048},
abstract={6DoF pose estimation from a monocular RGB image is a challenging but fundamental task. The methods based on unit direction vector-field representation and Hough voting strategy achieved state-of-the-art performance. Nevertheless, they apply the smooth l1 loss to learn the two elements of the unit vector separately, so the prior distance between the pixel and the keypoint is not taken into account, even though the positioning error is significantly affected by this prior distance. In this work, we propose a Prior Distance Augmented Loss (PDAL) to exploit the prior distance for more accurate vector-field representation. Furthermore, we propose a lightweight channel-level attention module for adaptive feature fusion. Embedding this Adaptive Fusion Attention Module (AFAM) into the U-Net, we build an Attention Voting Network to further improve the performance of our method. We conduct extensive experiments to demonstrate the effectiveness and performance improvement of our methods on the LINEMOD, OCCLUSION and YCB-Video datasets. Our experiments show that the proposed methods bring significant performance gains and outperform state-of-the-art RGB-based methods without any post-refinement.},
keywords={},
doi={10.1587/transinf.2020EDP7235},
ISSN={1745-1361},
month={July},}
TY - JOUR
TI - Attention Voting Network with Prior Distance Augmented Loss for 6DoF Pose Estimation
T2 - IEICE TRANSACTIONS on Information
SP - 1039
EP - 1048
AU - Yong HE
AU - Ji LI
AU - Xuanhong ZHOU
AU - Zewei CHEN
AU - Xin LIU
PY - 2021
DO - 10.1587/transinf.2020EDP7235
JO - IEICE TRANSACTIONS on Information
SN - 1745-1361
VL - E104-D
IS - 7
JA - IEICE TRANSACTIONS on Information
Y1 - July 2021
AB - 6DoF pose estimation from a monocular RGB image is a challenging but fundamental task. The methods based on unit direction vector-field representation and Hough voting strategy achieved state-of-the-art performance. Nevertheless, they apply the smooth l1 loss to learn the two elements of the unit vector separately, so the prior distance between the pixel and the keypoint is not taken into account, even though the positioning error is significantly affected by this prior distance. In this work, we propose a Prior Distance Augmented Loss (PDAL) to exploit the prior distance for more accurate vector-field representation. Furthermore, we propose a lightweight channel-level attention module for adaptive feature fusion. Embedding this Adaptive Fusion Attention Module (AFAM) into the U-Net, we build an Attention Voting Network to further improve the performance of our method. We conduct extensive experiments to demonstrate the effectiveness and performance improvement of our methods on the LINEMOD, OCCLUSION and YCB-Video datasets. Our experiments show that the proposed methods bring significant performance gains and outperform state-of-the-art RGB-based methods without any post-refinement.
ER -