Peng GAO
Qufu Normal University
Xin-Yue ZHANG
Qufu Normal University
Xiao-Li YANG
Qufu Normal University
Jian-Cheng NI
Qufu Normal University
Fei WANG
Harbin Institute of Technology
The copyright of the original papers published on this site belongs to IEICE. Unauthorized use of the original or translated papers is prohibited. See IEICE Provisions on Copyright for details.
Peng GAO, Xin-Yue ZHANG, Xiao-Li YANG, Jian-Cheng NI, Fei WANG, "Robust Visual Tracking Using Hierarchical Vision Transformer with Shifted Windows Multi-Head Self-Attention," in IEICE TRANSACTIONS on Information and Systems,
vol. E107-D, no. 1, pp. 161-164, January 2024, doi: 10.1587/transinf.2023EDL8053.
Abstract: Although Siamese trackers have attracted much attention in recent years due to their scalability and efficiency, researchers have largely ignored background appearance, which makes such trackers inapplicable to recognizing arbitrary target objects under various appearance variations, especially in complex scenarios with background clutter and distractors. In this paper, we present a simple yet effective Siamese tracker in which shifted-windows multi-head self-attention is employed to learn the characteristics of a specific given target object for visual tracking. To validate the effectiveness of the proposed tracker, we use the Swin Transformer as the backbone network and introduce an auxiliary feature enhancement network. Extensive experimental results on two evaluation datasets demonstrate that the proposed tracker outperforms other baselines.
URL: https://global.ieice.org/en_transactions/information/10.1587/transinf.2023EDL8053/_p
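The abstract's core mechanism, shifted-windows self-attention from the Swin Transformer, can be illustrated with a minimal NumPy sketch. This is an assumption-laden simplification, not the paper's implementation: it uses a single head, omits the learned Q/K/V projections, relative position bias, and the boundary attention mask that the full Swin Transformer applies after the cyclic shift, so it shows only the shift-partition-attend-reverse mechanism.

```python
import numpy as np

def window_partition(x, ws):
    # (H, W, C) feature map -> (num_windows, ws*ws, C) token groups
    H, W, C = x.shape
    x = x.reshape(H // ws, ws, W // ws, ws, C)
    return x.transpose(0, 2, 1, 3, 4).reshape(-1, ws * ws, C)

def window_attention(tokens):
    # Plain single-head scaled dot-product self-attention within each window
    # (no learned projections -- a simplification for illustration).
    d = tokens.shape[-1]
    scores = tokens @ tokens.transpose(0, 2, 1) / np.sqrt(d)
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=-1, keepdims=True)
    return attn @ tokens

def shifted_window_attention(x, ws=2):
    # Cyclically shift the map so tokens near window borders end up in the
    # same window, attend within each window, then undo the shift.
    # NOTE: the real Swin block also masks attention across wrapped borders.
    H, W, C = x.shape
    shifted = np.roll(x, shift=(-ws // 2, -ws // 2), axis=(0, 1))
    out = window_attention(window_partition(shifted, ws))
    out = (out.reshape(H // ws, W // ws, ws, ws, C)
              .transpose(0, 2, 1, 3, 4)
              .reshape(H, W, C))
    return np.roll(out, shift=(ws // 2, ws // 2), axis=(0, 1))

x = np.random.rand(4, 4, 8)   # toy 4x4 feature map with 8 channels
y = shifted_window_attention(x, ws=2)
print(y.shape)  # (4, 4, 8)
```

Each output token is a convex combination of the tokens in its (shifted) window; alternating this block with unshifted window attention is what lets the hierarchical backbone propagate information across the whole feature map while keeping attention cost linear in image size.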
@ARTICLE{e107-d_1_161,
author={Peng GAO and Xin-Yue ZHANG and Xiao-Li YANG and Jian-Cheng NI and Fei WANG},
journal={IEICE TRANSACTIONS on Information and Systems},
title={Robust Visual Tracking Using Hierarchical Vision Transformer with Shifted Windows Multi-Head Self-Attention},
year={2024},
volume={E107-D},
number={1},
pages={161-164},
abstract={Although Siamese trackers have attracted much attention in recent years due to their scalability and efficiency, researchers have largely ignored background appearance, which makes such trackers inapplicable to recognizing arbitrary target objects under various appearance variations, especially in complex scenarios with background clutter and distractors. In this paper, we present a simple yet effective Siamese tracker in which shifted-windows multi-head self-attention is employed to learn the characteristics of a specific given target object for visual tracking. To validate the effectiveness of the proposed tracker, we use the Swin Transformer as the backbone network and introduce an auxiliary feature enhancement network. Extensive experimental results on two evaluation datasets demonstrate that the proposed tracker outperforms other baselines.},
keywords={},
doi={10.1587/transinf.2023EDL8053},
ISSN={1745-1361},
month={January},}
TY - JOUR
TI - Robust Visual Tracking Using Hierarchical Vision Transformer with Shifted Windows Multi-Head Self-Attention
T2 - IEICE TRANSACTIONS on Information and Systems
SP - 161
EP - 164
AU - Peng GAO
AU - Xin-Yue ZHANG
AU - Xiao-Li YANG
AU - Jian-Cheng NI
AU - Fei WANG
PY - 2024
DO - 10.1587/transinf.2023EDL8053
JO - IEICE TRANSACTIONS on Information and Systems
SN - 1745-1361
VL - E107-D
IS - 1
JA - IEICE TRANSACTIONS on Information and Systems
Y1 - January 2024
AB - Although Siamese trackers have attracted much attention in recent years due to their scalability and efficiency, researchers have largely ignored background appearance, which makes such trackers inapplicable to recognizing arbitrary target objects under various appearance variations, especially in complex scenarios with background clutter and distractors. In this paper, we present a simple yet effective Siamese tracker in which shifted-windows multi-head self-attention is employed to learn the characteristics of a specific given target object for visual tracking. To validate the effectiveness of the proposed tracker, we use the Swin Transformer as the backbone network and introduce an auxiliary feature enhancement network. Extensive experimental results on two evaluation datasets demonstrate that the proposed tracker outperforms other baselines.
ER -