The original paper is in English. Non-English content has been machine-translated and may contain typographical errors or mistranslations; for example, some numerals are rendered as "XNUMX".
Copyright notice
Recently, automated recognition and analysis of human emotion has attracted increasing attention from multidisciplinary communities. However, it is challenging to utilize the emotional information simultaneously from multiple modalities. Previous studies have explored different fusion methods, but they mainly focused on either inter-modality interaction or intra-modality interaction. In this letter, we propose a novel two-stage fusion strategy named modality attention flow (MAF) to model the intra- and inter-modality interactions simultaneously in a unified end-to-end framework. Experimental results show that the proposed approach outperforms the widely used late fusion methods, and achieves even better performance when the number of stacked MAF blocks increases.
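The abstract only outlines the architecture. As a rough intuition aid, below is a minimal, hypothetical PyTorch sketch of a stackable two-stage attention block (stage 1: intra-modality self-attention; stage 2: inter-modality cross-attention). All module names, dimensions, residual wiring, and the final pooling are assumptions made for illustration, not the authors' implementation.

```python
# Hypothetical sketch of a two-stage attention fusion block, loosely inspired
# by the abstract's description of "modality attention flow" (MAF).
# NOTE: layer choices, dimensions, and wiring are assumptions based on the
# abstract only; this is not the authors' implementation.
import torch
import torch.nn as nn

class MAFBlock(nn.Module):
    """One stackable fusion block: intra-modality self-attention (stage 1)
    followed by inter-modality cross-attention (stage 2)."""
    def __init__(self, dim: int = 256, heads: int = 4):
        super().__init__()
        # Stage 1: each modality attends over its own time steps.
        self.self_attn_a = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.self_attn_t = nn.MultiheadAttention(dim, heads, batch_first=True)
        # Stage 2: each modality attends to the other modality's sequence.
        self.cross_attn_a = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.cross_attn_t = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm_a = nn.LayerNorm(dim)
        self.norm_t = nn.LayerNorm(dim)

    def forward(self, audio, text):
        # audio, text: (batch, seq_len, dim) sequences from modality encoders.
        a, _ = self.self_attn_a(audio, audio, audio)  # intra-modality
        t, _ = self.self_attn_t(text, text, text)
        a = self.norm_a(audio + a)                    # residual + norm
        t = self.norm_t(text + t)
        a2, _ = self.cross_attn_a(a, t, t)            # inter-modality
        t2, _ = self.cross_attn_t(t, a, a)
        return a + a2, t + t2                         # stackable outputs

# Blocks can be stacked, matching the abstract's observation that more
# stacked MAF blocks improved performance (a depth of 2 is assumed here).
blocks = nn.ModuleList([MAFBlock() for _ in range(2)])
audio = torch.randn(8, 100, 256)  # dummy acoustic features
text = torch.randn(8, 40, 256)    # dummy lexical features
for blk in blocks:
    audio, text = blk(audio, text)
# Utterance-level fusion by mean pooling each modality, then concatenating.
fused = torch.cat([audio.mean(dim=1), text.mean(dim=1)], dim=-1)
```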
Dongni HU
Chinese Academy of Sciences, University of Chinese Academy of Sciences
Chengxin CHEN
Chinese Academy of Sciences, University of Chinese Academy of Sciences
Pengyuan ZHANG
Chinese Academy of Sciences, University of Chinese Academy of Sciences
Junfeng LI
Chinese Academy of Sciences, University of Chinese Academy of Sciences
Yonghong YAN
Chinese Academy of Sciences, University of Chinese Academy of Sciences
Qingwei ZHAO
Chinese Academy of Sciences
The copyright of the original papers published on this site belongs to IEICE. Unauthorized use of the original or translated papers is prohibited. See IEICE Provisions on Copyright for details.
Citation:
Dongni HU, Chengxin CHEN, Pengyuan ZHANG, Junfeng LI, Yonghong YAN, Qingwei ZHAO, "A Two-Stage Attention Based Modality Fusion Framework for Multi-Modal Speech Emotion Recognition" in IEICE TRANSACTIONS on Information and Systems,
vol. E104-D, no. 8, pp. 1391-1394, August 2021, doi: 10.1587/transinf.2021EDL8002.
Abstract: Recently, automated recognition and analysis of human emotion has attracted increasing attention from multidisciplinary communities. However, it is challenging to utilize the emotional information simultaneously from multiple modalities. Previous studies have explored different fusion methods, but they mainly focused on either inter-modality interaction or intra-modality interaction. In this letter, we propose a novel two-stage fusion strategy named modality attention flow (MAF) to model the intra- and inter-modality interactions simultaneously in a unified end-to-end framework. Experimental results show that the proposed approach outperforms the widely used late fusion methods, and achieves even better performance when the number of stacked MAF blocks increases.
URL: https://global.ieice.org/en_transactions/information/10.1587/transinf.2021EDL8002/_p
BibTeX:
@ARTICLE{e104-d_8_1391,
author={Dongni HU and Chengxin CHEN and Pengyuan ZHANG and Junfeng LI and Yonghong YAN and Qingwei ZHAO},
journal={IEICE TRANSACTIONS on Information and Systems},
title={A Two-Stage Attention Based Modality Fusion Framework for Multi-Modal Speech Emotion Recognition},
year={2021},
volume={E104-D},
number={8},
pages={1391-1394},
abstract={Recently, automated recognition and analysis of human emotion has attracted increasing attention from multidisciplinary communities. However, it is challenging to utilize the emotional information simultaneously from multiple modalities. Previous studies have explored different fusion methods, but they mainly focused on either inter-modality interaction or intra-modality interaction. In this letter, we propose a novel two-stage fusion strategy named modality attention flow (MAF) to model the intra- and inter-modality interactions simultaneously in a unified end-to-end framework. Experimental results show that the proposed approach outperforms the widely used late fusion methods, and achieves even better performance when the number of stacked MAF blocks increases.},
keywords={},
doi={10.1587/transinf.2021EDL8002},
ISSN={1745-1361},
month={August},}
RIS:
TY - JOUR
TI - A Two-Stage Attention Based Modality Fusion Framework for Multi-Modal Speech Emotion Recognition
T2 - IEICE TRANSACTIONS on Information and Systems
SP - 1391
EP - 1394
AU - Dongni HU
AU - Chengxin CHEN
AU - Pengyuan ZHANG
AU - Junfeng LI
AU - Yonghong YAN
AU - Qingwei ZHAO
PY - 2021
DO - 10.1587/transinf.2021EDL8002
JO - IEICE TRANSACTIONS on Information and Systems
SN - 1745-1361
VL - E104-D
IS - 8
JA - IEICE TRANSACTIONS on Information and Systems
Y1 - August 2021
AB - Recently, automated recognition and analysis of human emotion has attracted increasing attention from multidisciplinary communities. However, it is challenging to utilize the emotional information simultaneously from multiple modalities. Previous studies have explored different fusion methods, but they mainly focused on either inter-modality interaction or intra-modality interaction. In this letter, we propose a novel two-stage fusion strategy named modality attention flow (MAF) to model the intra- and inter-modality interactions simultaneously in a unified end-to-end framework. Experimental results show that the proposed approach outperforms the widely used late fusion methods, and achieves even better performance when the number of stacked MAF blocks increases.
ER -