The original paper is in English. Non-English content on this page has been machine-translated and may contain typographical errors or mistranslations (e.g., some numerals are rendered as "XNUMX").
Despite the widespread use of deep learning for speech emotion recognition, such networks are severely restricted by information loss in the higher layers of deep neural networks, as well as by the degradation problem. To utilize information efficiently and address degradation, an attention-based dense long short-term memory (LSTM) network is proposed for speech emotion recognition. LSTM networks, which can process time series such as speech, are constructed with attention-based dense connections: weight coefficients are added to the skip connections of each layer to distinguish differences in the emotional information between layers and to prevent redundant information from the bottom layers from interfering with the effective information in the top layers. Experiments demonstrate that the proposed method improves recognition performance by 12% and 7% on the eNTERFACE and IEMOCAP corpora, respectively.
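As a rough illustration of the mechanism the abstract describes, below is a minimal PyTorch sketch of a densely connected LSTM stack with learnable weights on its skip connections. The class name, the scalar softmax-normalized weight parameterization, and the layer count are assumptions for illustration only; the abstract does not specify how the paper computes its attention coefficients.

    import torch
    import torch.nn as nn

    class AttentionDenseLSTM(nn.Module):
        """Hypothetical sketch: densely connected LSTM stack whose skip
        connections carry learnable, softmax-normalized scalar weights."""

        def __init__(self, input_size, hidden_size, num_layers=4):
            super().__init__()
            # Layer l consumes the raw features plus the weighted outputs of
            # all l earlier layers, concatenated along the feature axis.
            self.layers = nn.ModuleList(
                nn.LSTM(input_size + l * hidden_size, hidden_size, batch_first=True)
                for l in range(num_layers)
            )
            # One learnable scalar per skip connection entering layer l (l >= 1).
            self.skip_logits = nn.ParameterList(
                nn.Parameter(torch.zeros(l)) for l in range(1, num_layers)
            )

        def forward(self, x):                      # x: (batch, time, input_size)
            outputs = []
            for l, lstm in enumerate(self.layers):
                if l == 0:
                    inp = x
                else:
                    # Attention over the skip connections: down-weight redundant
                    # lower-layer outputs before they reach this layer.
                    w = torch.softmax(self.skip_logits[l - 1], dim=0)
                    inp = torch.cat([x] + [w[i] * o for i, o in enumerate(outputs)],
                                    dim=-1)
                out, _ = lstm(inp)                 # (batch, time, hidden_size)
                outputs.append(out)
            return outputs[-1]                     # pool over time, then classify

An instance such as AttentionDenseLSTM(input_size=40, hidden_size=128) would typically be followed by temporal pooling and a softmax classifier; the feature dimensionality and fusion choices here are illustrative, not the paper's.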
Yue XIE
Southeast University
Ruiyu LIANG
Nanjing Institute of Technology
Zhenlin LIANG
Southeast University
Li ZHAO
Southeast University
The copyright of the original papers published on this site belongs to IEICE. Unauthorized use of the original or translated papers is prohibited. See IEICE Provisions on Copyright for details.
Citation
Yue XIE, Ruiyu LIANG, Zhenlin LIANG, Li ZHAO, "Attention-Based Dense LSTM for Speech Emotion Recognition" in IEICE TRANSACTIONS on Information and Systems,
vol. E102-D, no. 7, pp. 1426-1429, July 2019, doi: 10.1587/transinf.2019EDL8019.
Abstract: Despite the widespread use of deep learning for speech emotion recognition, such networks are severely restricted by information loss in the higher layers of deep neural networks, as well as by the degradation problem. To utilize information efficiently and address degradation, an attention-based dense long short-term memory (LSTM) network is proposed for speech emotion recognition. LSTM networks, which can process time series such as speech, are constructed with attention-based dense connections: weight coefficients are added to the skip connections of each layer to distinguish differences in the emotional information between layers and to prevent redundant information from the bottom layers from interfering with the effective information in the top layers. Experiments demonstrate that the proposed method improves recognition performance by 12% and 7% on the eNTERFACE and IEMOCAP corpora, respectively.
URL: https://global.ieice.org/en_transactions/information/10.1587/transinf.2019EDL8019/_p
BibTeX
@ARTICLE{e102-d_7_1426,
author={Yue XIE and Ruiyu LIANG and Zhenlin LIANG and Li ZHAO},
journal={IEICE TRANSACTIONS on Information and Systems},
title={Attention-Based Dense LSTM for Speech Emotion Recognition},
year={2019},
volume={E102-D},
number={7},
pages={1426-1429},
abstract={Despite the widespread use of deep learning for speech emotion recognition, such networks are severely restricted by information loss in the higher layers of deep neural networks, as well as by the degradation problem. To utilize information efficiently and address degradation, an attention-based dense long short-term memory (LSTM) network is proposed for speech emotion recognition. LSTM networks, which can process time series such as speech, are constructed with attention-based dense connections: weight coefficients are added to the skip connections of each layer to distinguish differences in the emotional information between layers and to prevent redundant information from the bottom layers from interfering with the effective information in the top layers. Experiments demonstrate that the proposed method improves recognition performance by 12% and 7% on the eNTERFACE and IEMOCAP corpora, respectively.},
keywords={},
doi={10.1587/transinf.2019EDL8019},
ISSN={1745-1361},
month={July},
}
RIS
TY - JOUR
TI - Attention-Based Dense LSTM for Speech Emotion Recognition
T2 - IEICE TRANSACTIONS on Information and Systems
SP - 1426
EP - 1429
AU - Yue XIE
AU - Ruiyu LIANG
AU - Zhenlin LIANG
AU - Li ZHAO
PY - 2019
DO - 10.1587/transinf.2019EDL8019
JO - IEICE TRANSACTIONS on Information and Systems
SN - 1745-1361
VL - E102-D
IS - 7
JA - IEICE TRANSACTIONS on Information and Systems
Y1 - 2019/07
AB - Despite the widespread use of deep learning for speech emotion recognition, such networks are severely restricted by information loss in the higher layers of deep neural networks, as well as by the degradation problem. To utilize information efficiently and address degradation, an attention-based dense long short-term memory (LSTM) network is proposed for speech emotion recognition. LSTM networks, which can process time series such as speech, are constructed with attention-based dense connections: weight coefficients are added to the skip connections of each layer to distinguish differences in the emotional information between layers and to prevent redundant information from the bottom layers from interfering with the effective information in the top layers. Experiments demonstrate that the proposed method improves recognition performance by 12% and 7% on the eNTERFACE and IEMOCAP corpora, respectively.
ER -