The original paper is in English. Non-English content has been machine-translated and may contain typographical errors or mistranslations. For example, some numerals may appear as "XNUMX".
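The goal of Acoustic Scene Classification (ASC) is to simulate human analysis of the surrounding environment and to make accurate decisions promptly. Extracting useful information from audio signals in real-world scenarios is difficult and can lead to suboptimal acoustic scene classification performance, especially in environments with relatively homogeneous backgrounds. To address this problem, we model the real-life sobering-up process of “drunkards” and the guiding behavior of ordinary people, and build a methodology for implementing high-accuracy lightweight models, which we call the “drunkard methodology”. The core idea has three parts. (1) Based on the different information-perception mechanisms of drunkards and ordinary people, we design a special feature transformation module that simulates the process of gradually sobering up and the accompanying change in feature perception ability. (2) We develop a lightweight “drunken” model whose perception processing matches that of the normal model. This model uses a multi-scale class residual block structure and fuses information extracted at different scales to obtain finer feature representations. (3) We introduce a guidance and fusion module from the conventional model into the “drunken” model to accelerate the sobering-up process, achieving iterative optimization and improved accuracy. Evaluation results on the official DCASE2022 Task1 dataset show that our baseline system achieves 40.4% accuracy and a loss of 2.284 with 442.67K parameters and 19.40M MACs (multiply-accumulate operations). After adopting the “drunkard” mechanism, accuracy improves to 45.2% and the loss decreases by 0.634, with 551.89K parameters and 23.6M MACs.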
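The accuracy/complexity trade-off reported above can be made concrete with a little arithmetic. The short Python sketch below uses only the figures stated in the abstract; the dictionary names and helper functions are illustrative, and it assumes the reported 0.634 is an absolute reduction from the baseline loss of 2.284.

```python
# Figures as reported in the abstract (official DCASE2022 Task1 dataset).
# The names below are illustrative, not taken from the paper.
baseline = {"accuracy_pct": 40.4, "loss": 2.284, "params_k": 442.67, "macs_m": 19.40}
drunkard = {"accuracy_pct": 45.2, "params_k": 551.89, "macs_m": 23.6}
loss_reduction = 0.634  # assumption: absolute decrease relative to the baseline loss


def relative_increase(new: float, old: float) -> float:
    """Return the increase from old to new as a percentage of old."""
    return (new - old) / old * 100.0


def summarize() -> dict:
    """Derive the accuracy gain and cost overhead implied by the reported numbers."""
    return {
        # Absolute accuracy gain in percentage points.
        "accuracy_gain_pp": round(drunkard["accuracy_pct"] - baseline["accuracy_pct"], 2),
        # Loss of the drunkard system, inferred as baseline loss minus the reported reduction.
        "drunkard_loss": round(baseline["loss"] - loss_reduction, 3),
        # Extra parameters (thousands) and their relative overhead.
        "extra_params_k": round(drunkard["params_k"] - baseline["params_k"], 2),
        "params_overhead_pct": round(relative_increase(drunkard["params_k"], baseline["params_k"]), 1),
        # Extra multiply-accumulate operations (millions) and their relative overhead.
        "extra_macs_m": round(drunkard["macs_m"] - baseline["macs_m"], 2),
        "macs_overhead_pct": round(relative_increase(drunkard["macs_m"], baseline["macs_m"]), 1),
    }


if __name__ == "__main__":
    for name, value in summarize().items():
        print(f"{name}: {value}")
```

Under that reading, the drunkard mechanism gains about 4.8 percentage points of accuracy (loss of roughly 1.650) for about 109K extra parameters (around 25% more) and 4.2M extra MACs (around 22% more).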
Wenkai LIU
North China University of Technology
Lin ZHANG
North China University of Technology
Menglong WU
North China University of Technology
Xichang CAI
North China University of Technology
Hongxia DONG
North China University of Technology
The copyright of the original papers published on this site belongs to IEICE. Unauthorized use of the original or translated papers is prohibited. See IEICE Provisions on Copyright for details.
Wenkai LIU, Lin ZHANG, Menglong WU, Xichang CAI, Hongxia DONG, "Research on Lightweight Acoustic Scene Perception Method Based on Drunkard Methodology" in IEICE TRANSACTIONS on Information and Systems, vol. E107-D, no. 1, pp. 83-92, January 2024, doi: 10.1587/transinf.2023EDP7107.
Abstract: The goal of Acoustic Scene Classification (ASC) is to simulate human analysis of the surrounding environment and make accurate decisions promptly. Extracting useful information from audio signals in real-world scenarios is challenging and can lead to suboptimal performance in acoustic scene classification, especially in environments with relatively homogeneous backgrounds. To address this problem, we model the sobering-up process of “drunkards” in real-life and the guiding behavior of normal people, and construct a high-precision lightweight model implementation methodology called the “drunkard methodology”. The core idea includes three parts: (1) designing a special feature transformation module based on the different mechanisms of information perception between drunkards and ordinary people, to simulate the process of gradually sobering up and the changes in feature perception ability; (2) studying a lightweight “drunken” model that matches the normal model's perception processing process. The model uses a multi-scale class residual block structure and can obtain finer feature representations by fusing information extracted at different scales; (3) introducing a guiding and fusion module of the conventional model to the “drunken” model to speed up the sobering-up process and achieve iterative optimization and accuracy improvement. Evaluation results on the official dataset of DCASE2022 Task1 demonstrate that our baseline system achieves 40.4% accuracy and 2.284 loss under the condition of 442.67K parameters and 19.40M MAC (multiply-accumulate operations). After adopting the “drunkard” mechanism, the accuracy is improved to 45.2%, and the loss is reduced by 0.634 under the condition of 551.89K parameters and 23.6M MAC.
URL: https://global.ieice.org/en_transactions/information/10.1587/transinf.2023EDP7107/_p
@ARTICLE{e107-d_1_83,
author={Wenkai LIU and Lin ZHANG and Menglong WU and Xichang CAI and Hongxia DONG},
journal={IEICE TRANSACTIONS on Information and Systems},
title={Research on Lightweight Acoustic Scene Perception Method Based on Drunkard Methodology},
year={2024},
volume={E107-D},
number={1},
pages={83-92},
abstract={The goal of Acoustic Scene Classification (ASC) is to simulate human analysis of the surrounding environment and make accurate decisions promptly. Extracting useful information from audio signals in real-world scenarios is challenging and can lead to suboptimal performance in acoustic scene classification, especially in environments with relatively homogeneous backgrounds. To address this problem, we model the sobering-up process of “drunkards” in real-life and the guiding behavior of normal people, and construct a high-precision lightweight model implementation methodology called the “drunkard methodology”. The core idea includes three parts: (1) designing a special feature transformation module based on the different mechanisms of information perception between drunkards and ordinary people, to simulate the process of gradually sobering up and the changes in feature perception ability; (2) studying a lightweight “drunken” model that matches the normal model's perception processing process. The model uses a multi-scale class residual block structure and can obtain finer feature representations by fusing information extracted at different scales; (3) introducing a guiding and fusion module of the conventional model to the “drunken” model to speed up the sobering-up process and achieve iterative optimization and accuracy improvement. Evaluation results on the official dataset of DCASE2022 Task1 demonstrate that our baseline system achieves 40.4% accuracy and 2.284 loss under the condition of 442.67K parameters and 19.40M MAC (multiply-accumulate operations). After adopting the “drunkard” mechanism, the accuracy is improved to 45.2%, and the loss is reduced by 0.634 under the condition of 551.89K parameters and 23.6M MAC.},
keywords={},
doi={10.1587/transinf.2023EDP7107},
ISSN={1745-1361},
month={January},}
TY - JOUR
TI - Research on Lightweight Acoustic Scene Perception Method Based on Drunkard Methodology
T2 - IEICE TRANSACTIONS on Information and Systems
SP - 83
EP - 92
AU - Wenkai LIU
AU - Lin ZHANG
AU - Menglong WU
AU - Xichang CAI
AU - Hongxia DONG
PY - 2024
DO - 10.1587/transinf.2023EDP7107
JO - IEICE TRANSACTIONS on Information and Systems
SN - 1745-1361
VL - E107-D
IS - 1
JA - IEICE TRANSACTIONS on Information and Systems
Y1 - January 2024
AB - The goal of Acoustic Scene Classification (ASC) is to simulate human analysis of the surrounding environment and make accurate decisions promptly. Extracting useful information from audio signals in real-world scenarios is challenging and can lead to suboptimal performance in acoustic scene classification, especially in environments with relatively homogeneous backgrounds. To address this problem, we model the sobering-up process of “drunkards” in real-life and the guiding behavior of normal people, and construct a high-precision lightweight model implementation methodology called the “drunkard methodology”. The core idea includes three parts: (1) designing a special feature transformation module based on the different mechanisms of information perception between drunkards and ordinary people, to simulate the process of gradually sobering up and the changes in feature perception ability; (2) studying a lightweight “drunken” model that matches the normal model's perception processing process. The model uses a multi-scale class residual block structure and can obtain finer feature representations by fusing information extracted at different scales; (3) introducing a guiding and fusion module of the conventional model to the “drunken” model to speed up the sobering-up process and achieve iterative optimization and accuracy improvement. Evaluation results on the official dataset of DCASE2022 Task1 demonstrate that our baseline system achieves 40.4% accuracy and 2.284 loss under the condition of 442.67K parameters and 19.40M MAC (multiply-accumulate operations). After adopting the “drunkard” mechanism, the accuracy is improved to 45.2%, and the loss is reduced by 0.634 under the condition of 551.89K parameters and 23.6M MAC.
ER -