The original paper is in English. Non-English content has been machine-translated and may contain typographical errors or mistranslations. ex. Some numerals are expressed as "XNUMX".
Copyrights notice
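Acoustic scene classification (ASC) is a fundamental task in artificial intelligence classification. ASC systems commonly use models based on convolutional neural networks (CNNs) that take log-Mel spectrograms as input to extract acoustic features. In this paper, we design a CNN-based multi-scale pooling (MSP) strategy for ASC. The log-Mel spectrogram is partitioned into four segments along the frequency axis and used as the input to the CNN. We devise four CNN channels, each receiving the input from a distinct frequency range. The high-level features extracted from the outputs of the different frequency bands are integrated through frequency pyramid average pooling layers at multiple levels. A softmax classifier then classifies the scenes. Tests on two acoustic datasets show that the designed model significantly improves performance.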
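The data flow described in the abstract (frequency-axis partitioning, multi-level average pooling over frequency, softmax) can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the function names, the equal-width bands, the pyramid levels `(1, 2, 4)`, and pooling over both bins and frames within each region are assumptions made for illustration. The CNN channels themselves are omitted, so each band is pooled directly as if its branch were an identity mapping.

```python
import math
from statistics import fmean


def split_frequency_bands(spec, n_bands=4):
    """Split a spectrogram into equal-width bands along the frequency axis.

    `spec` is a list of mel-bin rows, each row a list of frame values.
    Equal-width bands are an assumption; the paper may partition differently.
    """
    n_bins = len(spec)
    if n_bins % n_bands:
        raise ValueError("number of mel bins must be divisible by n_bands")
    size = n_bins // n_bands
    return [spec[i * size:(i + 1) * size] for i in range(n_bands)]


def pyramid_average_pool(feature_map, levels=(1, 2, 4)):
    """Average-pool a feature map over frequency at several scales.

    At level k the frequency axis is cut into k equal regions and each
    region is averaged over all of its bins and frames. The per-level
    results are concatenated into one feature vector.
    """
    n_bins = len(feature_map)
    pooled = []
    for k in levels:
        if n_bins % k:
            raise ValueError("number of bins must be divisible by each level")
        size = n_bins // k
        for r in range(k):
            region = feature_map[r * size:(r + 1) * size]
            pooled.append(fmean([v for row in region for v in row]))
    return pooled


def softmax(scores):
    """Numerically stable softmax over a list of class scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    # Toy "spectrogram": 8 mel bins x 3 frames, value equals the bin index.
    spec = [[float(b)] * 3 for b in range(8)]
    for band in split_frequency_bands(spec):
        # Identity stand-in for a CNN channel, then two-level pooling.
        print(pyramid_average_pool(band, levels=(1, 2)))
```

In a real model, the pooled vectors from the four channels would feed a learned classification layer before the softmax. That layer is left out here because the abstract does not specify it.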
Rong HUANG
Nanjing University of Posts and Telecommunications
Yue XIE
Nanjing Institute of Technology
The copyright of the original papers published on this site belongs to IEICE. Unauthorized use of the original or translated papers is prohibited. See IEICE Provisions on Copyright for details.
Rong HUANG, Yue XIE, "A CNN-Based Multi-Scale Pooling Strategy for Acoustic Scene Classification" in IEICE TRANSACTIONS on Information,
vol. E107-D, no. 1, pp. 153-156, January 2024, doi: 10.1587/transinf.2023EDL8048.
Abstract: Acoustic scene classification (ASC) is a fundamental domain within the realm of artificial intelligence classification tasks. ASC-based tasks commonly employ models based on convolutional neural networks (CNNs) that utilize log-Mel spectrograms as input for gathering acoustic features. In this paper, we designed a CNN-based multi-scale pooling (MSP) strategy for ASC. The log-Mel spectrograms are utilized as the input to CNN, which is partitioned into four frequency axis segments. Furthermore, we devised four CNN channels to acquire inputs from distinct frequency ranges. The high-level features extracted from outputs in various frequency bands are integrated through frequency pyramid average pooling layers at multiple levels. Subsequently, a softmax classifier is employed to classify different scenes. Our study demonstrates that the implementation of our designed model leads to a significant enhancement in the model's performance, as evidenced by the testing of two acoustic datasets.
URL: https://global.ieice.org/en_transactions/information/10.1587/transinf.2023EDL8048/_p
@ARTICLE{e107-d_1_153,
author={Rong HUANG and Yue XIE},
journal={IEICE TRANSACTIONS on Information},
title={A CNN-Based Multi-Scale Pooling Strategy for Acoustic Scene Classification},
year={2024},
volume={E107-D},
number={1},
pages={153-156},
abstract={Acoustic scene classification (ASC) is a fundamental domain within the realm of artificial intelligence classification tasks. ASC-based tasks commonly employ models based on convolutional neural networks (CNNs) that utilize log-Mel spectrograms as input for gathering acoustic features. In this paper, we designed a CNN-based multi-scale pooling (MSP) strategy for ASC. The log-Mel spectrograms are utilized as the input to CNN, which is partitioned into four frequency axis segments. Furthermore, we devised four CNN channels to acquire inputs from distinct frequency ranges. The high-level features extracted from outputs in various frequency bands are integrated through frequency pyramid average pooling layers at multiple levels. Subsequently, a softmax classifier is employed to classify different scenes. Our study demonstrates that the implementation of our designed model leads to a significant enhancement in the model's performance, as evidenced by the testing of two acoustic datasets.},
keywords={},
doi={10.1587/transinf.2023EDL8048},
ISSN={1745-1361},
month={January},}
TY  - JOUR
TI  - A CNN-Based Multi-Scale Pooling Strategy for Acoustic Scene Classification
T2  - IEICE TRANSACTIONS on Information
SP  - 153
EP  - 156
AU  - Rong HUANG
AU  - Yue XIE
PY  - 2024
DO  - 10.1587/transinf.2023EDL8048
JO  - IEICE TRANSACTIONS on Information
SN  - 1745-1361
VL  - E107-D
IS  - 1
JA  - IEICE TRANSACTIONS on Information
Y1  - January 2024
AB  - Acoustic scene classification (ASC) is a fundamental domain within the realm of artificial intelligence classification tasks. ASC-based tasks commonly employ models based on convolutional neural networks (CNNs) that utilize log-Mel spectrograms as input for gathering acoustic features. In this paper, we designed a CNN-based multi-scale pooling (MSP) strategy for ASC. The log-Mel spectrograms are utilized as the input to CNN, which is partitioned into four frequency axis segments. Furthermore, we devised four CNN channels to acquire inputs from distinct frequency ranges. The high-level features extracted from outputs in various frequency bands are integrated through frequency pyramid average pooling layers at multiple levels. Subsequently, a softmax classifier is employed to classify different scenes. Our study demonstrates that the implementation of our designed model leads to a significant enhancement in the model's performance, as evidenced by the testing of two acoustic datasets.
ER  - 