The original paper is in English. Non-English content has been machine-translated and may contain typographical errors or mistranslations. ex. Some numerals are expressed as "XNUMX".
Copyrights notice
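We propose deep neural network (DNN)-based speech enhancement that reduces musical noise and achieves better auditory impressions. Musical noise is an artifact generated by nonlinear signal processing and negatively affects auditory impressions. We aim to develop musical-noise-free speech enhancement methods that suppress the generation of musical noise and produce perceptually comfortable enhanced speech. DNN-based speech enhancement using a soft mask achieves high noise reduction but generates musical noise in non-speech regions. Therefore, we first define kurtosis matching for DNN-based low-musical-noise speech enhancement. Kurtosis is the fourth-order moment and is known to correlate with the amount of musical noise. Kurtosis matching is a penalty term in DNN training and works to reduce the amount of musical noise. We further extend this scheme to standardized-moment matching. The extended scheme uses moments whose orders are higher than that of kurtosis and generalizes the conventional musical-noise-free method based on kurtosis matching. We formulate standardized-moment matching and explore how effectively the higher-order moments reduce the amount of musical noise. Experimental evaluation results 1) demonstrate that kurtosis matching can reduce musical noise without negatively affecting noise suppression and 2) newly reveal that sixth-moment matching achieves low-musical-noise speech enhancement as well as kurtosis matching does.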
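The idea of a moment-matching penalty can be illustrated with a minimal Python sketch. It is not the authors' formulation, which this page does not give. It assumes the penalty is the squared difference between the standardized moments of two sample sets, for example spectral values of an enhanced signal and of a reference. The function names `standardized_moment`, `moment_matching_penalty`, and `total_loss`, the squared-difference form, and the weight `0.1` are all hypothetical.

```python
import random


def standardized_moment(samples, order):
    """Return the order-th standardized moment: E[(x - mean)^order] / std^order."""
    n = len(samples)
    mean = sum(samples) / n
    variance = sum((x - mean) ** 2 for x in samples) / n
    if variance == 0.0:
        raise ValueError("standardized moments are undefined for constant data")
    central = sum((x - mean) ** order for x in samples) / n
    return central / variance ** (order / 2)


def moment_matching_penalty(enhanced, reference, order=4):
    """Squared gap between the standardized moments of two sample sets.

    Assumed form for illustration only; order=4 corresponds to kurtosis.
    """
    gap = standardized_moment(enhanced, order) - standardized_moment(reference, order)
    return gap ** 2


def total_loss(enhancement_loss, enhanced, reference, order=4, weight=0.1):
    """Add the weighted penalty to a base loss; `weight` is an illustrative value."""
    return enhancement_loss + weight * moment_matching_penalty(enhanced, reference, order)


if __name__ == "__main__":
    rng = random.Random(0)
    gaussian = [rng.gauss(0.0, 1.0) for _ in range(20000)]
    # Zeroing most samples leaves isolated peaks, a crude stand-in for musical noise.
    spiky = [x if rng.random() < 0.1 else 0.0 for x in gaussian]
    print(standardized_moment(gaussian, 4))  # close to 3 for Gaussian data
    print(standardized_moment(spiky, 4))     # considerably larger: heavier tails
    print(moment_matching_penalty(spiky, gaussian, order=6))
```

Setting `order=4` gives a kurtosis-style penalty and `order=6` gives the sixth-moment variant mentioned in the abstract. In actual DNN training the moments would have to be computed with differentiable tensor operations, which this plain-Python sketch does not attempt.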
Satoshi MIZOGUCHI
University of Tokyo
Yuki SAITO
University of Tokyo
Shinnosuke TAKAMICHI
University of Tokyo
Hiroshi SARUWATARI
University of Tokyo
The copyright of the original papers published on this site belongs to IEICE. Unauthorized use of the original or translated papers is prohibited. See IEICE Provisions on Copyright for details.
Satoshi MIZOGUCHI, Yuki SAITO, Shinnosuke TAKAMICHI, Hiroshi SARUWATARI, "DNN-Based Low-Musical-Noise Single-Channel Speech Enhancement Based on Higher-Order-Moments Matching" in IEICE TRANSACTIONS on Information,
vol. E104-D, no. 11, pp. 1971-1980, November 2021, doi: 10.1587/transinf.2021EDP7041.
Abstract: We propose deep neural network (DNN)-based speech enhancement that reduces musical noise and achieves better auditory impressions. The musical noise is an artifact generated by nonlinear signal processing and negatively affects the auditory impressions. We aim to develop musical-noise-free speech enhancement methods that suppress the musical noise generation and produce perceptually-comfortable enhanced speech. DNN-based speech enhancement using a soft mask achieves high noise reduction but generates musical noise in non-speech regions. Therefore, first, we define kurtosis matching for DNN-based low-musical-noise speech enhancement. Kurtosis is the fourth-order moment and is known to correlate with the amount of musical noise. The kurtosis matching is a penalty term of the DNN training and works to reduce the amount of musical noise. We further extend this scheme to standardized-moment matching. The extended scheme involves using moments whose orders are higher than kurtosis and generalizes the conventional musical-noise-free method based on kurtosis matching. We formulate standardized-moment matching and explore how effectively the higher-order moments reduce the amount of musical noise. Experimental evaluation results 1) demonstrate that kurtosis matching can reduce musical noise without negatively affecting noise suppression and 2) newly reveal that the sixth-moment matching also achieves low-musical-noise speech enhancement as well as kurtosis matching.
URL: https://global.ieice.org/en_transactions/information/10.1587/transinf.2021EDP7041/_p
@ARTICLE{e104-d_11_1971,
author={Satoshi MIZOGUCHI and Yuki SAITO and Shinnosuke TAKAMICHI and Hiroshi SARUWATARI},
journal={IEICE TRANSACTIONS on Information},
title={DNN-Based Low-Musical-Noise Single-Channel Speech Enhancement Based on Higher-Order-Moments Matching},
year={2021},
volume={E104-D},
number={11},
pages={1971-1980},
abstract={We propose deep neural network (DNN)-based speech enhancement that reduces musical noise and achieves better auditory impressions. The musical noise is an artifact generated by nonlinear signal processing and negatively affects the auditory impressions. We aim to develop musical-noise-free speech enhancement methods that suppress the musical noise generation and produce perceptually-comfortable enhanced speech. DNN-based speech enhancement using a soft mask achieves high noise reduction but generates musical noise in non-speech regions. Therefore, first, we define kurtosis matching for DNN-based low-musical-noise speech enhancement. Kurtosis is the fourth-order moment and is known to correlate with the amount of musical noise. The kurtosis matching is a penalty term of the DNN training and works to reduce the amount of musical noise. We further extend this scheme to standardized-moment matching. The extended scheme involves using moments whose orders are higher than kurtosis and generalizes the conventional musical-noise-free method based on kurtosis matching. We formulate standardized-moment matching and explore how effectively the higher-order moments reduce the amount of musical noise. Experimental evaluation results 1) demonstrate that kurtosis matching can reduce musical noise without negatively affecting noise suppression and 2) newly reveal that the sixth-moment matching also achieves low-musical-noise speech enhancement as well as kurtosis matching.},
keywords={},
doi={10.1587/transinf.2021EDP7041},
ISSN={1745-1361},
month={November},}
TY - JOUR
TI - DNN-Based Low-Musical-Noise Single-Channel Speech Enhancement Based on Higher-Order-Moments Matching
T2 - IEICE TRANSACTIONS on Information
SP - 1971
EP - 1980
AU - Satoshi MIZOGUCHI
AU - Yuki SAITO
AU - Shinnosuke TAKAMICHI
AU - Hiroshi SARUWATARI
PY - 2021
DO - 10.1587/transinf.2021EDP7041
JO - IEICE TRANSACTIONS on Information
SN - 1745-1361
VL - E104-D
IS - 11
JA - IEICE TRANSACTIONS on Information
Y1 - November 2021
AB - We propose deep neural network (DNN)-based speech enhancement that reduces musical noise and achieves better auditory impressions. The musical noise is an artifact generated by nonlinear signal processing and negatively affects the auditory impressions. We aim to develop musical-noise-free speech enhancement methods that suppress the musical noise generation and produce perceptually-comfortable enhanced speech. DNN-based speech enhancement using a soft mask achieves high noise reduction but generates musical noise in non-speech regions. Therefore, first, we define kurtosis matching for DNN-based low-musical-noise speech enhancement. Kurtosis is the fourth-order moment and is known to correlate with the amount of musical noise. The kurtosis matching is a penalty term of the DNN training and works to reduce the amount of musical noise. We further extend this scheme to standardized-moment matching. The extended scheme involves using moments whose orders are higher than kurtosis and generalizes the conventional musical-noise-free method based on kurtosis matching. We formulate standardized-moment matching and explore how effectively the higher-order moments reduce the amount of musical noise. Experimental evaluation results 1) demonstrate that kurtosis matching can reduce musical noise without negatively affecting noise suppression and 2) newly reveal that the sixth-moment matching also achieves low-musical-noise speech enhancement as well as kurtosis matching.
ER -