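A new online, unsupervised voice activity detection (VAD) method is proposed. The method is based on a feature derived from high-order statistics (HOS), augmented with a second metric based on normalized autocorrelation peaks to improve robustness to non-Gaussian noise. The feature is also designed to discriminate between close-talk and far-field speech, which yields a VAD method for human-to-human interaction that is independent of the energy level. Classification is performed by an online variant of the Expectation-Maximization (EM) algorithm, which tracks and adapts to noise variations in the speech signal. The method is evaluated on in-house data and on CENSREC-1-C, a publicly available database used for VAD in the context of automatic speech recognition (ASR). On both test sets, the proposed method outperforms a simple energy-based algorithm and is shown to be more robust to changes in speech sparsity, SNR variability, and noise type.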
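The abstract names three ingredients without giving formulas: a high-order-statistics feature, a normalized autocorrelation peak, and an online EM classifier. The sketch below illustrates one plausible form of each, and it is not the authors' implementation. It assumes excess kurtosis as the fourth-order statistic, a frame-energy normalization for the autocorrelation, and a generic fixed-step online EM for a two-component one-dimensional Gaussian mixture. All function names, class names, and parameters (`frame_signal`, `excess_kurtosis`, `autocorr_peak`, `OnlineGMM2`, `step`) are hypothetical.

```python
import numpy as np


def frame_signal(x, frame_len, hop):
    """Split a 1-D signal into overlapping frames, one frame per row."""
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]


def excess_kurtosis(frames):
    """Per-frame fourth-order statistic; close to 0 for Gaussian samples."""
    centered = frames - frames.mean(axis=1, keepdims=True)
    var = np.mean(centered ** 2, axis=1)
    m4 = np.mean(centered ** 4, axis=1)
    return m4 / np.maximum(var ** 2, 1e-12) - 3.0


def autocorr_peak(frames, min_lag, max_lag):
    """Largest autocorrelation, normalized by frame energy, over a lag range."""
    centered = frames - frames.mean(axis=1, keepdims=True)
    energy = np.maximum(np.sum(centered ** 2, axis=1), 1e-12)
    peaks = np.full(len(frames), -1.0)
    for lag in range(min_lag, max_lag + 1):
        r = np.sum(centered[:, :-lag] * centered[:, lag:], axis=1) / energy
        peaks = np.maximum(peaks, r)
    return peaks


class OnlineGMM2:
    """Two-component 1-D Gaussian mixture updated one sample at a time.

    Assumption: a generic online EM that smooths the sufficient statistics
    with a fixed step size; the paper's exact variant may differ.
    """

    def __init__(self, means, variances, weights=(0.5, 0.5), step=0.01):
        self.w = np.array(weights, dtype=float)
        self.mu = np.array(means, dtype=float)
        self.var = np.array(variances, dtype=float)
        self.step = step
        # Running sufficient statistics: E[z], E[z*x], E[z*x^2].
        self.s0 = self.w.copy()
        self.s1 = self.w * self.mu
        self.s2 = self.w * (self.var + self.mu ** 2)

    def update(self, x):
        """Consume one feature value and return the component posteriors."""
        # E-step: responsibility of each component for this sample.
        lik = self.w * np.exp(-0.5 * (x - self.mu) ** 2 / self.var)
        lik = lik / np.sqrt(2.0 * np.pi * self.var)
        post = lik / max(lik.sum(), 1e-300)
        # Smooth the sufficient statistics toward the new sample.
        g = self.step
        self.s0 = (1 - g) * self.s0 + g * post
        self.s1 = (1 - g) * self.s1 + g * post * x
        self.s2 = (1 - g) * self.s2 + g * post * x * x
        # M-step: re-estimate parameters from the smoothed statistics.
        self.w = self.s0 / self.s0.sum()
        self.mu = self.s1 / self.s0
        self.var = np.maximum(self.s2 / self.s0 - self.mu ** 2, 1e-6)
        return post
```

In a pipeline of this shape, each frame's feature value would be passed to `OnlineGMM2.update`, and the frame would be labeled speech when the posterior of the component with the larger mean exceeds a chosen threshold. The fixed step size trades tracking speed against estimate variance; a larger value adapts faster to changing noise but gives noisier parameters.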
The copyright of the original papers published on this site belongs to IEICE. Unauthorized use of the original or translated papers is prohibited. See IEICE Provisions on Copyright for details.
David COURNAPEAU, Tatsuya KAWAHARA, "Voice Activity Detection Based on High Order Statistics and Online EM Algorithm" in IEICE TRANSACTIONS on Information,
vol. E91-D, no. 12, pp. 2854-2861, December 2008, doi: 10.1093/ietisy/e91-d.12.2854.
Abstract: A new online, unsupervised voice activity detection (VAD) method is proposed. The method is based on a feature derived from high-order statistics (HOS), enhanced by a second metric based on normalized autocorrelation peaks to improve its robustness to non-Gaussian noises. This feature is also oriented for discriminating between close-talk and far-field speech, thus providing a VAD method in the context of human-to-human interaction independent of the energy level. The classification is done by an online variation of the Expectation-Maximization (EM) algorithm, to track and adapt to noise variations in the speech signal. Performance of the proposed method is evaluated on an in-house data and on CENSREC-1-C, a publicly available database used for VAD in the context of automatic speech recognition (ASR). On both test sets, the proposed method outperforms a simple energy-based algorithm and is shown to be more robust against the change in speech sparsity, SNR variability and the noise type.
URL: https://global.ieice.org/en_transactions/information/10.1093/ietisy/e91-d.12.2854/_p
@ARTICLE{e91-d_12_2854,
author={David COURNAPEAU and Tatsuya KAWAHARA},
journal={IEICE TRANSACTIONS on Information},
title={Voice Activity Detection Based on High Order Statistics and Online EM Algorithm},
year={2008},
volume={E91-D},
number={12},
pages={2854-2861},
abstract={A new online, unsupervised voice activity detection (VAD) method is proposed. The method is based on a feature derived from high-order statistics (HOS), enhanced by a second metric based on normalized autocorrelation peaks to improve its robustness to non-Gaussian noises. This feature is also oriented for discriminating between close-talk and far-field speech, thus providing a VAD method in the context of human-to-human interaction independent of the energy level. The classification is done by an online variation of the Expectation-Maximization (EM) algorithm, to track and adapt to noise variations in the speech signal. Performance of the proposed method is evaluated on an in-house data and on CENSREC-1-C, a publicly available database used for VAD in the context of automatic speech recognition (ASR). On both test sets, the proposed method outperforms a simple energy-based algorithm and is shown to be more robust against the change in speech sparsity, SNR variability and the noise type.},
keywords={},
doi={10.1093/ietisy/e91-d.12.2854},
ISSN={1745-1361},
month={December},
}
TY  - JOUR
TI  - Voice Activity Detection Based on High Order Statistics and Online EM Algorithm
T2  - IEICE TRANSACTIONS on Information
SP  - 2854
EP  - 2861
AU  - David COURNAPEAU
AU  - Tatsuya KAWAHARA
PY  - 2008
DO  - 10.1093/ietisy/e91-d.12.2854
JO  - IEICE TRANSACTIONS on Information
SN  - 1745-1361
VL  - E91-D
IS  - 12
JA  - IEICE TRANSACTIONS on Information
Y1  - December 2008
AB  - A new online, unsupervised voice activity detection (VAD) method is proposed. The method is based on a feature derived from high-order statistics (HOS), enhanced by a second metric based on normalized autocorrelation peaks to improve its robustness to non-Gaussian noises. This feature is also oriented for discriminating between close-talk and far-field speech, thus providing a VAD method in the context of human-to-human interaction independent of the energy level. The classification is done by an online variation of the Expectation-Maximization (EM) algorithm, to track and adapt to noise variations in the speech signal. Performance of the proposed method is evaluated on an in-house data and on CENSREC-1-C, a publicly available database used for VAD in the context of automatic speech recognition (ASR). On both test sets, the proposed method outperforms a simple energy-based algorithm and is shown to be more robust against the change in speech sparsity, SNR variability and the noise type.
ER  - 