The original paper is in English. Non-English content has been machine-translated and may contain typographical errors or mistranslations. ex. Some numerals are expressed as "XNUMX".
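In this work, spatial information consisting of the position and orientation angle of an acoustic source is estimated by an artificial neural network (ANN). The estimated position of a speaker in an enclosed space is used to refine the estimated time delays for a delay-and-sum beamformer, thus enhancing the output signal. The orientation angle, on the other hand, is used to restrict the lexicon used in the recognition phase, assuming that the speaker faces a particular direction while speaking. To compensate for the effect of the transmission channel inside a short frame analysis window, a new cepstral mean normalization (CMN) method based on a Gaussian mixture model (GMM) is investigated and shows better performance than the conventional CMN for short utterances. The performance of the proposed method is evaluated through Japanese digit/command recognition experiments.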
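As a rough illustration of two building blocks named in the abstract, the sketch below shows a textbook delay-and-sum beamformer steered by a known source position, and the conventional per-utterance CMN that the paper uses as its baseline. It is not the authors' implementation: the ANN position estimator and the GMM-based CMN are not reproduced, and the function names, the integer-sample delays, and the speed of sound value are all assumptions made for this example.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s; assumed room-temperature value, not from the paper


def steering_delays(mic_positions, source_position, sample_rate):
    """Integer sample delay of each microphone relative to the nearest one."""
    distances = [math.dist(m, source_position) for m in mic_positions]
    nearest = min(distances)
    return [round((d - nearest) / SPEED_OF_SOUND * sample_rate) for d in distances]


def delay_and_sum(channels, delays):
    """Advance each channel by its delay, then average the aligned samples."""
    length = min(len(c) - d for c, d in zip(channels, delays))
    return [
        sum(c[d + n] for c, d in zip(channels, delays)) / len(channels)
        for n in range(length)
    ]


def cepstral_mean_normalization(frames):
    """Conventional CMN: subtract the utterance-level mean from every frame."""
    dim = len(frames[0])
    mean = [sum(f[k] for f in frames) / len(frames) for k in range(dim)]
    return [[f[k] - mean[k] for k in range(dim)] for f in frames]
```

In this sketch, a position estimate (however it is obtained) is passed to steering_delays, and the resulting delays align the channels before averaging. The conventional CMN needs the whole utterance to compute its mean, which is the limitation for short utterances that the abstract's GMM-based variant is meant to address.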
The copyright of the original papers published on this site belongs to IEICE. Unauthorized use of the original or translated papers is prohibited. See IEICE Provisions on Copyright for details.
Alberto Yoshihiro NAKANO, Seiichi NAKAGAWA, Kazumasa YAMAMOTO, "Distant Speech Recognition Using a Microphone Array Network" in IEICE TRANSACTIONS on Information,
vol. E93-D, no. 9, pp. 2451-2462, September 2010, doi: 10.1587/transinf.E93.D.2451.
Abstract: In this work, spatial information consisting of the position and orientation angle of an acoustic source is estimated by an artificial neural network (ANN). The estimated position of a speaker in an enclosed space is used to refine the estimated time delays for a delay-and-sum beamformer, thus enhancing the output signal. On the other hand, the orientation angle is used to restrict the lexicon used in the recognition phase, assuming that the speaker faces a particular direction while speaking. To compensate the effect of the transmission channel inside a short frame analysis window, a new cepstral mean normalization (CMN) method based on a Gaussian mixture model (GMM) is investigated and shows better performance than the conventional CMN for short utterances. The performance of the proposed method is evaluated through Japanese digit/command recognition experiments.
URL: https://global.ieice.org/en_transactions/information/10.1587/transinf.E93.D.2451/_p
@ARTICLE{e93-d_9_2451,
author={Alberto Yoshihiro NAKANO and Seiichi NAKAGAWA and Kazumasa YAMAMOTO},
journal={IEICE TRANSACTIONS on Information},
title={Distant Speech Recognition Using a Microphone Array Network},
year={2010},
volume={E93-D},
number={9},
pages={2451-2462},
abstract={In this work, spatial information consisting of the position and orientation angle of an acoustic source is estimated by an artificial neural network (ANN). The estimated position of a speaker in an enclosed space is used to refine the estimated time delays for a delay-and-sum beamformer, thus enhancing the output signal. On the other hand, the orientation angle is used to restrict the lexicon used in the recognition phase, assuming that the speaker faces a particular direction while speaking. To compensate the effect of the transmission channel inside a short frame analysis window, a new cepstral mean normalization (CMN) method based on a Gaussian mixture model (GMM) is investigated and shows better performance than the conventional CMN for short utterances. The performance of the proposed method is evaluated through Japanese digit/command recognition experiments.},
keywords={},
doi={10.1587/transinf.E93.D.2451},
ISSN={1745-1361},
month={September},}
TY - JOUR
TI - Distant Speech Recognition Using a Microphone Array Network
T2 - IEICE TRANSACTIONS on Information
SP - 2451
EP - 2462
AU - Alberto Yoshihiro NAKANO
AU - Seiichi NAKAGAWA
AU - Kazumasa YAMAMOTO
PY - 2010
DO - 10.1587/transinf.E93.D.2451
JO - IEICE TRANSACTIONS on Information
SN - 1745-1361
VL - E93-D
IS - 9
JA - IEICE TRANSACTIONS on Information
Y1 - September 2010
AB - In this work, spatial information consisting of the position and orientation angle of an acoustic source is estimated by an artificial neural network (ANN). The estimated position of a speaker in an enclosed space is used to refine the estimated time delays for a delay-and-sum beamformer, thus enhancing the output signal. On the other hand, the orientation angle is used to restrict the lexicon used in the recognition phase, assuming that the speaker faces a particular direction while speaking. To compensate the effect of the transmission channel inside a short frame analysis window, a new cepstral mean normalization (CMN) method based on a Gaussian mixture model (GMM) is investigated and shows better performance than the conventional CMN for short utterances. The performance of the proposed method is evaluated through Japanese digit/command recognition experiments.
ER -