The original paper is in English. Non-English content has been machine-translated and may contain typographical errors or mistranslations. ex. Some numerals are expressed as "XNUMX".
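This paper proposes Bayesian context clustering using cross validation for hidden Markov model (HMM) based speech recognition. The Bayesian approach is a statistical technique that estimates reliable predictive distributions by treating model parameters as random variables. The variational Bayesian method, widely used as an efficient approximation of the Bayesian approach, has been applied to HMM-based speech recognition and performs well. The Bayesian approach can also select an appropriate model structure while taking the amount of training data into account. Prior distributions, which represent prior information about the model parameters, affect both the estimation of posterior distributions and the selection of model structure (e.g., decision-tree-based context clustering), so determining them is an important problem. However, this problem has not been thoroughly investigated in speech recognition, and existing techniques for determining prior distributions have not performed well. The proposed method determines reliable prior distributions without any tuning parameters and selects an appropriate model structure while taking the amount of training data into account. Continuous phoneme recognition experiments show that the proposed method achieves higher performance than conventional methods.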
Keywords: Bayesian approach, speech recognition, HMM, context clustering, cross validation
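The idea of choosing a model structure by a Bayesian criterion whose prior comes from cross validation can be illustrated with a toy example. The following is a minimal sketch under stated assumptions, not the authors' algorithm: it uses one-dimensional data, a Gaussian likelihood with a known variance (`noise_var`), and a conjugate Gaussian prior on the mean, whereas the paper works with HMM state distributions and variational Bayes. The function names (`log_marginal`, `cv_log_marginal`, `should_split`) and the choice to give the prior a weight equal to the number of samples in the complementary folds are hypothetical and chosen only for illustration.

```python
import math


def log_marginal(xs, prior_mean, prior_count, noise_var=1.0):
    """Exact log marginal likelihood of xs for x ~ N(mu, noise_var)
    with conjugate prior mu ~ N(prior_mean, noise_var / prior_count).
    Computed as a chain of one-step-ahead predictive densities."""
    mean, count, total = prior_mean, float(prior_count), 0.0
    for x in xs:
        # Predictive variance = observation noise + uncertainty about mu.
        var = noise_var * (1.0 + 1.0 / count)
        total += -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)
        # Conjugate posterior update after observing x.
        mean = (count * mean + x) / (count + 1.0)
        count += 1.0
    return total


def cv_log_marginal(xs, k=5, noise_var=1.0):
    """Sum over K folds of the log marginal likelihood of the held-out
    fold, using a prior built from the other K-1 folds (assumption:
    prior mean = their sample mean, prior weight = their sample count).
    No hand-tuned hyperparameter is involved."""
    k = min(k, len(xs))
    if k < 2:
        raise ValueError("need at least two samples for cross validation")
    folds = [xs[i::k] for i in range(k)]
    score = 0.0
    for i, held_out in enumerate(folds):
        rest = [x for j, fold in enumerate(folds) if j != i for x in fold]
        score += log_marginal(held_out, sum(rest) / len(rest), len(rest), noise_var)
    return score


def should_split(left, right, k=5, noise_var=1.0):
    """Accept a candidate split (one decision-tree question) only if the
    two child clusters score higher than the pooled parent cluster."""
    split = cv_log_marginal(left, k, noise_var) + cv_log_marginal(right, k, noise_var)
    pooled = cv_log_marginal(left + right, k, noise_var)
    return split > pooled
```

In this sketch, two well-separated groups of samples favor the split, while two groups drawn from the same values do not, because each extra cluster pays for the additional uncertainty about its own mean. That is the sense in which a Bayesian criterion can adapt the model structure to the amount of training data.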
The copyright of the original papers published on this site belongs to IEICE. Unauthorized use of the original or translated papers is prohibited. See IEICE Provisions on Copyright for details.
Kei HASHIMOTO, Heiga ZEN, Yoshihiko NANKAKU, Akinobu LEE, Keiichi TOKUDA, "Bayesian Context Clustering Using Cross Validation for Speech Recognition" in IEICE TRANSACTIONS on Information,
vol. E94-D, no. 3, pp. 668-678, March 2011, doi: 10.1587/transinf.E94.D.668.
Abstract: This paper proposes Bayesian context clustering using cross validation for hidden Markov model (HMM) based speech recognition. The Bayesian approach is a statistical technique for estimating reliable predictive distributions by treating model parameters as random variables. The variational Bayesian method, which is widely used as an efficient approximation of the Bayesian approach, has been applied to HMM-based speech recognition, and it shows good performance. Moreover, the Bayesian approach can select an appropriate model structure while taking account of the amount of training data. Since prior distributions which represent prior information about model parameters affect estimation of the posterior distributions and selection of model structure (e.g., decision tree based context clustering), the determination of prior distributions is an important problem. However, it has not been thoroughly investigated in speech recognition, and the determination technique of prior distributions has not performed well. The proposed method can determine reliable prior distributions without any tuning parameters and select an appropriate model structure while taking account of the amount of training data. Continuous phoneme recognition experiments show that the proposed method achieved a higher performance than the conventional methods.
URL: https://global.ieice.org/en_transactions/information/10.1587/transinf.E94.D.668/_p
@ARTICLE{e94-d_3_668,
author={Kei HASHIMOTO and Heiga ZEN and Yoshihiko NANKAKU and Akinobu LEE and Keiichi TOKUDA},
journal={IEICE TRANSACTIONS on Information},
title={Bayesian Context Clustering Using Cross Validation for Speech Recognition},
year={2011},
volume={E94-D},
number={3},
pages={668-678},
abstract={This paper proposes Bayesian context clustering using cross validation for hidden Markov model (HMM) based speech recognition. The Bayesian approach is a statistical technique for estimating reliable predictive distributions by treating model parameters as random variables. The variational Bayesian method, which is widely used as an efficient approximation of the Bayesian approach, has been applied to HMM-based speech recognition, and it shows good performance. Moreover, the Bayesian approach can select an appropriate model structure while taking account of the amount of training data. Since prior distributions which represent prior information about model parameters affect estimation of the posterior distributions and selection of model structure (e.g., decision tree based context clustering), the determination of prior distributions is an important problem. However, it has not been thoroughly investigated in speech recognition, and the determination technique of prior distributions has not performed well. The proposed method can determine reliable prior distributions without any tuning parameters and select an appropriate model structure while taking account of the amount of training data. Continuous phoneme recognition experiments show that the proposed method achieved a higher performance than the conventional methods.},
keywords={Bayesian approach, speech recognition, HMM, context clustering, cross validation},
doi={10.1587/transinf.E94.D.668},
ISSN={1745-1361},
month={March},
}
TY - JOUR
TI - Bayesian Context Clustering Using Cross Validation for Speech Recognition
T2 - IEICE TRANSACTIONS on Information
SP - 668
EP - 678
AU - Kei HASHIMOTO
AU - Heiga ZEN
AU - Yoshihiko NANKAKU
AU - Akinobu LEE
AU - Keiichi TOKUDA
PY - 2011
DO - 10.1587/transinf.E94.D.668
JO - IEICE TRANSACTIONS on Information
SN - 1745-1361
VL - E94-D
IS - 3
JA - IEICE TRANSACTIONS on Information
Y1 - March 2011
AB - This paper proposes Bayesian context clustering using cross validation for hidden Markov model (HMM) based speech recognition. The Bayesian approach is a statistical technique for estimating reliable predictive distributions by treating model parameters as random variables. The variational Bayesian method, which is widely used as an efficient approximation of the Bayesian approach, has been applied to HMM-based speech recognition, and it shows good performance. Moreover, the Bayesian approach can select an appropriate model structure while taking account of the amount of training data. Since prior distributions which represent prior information about model parameters affect estimation of the posterior distributions and selection of model structure (e.g., decision tree based context clustering), the determination of prior distributions is an important problem. However, it has not been thoroughly investigated in speech recognition, and the determination technique of prior distributions has not performed well. The proposed method can determine reliable prior distributions without any tuning parameters and select an appropriate model structure while taking account of the amount of training data. Continuous phoneme recognition experiments show that the proposed method achieved a higher performance than the conventional methods.
ER -