The original paper is in English. Non-English content has been machine-translated and may contain typographical errors or mistranslations. ex. Some numerals are expressed as "XNUMX".
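This paper proposes an utterance verification system that uses state-level log-likelihood ratios with frame and state selection. Hidden Markov models serve as both the acoustic models and the anti-phone models for speech recognition and utterance verification. Each hidden Markov model has three states, and each state represents different characteristics of a phone. We therefore propose an algorithm that computes the log-likelihood ratio at the state level and weights the states to obtain a more reliable confidence measure for recognized phones. In addition, we propose a frame selection algorithm that computes the confidence measure only on frames of the input that contain proper speech. In general, the phone segmentation information obtained from a speaker-independent speech recognition system is not accurate, because triphone-based acoustic models are difficult to train effectively enough to cover diverse pronunciations and coarticulation effects. Finding correctly matched states is therefore even more difficult when obtaining state segmentation information, so a state selection algorithm is proposed for finding valid states. The proposed method, which uses state-level log-likelihood ratios with frame and state selection, achieves an 18.1% relative reduction in equal error rate compared with a baseline system that uses simple phone-level log-likelihood ratios.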
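The abstract does not give the exact formulas, so the following Python sketch only illustrates the general idea of a weighted state-level log-likelihood ratio with a frame mask. Everything in it is an assumption made for illustration: the function name `state_level_llr`, the per-state averaging, the weight normalization, and the rule that a state with no selected frames is skipped are not taken from the paper.

```python
from typing import Sequence


def state_level_llr(
    target_ll: Sequence[float],      # per-frame log-likelihood under the phone model
    anti_ll: Sequence[float],        # per-frame log-likelihood under the anti-phone model
    state_ids: Sequence[int],        # HMM state index aligned to each frame (0, 1, 2)
    state_weights: Sequence[float],  # one weight per state (hypothetical values)
    keep_frame: Sequence[bool],      # frame selection mask (True = use this frame)
) -> float:
    """Weighted average of per-state mean log-likelihood ratios for one phone.

    Illustrative only: the averaging and weighting scheme is an assumption,
    not the method defined in the paper.
    """
    n_states = len(state_weights)
    sums = [0.0] * n_states
    counts = [0] * n_states

    # Accumulate the frame-level ratio log p(x|phone) - log p(x|anti-phone)
    # for each state, using only the selected frames.
    for t_ll, a_ll, state, keep in zip(target_ll, anti_ll, state_ids, keep_frame):
        if not keep:
            continue
        sums[state] += t_ll - a_ll
        counts[state] += 1

    weighted_sum = 0.0
    weight_total = 0.0
    for state in range(n_states):
        if counts[state] == 0:
            continue  # assumed rule: a state with no selected frames is skipped
        weighted_sum += state_weights[state] * (sums[state] / counts[state])
        weight_total += state_weights[state]

    if weight_total == 0.0:
        raise ValueError("no frames were selected for this phone")
    return weighted_sum / weight_total


if __name__ == "__main__":
    score = state_level_llr(
        target_ll=[-1.0, -2.0, -3.0, -4.0],
        anti_ll=[-2.0, -2.5, -5.0, -4.5],
        state_ids=[0, 0, 1, 2],
        state_weights=[1.0, 2.0, 1.0],
        keep_frame=[True, True, True, False],
    )
    print(score)
```

In a verification setting, a score like this would be compared against a threshold to accept or reject the recognized phone; how the paper chooses its weights, frame mask, and valid states is not stated in the abstract.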
The copyright of the original papers published on this site belongs to IEICE. Unauthorized use of the original or translated papers is prohibited. See IEICE Provisions on Copyright for details.
Suk-Bong KWON, Hoirin KIM, "Utterance Verification Using State-Level Log-Likelihood Ratio with Frame and State Selection" in IEICE TRANSACTIONS on Information, vol. E93-D, no. 3, pp. 647-650, March 2010, doi: 10.1587/transinf.E93.D.647.
Abstract: This paper suggests utterance verification system using state-level log-likelihood ratio with frame and state selection. We use hidden Markov models for speech recognition and utterance verification as acoustic models and anti-phone models. The hidden Markov models have three states and each state represents different characteristics of a phone. Thus we propose an algorithm to compute state-level log-likelihood ratio and give weights on states for obtaining more reliable confidence measure of recognized phones. Additionally, we propose a frame selection algorithm to compute confidence measure on frames including proper speech in the input speech. In general, phone segmentation information obtained from speaker-independent speech recognition system is not accurate because triphone-based acoustic models are difficult to effectively train for covering diverse pronunciation and coarticulation effect. So, it is more difficult to find the right matched states when obtaining state segmentation information. A state selection algorithm is suggested for finding valid states. The proposed method using state-level log-likelihood ratio with frame and state selection shows that the relative reduction in equal error rate is 18.1% compared to the baseline system using simple phone-level log-likelihood ratios.
URL: https://global.ieice.org/en_transactions/information/10.1587/transinf.E93.D.647/_p
@ARTICLE{e93-d_3_647,
author={Suk-Bong KWON and Hoirin KIM},
journal={IEICE TRANSACTIONS on Information},
title={Utterance Verification Using State-Level Log-Likelihood Ratio with Frame and State Selection},
year={2010},
volume={E93-D},
number={3},
pages={647-650},
abstract={This paper suggests utterance verification system using state-level log-likelihood ratio with frame and state selection. We use hidden Markov models for speech recognition and utterance verification as acoustic models and anti-phone models. The hidden Markov models have three states and each state represents different characteristics of a phone. Thus we propose an algorithm to compute state-level log-likelihood ratio and give weights on states for obtaining more reliable confidence measure of recognized phones. Additionally, we propose a frame selection algorithm to compute confidence measure on frames including proper speech in the input speech. In general, phone segmentation information obtained from speaker-independent speech recognition system is not accurate because triphone-based acoustic models are difficult to effectively train for covering diverse pronunciation and coarticulation effect. So, it is more difficult to find the right matched states when obtaining state segmentation information. A state selection algorithm is suggested for finding valid states. The proposed method using state-level log-likelihood ratio with frame and state selection shows that the relative reduction in equal error rate is 18.1% compared to the baseline system using simple phone-level log-likelihood ratios.},
keywords={},
doi={10.1587/transinf.E93.D.647},
ISSN={1745-1361},
month={March},}
TY - JOUR
TI - Utterance Verification Using State-Level Log-Likelihood Ratio with Frame and State Selection
T2 - IEICE TRANSACTIONS on Information
SP - 647
EP - 650
AU - Suk-Bong KWON
AU - Hoirin KIM
PY - 2010
DO - 10.1587/transinf.E93.D.647
JO - IEICE TRANSACTIONS on Information
SN - 1745-1361
VL - E93-D
IS - 3
JA - IEICE TRANSACTIONS on Information
Y1 - March 2010
AB - This paper suggests utterance verification system using state-level log-likelihood ratio with frame and state selection. We use hidden Markov models for speech recognition and utterance verification as acoustic models and anti-phone models. The hidden Markov models have three states and each state represents different characteristics of a phone. Thus we propose an algorithm to compute state-level log-likelihood ratio and give weights on states for obtaining more reliable confidence measure of recognized phones. Additionally, we propose a frame selection algorithm to compute confidence measure on frames including proper speech in the input speech. In general, phone segmentation information obtained from speaker-independent speech recognition system is not accurate because triphone-based acoustic models are difficult to effectively train for covering diverse pronunciation and coarticulation effect. So, it is more difficult to find the right matched states when obtaining state segmentation information. A state selection algorithm is suggested for finding valid states. The proposed method using state-level log-likelihood ratio with frame and state selection shows that the relative reduction in equal error rate is 18.1% compared to the baseline system using simple phone-level log-likelihood ratios.
ER -