The original paper is in English. Non-English content has been machine-translated and may contain typographical errors or mistranslations. ex. Some numerals are expressed as "XNUMX".
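One of the most important challenges in speaker recognition is intersession variability (ISV), primarily cross-channel effects. Recent NIST speaker recognition evaluations (SRE) include a multilingual scenario in which training conversations from multilingual speakers are collected in a number of other languages, which degrades performance further. One important reason is that more and more researchers use phonetic clustering to introduce high-level information into speaker recognition, but such language-dependent methods do not work well in multilingual conditions. In this paper, we study both language and channel mismatch using a support vector machine (SVM) speaker recognition system. Maximum likelihood linear regression (MLLR) transforms that adapt a universal background model (UBM) are adopted as features. We first introduce a novel language-independent statistical binary decision tree to reduce multi-language effects, and compare this data-driven approach with a traditional knowledge-based one. We also construct a framework for channel compensation that uses latent factor analysis (LFA) in the feature domain and MLLR supervector kernel-based nuisance attribute projection (NAP) in the model domain. Results on the NIST SRE 2006 1conv4w-1conv4w/mic corpus show significant improvement. We also compare our compensated MLLR-SVM system with state-of-the-art cepstral Gaussian mixture and SVM systems, and combine them for further improvement.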
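The abstract names nuisance attribute projection (NAP) as the model-domain compensation step without giving details. As a rough illustration of the general technique only, the sketch below estimates a nuisance subspace from within-speaker variation and projects it out of each supervector with P = I - U U^T. It is not the authors' implementation: the function names `nuisance_basis` and `remove_nuisance`, the use of within-speaker scatter as the nuisance estimate, and the `rank` parameter are all assumptions made for this example.

```python
import numpy as np


def nuisance_basis(supervectors, speaker_ids, rank):
    """Estimate an orthonormal basis (dim x rank) of the nuisance subspace.

    Assumption: session/channel variation is approximated by the
    within-speaker scatter of the training supervectors.
    """
    X = np.asarray(supervectors, dtype=float)
    ids = np.asarray(speaker_ids)
    centered = np.empty_like(X)
    for spk in np.unique(ids):
        rows = ids == spk
        # Subtracting the speaker mean leaves only session-to-session variation.
        centered[rows] = X[rows] - X[rows].mean(axis=0)
    # The right singular vectors of the centered data are the eigenvectors
    # of the within-speaker scatter matrix, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[:rank].T


def remove_nuisance(x, basis):
    """Apply P = I - U U^T to one supervector or to a matrix of row vectors."""
    x = np.asarray(x, dtype=float)
    return x - (x @ basis) @ basis.T
```

Because P is symmetric and idempotent, projecting the supervectors before training a linear-kernel SVM gives the same inner products as the kernel K(x, y) = x^T P y. The kernel used in the paper may differ in scaling or normalization.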
The copyright of the original papers published on this site belongs to IEICE. Unauthorized use of the original or translated papers is prohibited. See IEICE Provisions on Copyright for details.
Shan ZHONG, Yuxiang SHAN, Liang HE, Jia LIU, "Research on Intersession Variability Compensation for MLLR-SVM Speaker Recognition" in IEICE TRANSACTIONS on Fundamentals,
vol. E92-A, no. 8, pp. 1892-1897, August 2009, doi: 10.1587/transfun.E92.A.1892.
Abstract: One of the most important challenges in speaker recognition is intersession variability (ISV), primarily cross-channel effects. Recent NIST speaker recognition evaluations (SRE) include a multilingual scenario with training conversations involving multilingual speakers collected in a number of other languages, leading to further performance decline. One important reason for this is that more and more researchers are using phonetic clustering to introduce high level information to improve speaker recognition. But such language dependent methods do not work well in multilingual conditions. In this paper, we study both language and channel mismatch using a support vector machine (SVM) speaker recognition system. Maximum likelihood linear regression (MLLR) transforms adapting a universal background model (UBM) are adopted as features. We first introduce a novel language independent statistical binary-decision tree to reduce multi-language effects, and compare this data-driven approach with a traditional knowledge based one. We also construct a framework for channel compensation using feature-domain latent factor analysis (LFA) and MLLR supervector kernel-based nuisance attribute projection (NAP) in the model-domain. Results on the NIST SRE 2006 1conv4w-1conv4w/mic corpus show significant improvement. We also compare our compensated MLLR-SVM system with state-of-the-art cepstral Gaussian mixture and SVM systems, and combine them for a further improvement.
URL: https://global.ieice.org/en_transactions/fundamentals/10.1587/transfun.E92.A.1892/_p
@ARTICLE{e92-a_8_1892,
author={Shan ZHONG and Yuxiang SHAN and Liang HE and Jia LIU},
journal={IEICE TRANSACTIONS on Fundamentals},
title={Research on Intersession Variability Compensation for MLLR-SVM Speaker Recognition},
year={2009},
volume={E92-A},
number={8},
pages={1892-1897},
abstract={One of the most important challenges in speaker recognition is intersession variability (ISV), primarily cross-channel effects. Recent NIST speaker recognition evaluations (SRE) include a multilingual scenario with training conversations involving multilingual speakers collected in a number of other languages, leading to further performance decline. One important reason for this is that more and more researchers are using phonetic clustering to introduce high level information to improve speaker recognition. But such language dependent methods do not work well in multilingual conditions. In this paper, we study both language and channel mismatch using a support vector machine (SVM) speaker recognition system. Maximum likelihood linear regression (MLLR) transforms adapting a universal background model (UBM) are adopted as features. We first introduce a novel language independent statistical binary-decision tree to reduce multi-language effects, and compare this data-driven approach with a traditional knowledge based one. We also construct a framework for channel compensation using feature-domain latent factor analysis (LFA) and MLLR supervector kernel-based nuisance attribute projection (NAP) in the model-domain. Results on the NIST SRE 2006 1conv4w-1conv4w/mic corpus show significant improvement. We also compare our compensated MLLR-SVM system with state-of-the-art cepstral Gaussian mixture and SVM systems, and combine them for a further improvement.},
keywords={},
doi={10.1587/transfun.E92.A.1892},
ISSN={1745-1337},
month={August},}
TY  - JOUR
TI  - Research on Intersession Variability Compensation for MLLR-SVM Speaker Recognition
T2  - IEICE TRANSACTIONS on Fundamentals
SP  - 1892
EP  - 1897
AU  - Shan ZHONG
AU  - Yuxiang SHAN
AU  - Liang HE
AU  - Jia LIU
PY  - 2009
DO  - 10.1587/transfun.E92.A.1892
JO  - IEICE TRANSACTIONS on Fundamentals
SN  - 1745-1337
VL  - E92-A
IS  - 8
JA  - IEICE TRANSACTIONS on Fundamentals
Y1  - August 2009
AB  - One of the most important challenges in speaker recognition is intersession variability (ISV), primarily cross-channel effects. Recent NIST speaker recognition evaluations (SRE) include a multilingual scenario with training conversations involving multilingual speakers collected in a number of other languages, leading to further performance decline. One important reason for this is that more and more researchers are using phonetic clustering to introduce high level information to improve speaker recognition. But such language dependent methods do not work well in multilingual conditions. In this paper, we study both language and channel mismatch using a support vector machine (SVM) speaker recognition system. Maximum likelihood linear regression (MLLR) transforms adapting a universal background model (UBM) are adopted as features. We first introduce a novel language independent statistical binary-decision tree to reduce multi-language effects, and compare this data-driven approach with a traditional knowledge based one. We also construct a framework for channel compensation using feature-domain latent factor analysis (LFA) and MLLR supervector kernel-based nuisance attribute projection (NAP) in the model-domain. Results on the NIST SRE 2006 1conv4w-1conv4w/mic corpus show significant improvement. We also compare our compensated MLLR-SVM system with state-of-the-art cepstral Gaussian mixture and SVM systems, and combine them for a further improvement.
ER  - 