The original paper is in English. Non-English content has been machine-translated and may contain typographical errors or mistranslations. ex. Some numerals are expressed as "XNUMX".
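This paper presents methods for controlling the intensity of emotional expression and the speaking style of an arbitrary speaker's synthetic speech, using only a small amount of that speaker's speech data, in HMM-based speech synthesis. Model adaptation approaches are introduced into the style control technique based on the multiple-regression hidden semi-Markov model (MRHSMM). Two approaches are proposed for training the target speaker's MRHSMM. The first is MRHSMM-based model adaptation, in which a pretrained MRHSMM is adapted to the target speaker; for this purpose, the MLLR adaptation algorithm is formulated for the MRHSMM. The second uses simultaneous adaptation of speaker and style from an average voice model to obtain the target speaker's style-dependent HSMMs, which are then used to initialize the MRHSMM. Subjective evaluation with 50 adaptation sentences per style shows that the proposed methods outperform conventional speaker-dependent model training given the same amount of target-speaker speech data.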
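The two ingredients named in the abstract, multiple-regression style control and MLLR adaptation, can be illustrated with a toy numerical sketch. It assumes the commonly used formulation in which a state's mean vector is a linear function of a low-dimensional style vector, and it uses the standard MLLR affine mean transform. The abstract does not spell out the paper's MRHSMM-specific MLLR formulation, so this is not that algorithm. All names (`style_dependent_mean`, `mllr_adapt_mean`, `H`, `W`) and all numbers are hypothetical.

```python
import numpy as np

def style_dependent_mean(H, style):
    """Mean as a multiple regression of the style vector (assumed form):
    mu = H @ [1, v_1, ..., v_L]."""
    xi = np.concatenate(([1.0], np.asarray(style, dtype=float)))
    return H @ xi

def mllr_adapt_mean(W, mu):
    """Standard MLLR mean transform: mu_hat = W @ [1, mu] = b + A @ mu."""
    return W @ np.concatenate(([1.0], np.asarray(mu, dtype=float)))

# Toy regression matrix: 2-dimensional feature, 2-dimensional style space.
# Column 0 is the bias (neutral) term; columns 1-2 weight the style axes.
H = np.array([[1.0, 0.5,  0.0],
              [2.0, 0.0, -0.5]])

neutral = style_dependent_mean(H, [0.0, 0.0])  # origin of the style space
half = style_dependent_mean(H, [0.5, 0.0])     # weaker expression of style 1
full = style_dependent_mean(H, [1.0, 0.0])     # full expression of style 1

# Toy speaker transform W = [b, A], with A the identity and b a small shift.
W = np.array([[ 0.1, 1.0, 0.0],
              [-0.2, 0.0, 1.0]])
adapted = mllr_adapt_mean(W, full)

print(neutral, half, full, adapted)
```

Scaling the style vector moves the mean continuously between the neutral and the fully expressed style, which is what makes intensity control possible. In this sketch the affine transform is applied after the mean is computed, purely to show the shape of an MLLR update.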
The copyright of the original papers published on this site belongs to IEICE. Unauthorized use of the original or translated papers is prohibited. See IEICE Provisions on Copyright for details.
Takashi NOSE, Makoto TACHIBANA, Takao KOBAYASHI, "HMM-Based Style Control for Expressive Speech Synthesis with Arbitrary Speaker's Voice Using Model Adaptation" in IEICE TRANSACTIONS on Information,
vol. E92-D, no. 3, pp. 489-497, March 2009, doi: 10.1587/transinf.E92.D.489.
Abstract: This paper presents methods for controlling the intensity of emotional expressions and speaking styles of an arbitrary speaker's synthetic speech by using a small amount of his/her speech data in HMM-based speech synthesis. Model adaptation approaches are introduced into the style control technique based on the multiple-regression hidden semi-Markov model (MRHSMM). Two different approaches are proposed for training a target speaker's MRHSMMs. The first one is MRHSMM-based model adaptation in which the pretrained MRHSMM is adapted to the target speaker's model. For this purpose, we formulate the MLLR adaptation algorithm for the MRHSMM. The second method utilizes simultaneous adaptation of speaker and style from an average voice model to obtain the target speaker's style-dependent HSMMs which are used for the initialization of the MRHSMM. From the result of subjective evaluation using adaptation data of 50 sentences of each style, we show that the proposed methods outperform the conventional speaker-dependent model training when using the same size of speech data of the target speaker.
URL: https://global.ieice.org/en_transactions/information/10.1587/transinf.E92.D.489/_p
@ARTICLE{e92-d_3_489,
author={Takashi NOSE and Makoto TACHIBANA and Takao KOBAYASHI},
journal={IEICE TRANSACTIONS on Information},
title={HMM-Based Style Control for Expressive Speech Synthesis with Arbitrary Speaker's Voice Using Model Adaptation},
year={2009},
volume={E92-D},
number={3},
pages={489-497},
abstract={This paper presents methods for controlling the intensity of emotional expressions and speaking styles of an arbitrary speaker's synthetic speech by using a small amount of his/her speech data in HMM-based speech synthesis. Model adaptation approaches are introduced into the style control technique based on the multiple-regression hidden semi-Markov model (MRHSMM). Two different approaches are proposed for training a target speaker's MRHSMMs. The first one is MRHSMM-based model adaptation in which the pretrained MRHSMM is adapted to the target speaker's model. For this purpose, we formulate the MLLR adaptation algorithm for the MRHSMM. The second method utilizes simultaneous adaptation of speaker and style from an average voice model to obtain the target speaker's style-dependent HSMMs which are used for the initialization of the MRHSMM. From the result of subjective evaluation using adaptation data of 50 sentences of each style, we show that the proposed methods outperform the conventional speaker-dependent model training when using the same size of speech data of the target speaker.},
keywords={},
doi={10.1587/transinf.E92.D.489},
ISSN={1745-1361},
month={March},
}
TY  - JOUR
TI  - HMM-Based Style Control for Expressive Speech Synthesis with Arbitrary Speaker's Voice Using Model Adaptation
T2  - IEICE TRANSACTIONS on Information
SP  - 489
EP  - 497
AU  - Takashi NOSE
AU  - Makoto TACHIBANA
AU  - Takao KOBAYASHI
PY  - 2009
DO  - 10.1587/transinf.E92.D.489
JO  - IEICE TRANSACTIONS on Information
SN  - 1745-1361
VL  - E92-D
IS  - 3
JA  - IEICE TRANSACTIONS on Information
Y1  - March 2009
AB  - This paper presents methods for controlling the intensity of emotional expressions and speaking styles of an arbitrary speaker's synthetic speech by using a small amount of his/her speech data in HMM-based speech synthesis. Model adaptation approaches are introduced into the style control technique based on the multiple-regression hidden semi-Markov model (MRHSMM). Two different approaches are proposed for training a target speaker's MRHSMMs. The first one is MRHSMM-based model adaptation in which the pretrained MRHSMM is adapted to the target speaker's model. For this purpose, we formulate the MLLR adaptation algorithm for the MRHSMM. The second method utilizes simultaneous adaptation of speaker and style from an average voice model to obtain the target speaker's style-dependent HSMMs which are used for the initialization of the MRHSMM. From the result of subjective evaluation using adaptation data of 50 sentences of each style, we show that the proposed methods outperform the conventional speaker-dependent model training when using the same size of speech data of the target speaker.
ER  -