The original paper is in English. Non-English content has been machine-translated and may contain typographical errors or mistranslations. ex. Some numerals are expressed as "XNUMX".
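We describe a statistical parametric speech synthesis system developed by a joint group from the Nagoya Institute of Technology (Nitech) and the Nara Institute of Science and Technology (NAIST) for Blizzard Challenge 2006. To improve on the 2005 system (Nitech-HTS 2005), we investigated new features such as mel-generalized cepstrum-based line spectral pairs (MGC-LSPs), the maximum likelihood linear transform (MLLT), and a full covariance global variance (GV) probability density function (pdf). The combination of mel-cepstral coefficients, MLLT, and a full covariance GV pdf scored highest in subjective listening tests, and the 2006 system performed significantly better than the 2005 system. The Blizzard Challenge 2006 evaluations show that Nitech-NAIST-HTS 2006 is competitive even when working with relatively large speech databases.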
The copyright of the original papers published on this site belongs to IEICE. Unauthorized use of the original or translated papers is prohibited. See IEICE Provisions on Copyright for details.
Heiga ZEN, Tomoki TODA, Keiichi TOKUDA, "The Nitech-NAIST HMM-Based Speech Synthesis System for the Blizzard Challenge 2006" in IEICE TRANSACTIONS on Information,
vol. E91-D, no. 6, pp. 1764-1773, June 2008, doi: 10.1093/ietisy/e91-d.6.1764.
Abstract: We describe a statistical parametric speech synthesis system developed by a joint group from the Nagoya Institute of Technology (Nitech) and the Nara Institute of Science and Technology (NAIST) for the annual open evaluation of text-to-speech synthesis systems named Blizzard Challenge 2006. To improve our 2005 system (Nitech-HTS 2005), we investigated new features such as mel-generalized cepstrum-based line spectral pairs (MGC-LSPs), maximum likelihood linear transform (MLLT), and a full covariance global variance (GV) probability density function (pdf). A combination of mel-cepstral coefficients, MLLT, and full covariance GV pdf scored highest in subjective listening tests, and the 2006 system performed significantly better than the 2005 system. The Blizzard Challenge 2006 evaluations show that Nitech-NAIST-HTS 2006 is competitive even when working with relatively large speech databases.
URL: https://global.ieice.org/en_transactions/information/10.1093/ietisy/e91-d.6.1764/_p
@ARTICLE{e91-d_6_1764,
author={Heiga ZEN and Tomoki TODA and Keiichi TOKUDA},
journal={IEICE TRANSACTIONS on Information},
title={The Nitech-NAIST HMM-Based Speech Synthesis System for the Blizzard Challenge 2006},
year={2008},
volume={E91-D},
number={6},
pages={1764-1773},
abstract={We describe a statistical parametric speech synthesis system developed by a joint group from the Nagoya Institute of Technology (Nitech) and the Nara Institute of Science and Technology (NAIST) for the annual open evaluation of text-to-speech synthesis systems named Blizzard Challenge 2006. To improve our 2005 system (Nitech-HTS 2005), we investigated new features such as mel-generalized cepstrum-based line spectral pairs (MGC-LSPs), maximum likelihood linear transform (MLLT), and a full covariance global variance (GV) probability density function (pdf). A combination of mel-cepstral coefficients, MLLT, and full covariance GV pdf scored highest in subjective listening tests, and the 2006 system performed significantly better than the 2005 system. The Blizzard Challenge 2006 evaluations show that Nitech-NAIST-HTS 2006 is competitive even when working with relatively large speech databases.},
keywords={},
doi={10.1093/ietisy/e91-d.6.1764},
ISSN={1745-1361},
month={June},}
TY - JOUR
TI - The Nitech-NAIST HMM-Based Speech Synthesis System for the Blizzard Challenge 2006
T2 - IEICE TRANSACTIONS on Information
SP - 1764
EP - 1773
AU - Heiga ZEN
AU - Tomoki TODA
AU - Keiichi TOKUDA
PY - 2008
DO - 10.1093/ietisy/e91-d.6.1764
JO - IEICE TRANSACTIONS on Information
SN - 1745-1361
VL - E91-D
IS - 6
JA - IEICE TRANSACTIONS on Information
Y1 - June 2008
AB - We describe a statistical parametric speech synthesis system developed by a joint group from the Nagoya Institute of Technology (Nitech) and the Nara Institute of Science and Technology (NAIST) for the annual open evaluation of text-to-speech synthesis systems named Blizzard Challenge 2006. To improve our 2005 system (Nitech-HTS 2005), we investigated new features such as mel-generalized cepstrum-based line spectral pairs (MGC-LSPs), maximum likelihood linear transform (MLLT), and a full covariance global variance (GV) probability density function (pdf). A combination of mel-cepstral coefficients, MLLT, and full covariance GV pdf scored highest in subjective listening tests, and the 2006 system performed significantly better than the 2005 system. The Blizzard Challenge 2006 evaluations show that Nitech-NAIST-HTS 2006 is competitive even when working with relatively large speech databases.
ER -