The original paper is in English. Non-English content has been machine-translated and may contain typographical errors or mistranslations. ex. Some numerals are expressed as "XNUMX".
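This paper proposes a new voice personality transformation algorithm that uses vocal tract characteristics and pitch period as feature parameters. The vocal tract transfer function is divided into a time-invariant part and a time-varying part. The conversion rules for the time-varying part consist of classified linear transformation matrices, built with a soft-clustering technique applied to the LPC cepstrum expressed in KL (Karhunen-Loève) coefficients. The excitation signal, which carries the prosodic information, is transformed by the average pitch ratio. To improve naturalness, the excitation transformation is applied separately to the voiced and unvoiced bands so that the overall spectral structure is preserved. Objective tests show that the distance between the LPC cepstrum of speech synthesized by the proposed method and that of the target speaker is about 70% smaller than the distance between the source speaker's and the target speaker's LPC cepstra. In subjective listening tests, 60-70% of listeners identified the converted speech as the target speaker's voice.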
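The mapping summarized in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract does not say how the soft clusters, transformation matrices, or distance measure are defined, so the softmax weighting, the parameter `beta`, the affine offsets, the Euclidean cepstral distance, and all function names below are assumptions made for illustration.

```python
import numpy as np

def kl_basis(cepstra, dim):
    """Mean and top-`dim` KL (PCA) basis of LPC cepstrum frames, shape (T, n_ceps)."""
    mean = cepstra.mean(axis=0)
    cov = np.cov(cepstra - mean, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)        # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:dim]       # keep the largest `dim`
    return mean, eigvecs[:, order]                # basis shape (n_ceps, dim)

def soft_weights(x, centroids, beta=1.0):
    """Soft class membership per frame (assumed: softmax of negative squared distance)."""
    d2 = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)   # (T, K)
    logits = -beta * d2
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    w = np.exp(logits)
    return w / w.sum(axis=1, keepdims=True)

def convert(x, centroids, matrices, offsets, beta=1.0):
    """Blend K class-specific affine maps with the soft weights.

    x: (T, D) source KL coefficients, matrices: (K, D, D), offsets: (K, D).
    """
    w = soft_weights(x, centroids, beta)                                # (T, K)
    per_class = np.einsum('kij,tj->tki', matrices, x) + offsets[None]  # (T, K, D)
    return (w[:, :, None] * per_class).sum(axis=1)                     # (T, D)

def scale_pitch(f0, src_mean_f0, tgt_mean_f0):
    """Scale a pitch contour by the average pitch ratio; unvoiced frames (0) stay 0."""
    return f0 * (tgt_mean_f0 / src_mean_f0)

def distance_reduction(src, conv, tgt):
    """Relative drop in mean cepstral distance to the target after conversion."""
    d_before = np.linalg.norm(src - tgt, axis=1).mean()
    d_after = np.linalg.norm(conv - tgt, axis=1).mean()
    return 1.0 - d_after / d_before
```

In this sketch the class centroids, matrices, and offsets would be estimated beforehand from time-aligned source and target frames projected onto the KL basis. A `distance_reduction` value near 0.7 would correspond to the roughly 70% reduction reported in the abstract, under the assumed Euclidean distance.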
The copyright of the original papers published on this site belongs to IEICE. Unauthorized use of the original or translated papers is prohibited. See IEICE Provisions on Copyright for details.
Ki-Seung LEE, Won DOH, Dae-Hee YOUN, "Voice Conversion Using Low Dimensional Vector Mapping" in IEICE TRANSACTIONS on Information,
vol. E85-D, no. 8, pp. 1297-1305, August 2002.
Abstract: In this paper, a new voice personality transformation algorithm which uses the vocal tract characteristics and pitch period as feature parameters is proposed. The vocal tract transfer function is divided into time-invariant and time-varying parts. Conversion rules for the time-varying part are constructed by the classified-linear transformation matrix based on soft-clustering techniques for LPC cepstrum expressed in KL (Karhunen-Loève) coefficients. An excitation signal containing prosodic information is transformed by average pitch ratio. In order to improve the naturalness, transformation on the excitation signal is separately applied to voiced and unvoiced bands to preserve the overall spectral structure. Objective tests show that the distance between the LPC cepstrum of a target speaker and that of the speech synthesized using the proposed method is reduced by about 70% compared with the distance between the target speaker's LPC cepstrum and the source speaker's. Also, subjective listening tests show that 60-70% of listeners identify the transformed speech as the target speaker's.
URL: https://global.ieice.org/en_transactions/information/10.1587/e85-d_8_1297/_p
@ARTICLE{e85-d_8_1297,
author={Ki-Seung LEE and Won DOH and Dae-Hee YOUN},
journal={IEICE TRANSACTIONS on Information},
title={Voice Conversion Using Low Dimensional Vector Mapping},
year={2002},
volume={E85-D},
number={8},
pages={1297-1305},
abstract={In this paper, a new voice personality transformation algorithm which uses the vocal tract characteristics and pitch period as feature parameters is proposed. The vocal tract transfer function is divided into time-invariant and time-varying parts. Conversion rules for the time-varying part are constructed by the classified-linear transformation matrix based on soft-clustering techniques for LPC cepstrum expressed in KL (Karhunen-Loève) coefficients. An excitation signal containing prosodic information is transformed by average pitch ratio. In order to improve the naturalness, transformation on the excitation signal is separately applied to voiced and unvoiced bands to preserve the overall spectral structure. Objective tests show that the distance between the LPC cepstrum of a target speaker and that of the speech synthesized using the proposed method is reduced by about 70% compared with the distance between the target speaker's LPC cepstrum and the source speaker's. Also, subjective listening tests show that 60-70% of listeners identify the transformed speech as the target speaker's.},
keywords={},
doi={},
ISSN={},
month={August},}
TY - JOUR
TI - Voice Conversion Using Low Dimensional Vector Mapping
T2 - IEICE TRANSACTIONS on Information
SP - 1297
EP - 1305
AU - Ki-Seung LEE
AU - Won DOH
AU - Dae-Hee YOUN
PY - 2002
DO -
JO - IEICE TRANSACTIONS on Information
SN -
VL - E85-D
IS - 8
JA - IEICE TRANSACTIONS on Information
Y1 - August 2002
AB - In this paper, a new voice personality transformation algorithm which uses the vocal tract characteristics and pitch period as feature parameters is proposed. The vocal tract transfer function is divided into time-invariant and time-varying parts. Conversion rules for the time-varying part are constructed by the classified-linear transformation matrix based on soft-clustering techniques for LPC cepstrum expressed in KL (Karhunen-Loève) coefficients. An excitation signal containing prosodic information is transformed by average pitch ratio. In order to improve the naturalness, transformation on the excitation signal is separately applied to voiced and unvoiced bands to preserve the overall spectral structure. Objective tests show that the distance between the LPC cepstrum of a target speaker and that of the speech synthesized using the proposed method is reduced by about 70% compared with the distance between the target speaker's LPC cepstrum and the source speaker's. Also, subjective listening tests show that 60-70% of listeners identify the transformed speech as the target speaker's.
ER -