The original paper is in English. Non-English content has been machine-translated and may contain typographical errors or mistranslations. ex. Some numerals are expressed as "XNUMX".
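In this article, we propose a method called "continuous noise masking (cNM)" that eliminates residual buzziness in a continuous vocoder, that is, a vocoder in which all parameters are continuous and which offers a simple and flexible speech analysis and synthesis system. Traditional parametric vocoders generally show a perceptible deterioration in the quality of the synthesized speech due to different processing algorithms. Furthermore, inaccurate noise resynthesis (e.g., in breathiness or hoarseness) is considered one of the main underlying causes of performance degradation, leading to noisy transients and temporal discontinuities in the synthesized speech. To overcome these issues, a new cNM is developed based on the phase distortion deviation in order to reduce the perceptual effect of the residual noise, allow a proper reconstruction of noise characteristics, and better model the creaky voice segments that may occur in natural speech. To this end, the cNM is designed to keep only the voice components under a cNM threshold condition while discarding the others. We evaluate the proposed approach and compare it with state-of-the-art vocoders using objective and subjective listening tests. Experimental results show that the proposed method can reduce the effect of residual noise and can reach the quality of other sophisticated approaches such as STRAIGHT and the log domain pulse model (PML).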
Mohammed Salah AL-RADHI
Budapest University of Technology and Economics
Tamás Gábor CSAPÓ
Budapest University of Technology and Economics, MTA-ELTE Lendület Lingual Articulation Research Group
Géza NÉMETH
Budapest University of Technology and Economics
The copyright of the original papers published on this site belongs to IEICE. Unauthorized use of the original or translated papers is prohibited. See IEICE Provisions on Copyright for details.
Mohammed Salah AL-RADHI, Tamás Gábor CSAPÓ, Géza NÉMETH, "Continuous Noise Masking Based Vocoder for Statistical Parametric Speech Synthesis" in IEICE TRANSACTIONS on Information, vol. E103-D, no. 5, pp. 1099-1107, May 2020, doi: 10.1587/transinf.2019EDP7167.
Abstract: In this article, we propose a method called “continuous noise masking (cNM)” that allows eliminating residual buzziness in a continuous vocoder, i.e. of which all parameters are continuous and offers a simple and flexible speech analysis and synthesis system. Traditional parametric vocoders generally show a perceptible deterioration in the quality of the synthesized speech due to different processing algorithms. Furthermore, an inaccurate noise resynthesis (e.g. in breathiness or hoarseness) is also considered to be one of the main underlying causes of performance degradation, leading to noisy transients and temporal discontinuity in the synthesized speech. To overcome these issues, a new cNM is developed based on the phase distortion deviation in order to reduce the perceptual effect of the residual noise, allowing a proper reconstruction of noise characteristics, and model better the creaky voice segments that may happen in natural speech. To this end, the cNM is designed to keep only voice components under a condition of the cNM threshold while discarding others. We evaluate the proposed approach and compare with state-of-the-art vocoders using objective and subjective listening tests. Experimental results show that the proposed method can reduce the effect of residual noise and can reach the quality of other sophisticated approaches like STRAIGHT and log domain pulse model (PML).
URL: https://global.ieice.org/en_transactions/information/10.1587/transinf.2019EDP7167/_p
@ARTICLE{e103-d_5_1099,
author={Mohammed Salah AL-RADHI and Tamás Gábor CSAPÓ and Géza NÉMETH},
journal={IEICE TRANSACTIONS on Information},
title={Continuous Noise Masking Based Vocoder for Statistical Parametric Speech Synthesis},
year={2020},
volume={E103-D},
number={5},
pages={1099-1107},
abstract={In this article, we propose a method called “continuous noise masking (cNM)” that allows eliminating residual buzziness in a continuous vocoder, i.e. of which all parameters are continuous and offers a simple and flexible speech analysis and synthesis system. Traditional parametric vocoders generally show a perceptible deterioration in the quality of the synthesized speech due to different processing algorithms. Furthermore, an inaccurate noise resynthesis (e.g. in breathiness or hoarseness) is also considered to be one of the main underlying causes of performance degradation, leading to noisy transients and temporal discontinuity in the synthesized speech. To overcome these issues, a new cNM is developed based on the phase distortion deviation in order to reduce the perceptual effect of the residual noise, allowing a proper reconstruction of noise characteristics, and model better the creaky voice segments that may happen in natural speech. To this end, the cNM is designed to keep only voice components under a condition of the cNM threshold while discarding others. We evaluate the proposed approach and compare with state-of-the-art vocoders using objective and subjective listening tests. Experimental results show that the proposed method can reduce the effect of residual noise and can reach the quality of other sophisticated approaches like STRAIGHT and log domain pulse model (PML).},
keywords={},
doi={10.1587/transinf.2019EDP7167},
ISSN={1745-1361},
month={May}
}
TY - JOUR
TI - Continuous Noise Masking Based Vocoder for Statistical Parametric Speech Synthesis
T2 - IEICE TRANSACTIONS on Information
SP - 1099
EP - 1107
AU - Mohammed Salah AL-RADHI
AU - Tamás Gábor CSAPÓ
AU - Géza NÉMETH
PY - 2020
DO - 10.1587/transinf.2019EDP7167
JO - IEICE TRANSACTIONS on Information
SN - 1745-1361
VL - E103-D
IS - 5
JA - IEICE TRANSACTIONS on Information
Y1 - May 2020
AB - In this article, we propose a method called “continuous noise masking (cNM)” that allows eliminating residual buzziness in a continuous vocoder, i.e. of which all parameters are continuous and offers a simple and flexible speech analysis and synthesis system. Traditional parametric vocoders generally show a perceptible deterioration in the quality of the synthesized speech due to different processing algorithms. Furthermore, an inaccurate noise resynthesis (e.g. in breathiness or hoarseness) is also considered to be one of the main underlying causes of performance degradation, leading to noisy transients and temporal discontinuity in the synthesized speech. To overcome these issues, a new cNM is developed based on the phase distortion deviation in order to reduce the perceptual effect of the residual noise, allowing a proper reconstruction of noise characteristics, and model better the creaky voice segments that may happen in natural speech. To this end, the cNM is designed to keep only voice components under a condition of the cNM threshold while discarding others. We evaluate the proposed approach and compare with state-of-the-art vocoders using objective and subjective listening tests. Experimental results show that the proposed method can reduce the effect of residual noise and can reach the quality of other sophisticated approaches like STRAIGHT and log domain pulse model (PML).
ER -