The original paper is in English. Non-English content has been machine-translated and may contain typographical errors or mistranslations. ex. Some numerals are expressed as "XNUMX".
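The spectral envelope parameter is an important speech parameter that strongly affects vocoder quality. The Vector Quantized Variational AutoEncoder (VQ-VAE) is a recent state-of-the-art end-to-end quantization method based on deep learning. This paper proposes VQ-VAE-EMGAN, a new technique that uses a Generative Adversarial Network (GAN) to improve the embedding space learning of VQ-VAE for quantizing the spectral envelope parameter. In the experiments, we designed a quantizer for the spectral envelope parameters of the WORLD vocoder extracted from 16 kHz speech waveforms. The results show that, averaged over four target bit operations, the proposed technique reduced the Log Spectral Distortion (LSD) by around 0.5 dB and increased the PESQ by around 0.17 compared with the conventional VQ-VAE.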
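To make the LSD figure in the abstract more concrete, here is a minimal sketch of how log spectral distortion is commonly computed between a reference and a quantized spectral envelope. It assumes the widely used definition (root-mean-square difference of the log power spectra in dB per frame, averaged over frames); the paper's exact formulation may differ. The function name `log_spectral_distortion`, the `(frames, bins)` array layout, and the `eps` floor are illustrative assumptions, not taken from the paper.

```python
import numpy as np


def log_spectral_distortion(ref, est, eps=1e-12):
    """Mean per-frame LSD in dB between two power-spectral envelopes.

    ref, est: arrays of shape (frames, bins) holding linear power values.
    eps: small floor (an assumption) to avoid taking log10 of zero.
    """
    ref = np.asarray(ref, dtype=np.float64)
    est = np.asarray(est, dtype=np.float64)
    # Difference of the two envelopes on a dB scale.
    diff_db = 10.0 * np.log10(ref + eps) - 10.0 * np.log10(est + eps)
    # RMS over frequency bins for each frame, then average over frames.
    per_frame = np.sqrt(np.mean(diff_db ** 2, axis=1))
    return float(np.mean(per_frame))
```

Under this definition, identical envelopes give an LSD of 0 dB, and a uniform factor-of-ten power error gives 10 dB, so a reduction of around 0.5 dB corresponds to a modest but measurable improvement in envelope reconstruction.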
Tanasan SRIKOTR
Shibaura Institute of Technology
Kazunori MANO
Shibaura Institute of Technology
The copyright of the original papers published on this site belongs to IEICE. Unauthorized use of the original or translated papers is prohibited. See IEICE Provisions on Copyright for details.
Tanasan SRIKOTR, Kazunori MANO, "Vector Quantization of Speech Spectrum Based on the VQ-VAE Embedding Space Learning by GAN Technique" in IEICE TRANSACTIONS on Fundamentals,
vol. E105-A, no. 4, pp. 647-654, April 2022, doi: 10.1587/transfun.2021SMP0018.
Abstract: The spectral envelope parameter is a significant speech parameter in the vocoder's quality. Recently, the Vector Quantized Variational AutoEncoder (VQ-VAE) is a state-of-the-art end-to-end quantization method based on the deep learning model. This paper proposed a new technique for improving the embedding space learning of VQ-VAE with the Generative Adversarial Network for quantizing the spectral envelope parameter, called VQ-VAE-EMGAN. In experiments, we designed the quantizer for the spectral envelope parameters of the WORLD vocoder extracted from the 16kHz speech waveform. As the results shown, the proposed technique reduced the Log Spectral Distortion (LSD) around 0.5dB and increased the PESQ by around 0.17 on average for four target bit operations compared to the conventional VQ-VAE.
URL: https://global.ieice.org/en_transactions/fundamentals/10.1587/transfun.2021SMP0018/_p
@ARTICLE{e105-a_4_647,
author={Tanasan SRIKOTR and Kazunori MANO},
journal={IEICE TRANSACTIONS on Fundamentals},
title={Vector Quantization of Speech Spectrum Based on the VQ-VAE Embedding Space Learning by GAN Technique},
year={2022},
volume={E105-A},
number={4},
pages={647-654},
abstract={The spectral envelope parameter is a significant speech parameter in the vocoder's quality. Recently, the Vector Quantized Variational AutoEncoder (VQ-VAE) is a state-of-the-art end-to-end quantization method based on the deep learning model. This paper proposed a new technique for improving the embedding space learning of VQ-VAE with the Generative Adversarial Network for quantizing the spectral envelope parameter, called VQ-VAE-EMGAN. In experiments, we designed the quantizer for the spectral envelope parameters of the WORLD vocoder extracted from the 16kHz speech waveform. As the results shown, the proposed technique reduced the Log Spectral Distortion (LSD) around 0.5dB and increased the PESQ by around 0.17 on average for four target bit operations compared to the conventional VQ-VAE.},
keywords={},
doi={10.1587/transfun.2021SMP0018},
ISSN={1745-1337},
month={April},}
TY - JOUR
TI - Vector Quantization of Speech Spectrum Based on the VQ-VAE Embedding Space Learning by GAN Technique
T2 - IEICE TRANSACTIONS on Fundamentals
SP - 647
EP - 654
AU - Tanasan SRIKOTR
AU - Kazunori MANO
PY - 2022
DO - 10.1587/transfun.2021SMP0018
JO - IEICE TRANSACTIONS on Fundamentals
SN - 1745-1337
VL - E105-A
IS - 4
JA - IEICE TRANSACTIONS on Fundamentals
Y1 - April 2022
AB - The spectral envelope parameter is a significant speech parameter in the vocoder's quality. Recently, the Vector Quantized Variational AutoEncoder (VQ-VAE) is a state-of-the-art end-to-end quantization method based on the deep learning model. This paper proposed a new technique for improving the embedding space learning of VQ-VAE with the Generative Adversarial Network for quantizing the spectral envelope parameter, called VQ-VAE-EMGAN. In experiments, we designed the quantizer for the spectral envelope parameters of the WORLD vocoder extracted from the 16kHz speech waveform. As the results shown, the proposed technique reduced the Log Spectral Distortion (LSD) around 0.5dB and increased the PESQ by around 0.17 on average for four target bit operations compared to the conventional VQ-VAE.
ER -