A Vocoder-free WaveNet Voice Conversion with Non-Parallel Data

Categories: Dynamic Time Warping | Gaussian Mixture Model | WaveNet

Tags: 2019 | Eng Siong Chng | Haizhou Li | Xiaohai Tian

In a typical voice conversion system, a vocoder is commonly used for speech-to-features analysis and features-to-speech synthesis. However, the vocoder can be a source of speech quality degradation. This paper presents a vocoder-free voice conversion approach using WaveNet for non-parallel training data. Instead of dealing with the intermediate features, the proposed approach uses WaveNet to map the Phonetic PosteriorGrams (PPGs) to the waveform samples directly. In this way, we avoid the estimation errors caused by the vocoder and feature conversion. Additionally, as PPG is assumed to be speake...

High quality voice conversion using prosodic and high-resolution spectral features

Voice Conversion

Categories: Autoencoder | Deep Learning | Dynamic Time Warping | Fourier Transform

Voice conversion methods have advanced rapidly over the last decade. Studies have shown that speaker characteristics are captured by spectral features as well as various prosodic features. Most existing conversion methods focus on the spectral features, as they directly represent the timbre characteristics, while some conversion methods have focused only on the prosodic feature represented by the fundamental frequency. In this paper, a comprehensive framework using deep neural networks to convert both timbre and prosodic features is proposed. The timbre feature is represented by a high-resolution ...

Tag: Eng Siong Chng