Czech Speech Synthesis with Generative Neural Vocoder
Vít, Jakub
Hanzlíček, Zdeněk
Matoušek, Jindřich
Speech synthesis, LSTM-based speech synthesis, WaveRNN, Neural vocoder, Unit selection
Abstract: In recent years, new neural architectures for generating high-quality synthetic speech on a per-sample basis have been introduced. We describe our application of statistical parametric speech synthesis based on LSTM neural networks combined with a generative neural vocoder for the Czech language. We used a traditional LSTM architecture to generate the vocoder parametrization from linguistic features, and we replaced the standard vocoder with a WaveRNN neural network. We conducted a MUSHRA listening test to compare the proposed approach with unit selection and with LSTM-based parametric speech synthesis using a standard vocoder. In contrast to our previous work, the proposed approach outperformed a well-tuned unit selection TTS system by a large margin on both professional and amateur voices.
