
Interpolating V/UV mixture functions of a harmonic model for concatenative speech synthesis

Research output: Journal Publications and Reviews; RGC 22 - Publication in policy or professional journal

Abstract

In this paper, a high-quality speech synthesis method based on interpolating the voiced/unvoiced (V/UV) mixture functions of the multiband excitation (MBE) model is proposed. In the MBE model, each harmonic band of the fundamental frequency in an excitation spectrum is rigidly declared as either voiced or unvoiced, and the harmonic bands are pitch-dependent. In the proposed method, each harmonic band in a short-time spectrum is synthesized by mixing both voiced and unvoiced energies. The ratio of the V/UV energies in a spectrum is determined by the V/UV mixture function, which is subsequently parametrized by an all-zero model. Since the V/UV decision in the proposed method is not rigidly declared and the V/UV mixture function is pitch-independent, interpolating the V/UV excitation spectrum becomes possible. A smooth transition of excitation between acoustic units can be achieved by interpolating the V/UV mixture functions of adjacent frames. Simulation results show that incorporating the V/UV mixture function into concatenative synthesis yields a significant improvement in synthetic speech quality.
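
The idea of mixing voiced and unvoiced energy per band and interpolating the mixture function across frames can be illustrated with a minimal Python sketch. It is not the paper's implementation. It assumes the mixture function is stored as a voiced fraction in [0, 1] sampled on a fixed frequency grid shared by all frames (the abstract's all-zero parametrization is not modelled here), and it assumes a simple linear crossfade between frames. All function names, the grid, and the numeric values are hypothetical.

```python
from bisect import bisect_right


def interpolate_mixture(v_a, v_b, alpha):
    """Crossfade two mixture functions sampled on the same frequency grid.

    alpha = 0 returns v_a, alpha = 1 returns v_b (linear blend, an assumption).
    """
    if len(v_a) != len(v_b):
        raise ValueError("mixture functions must share one frequency grid")
    return [(1.0 - alpha) * a + alpha * b for a, b in zip(v_a, v_b)]


def sample_mixture(grid_hz, v, freq_hz):
    """Piecewise-linear lookup of the voiced fraction at freq_hz.

    Values outside the grid are clamped to the nearest end point.
    """
    if freq_hz <= grid_hz[0]:
        return v[0]
    if freq_hz >= grid_hz[-1]:
        return v[-1]
    i = bisect_right(grid_hz, freq_hz) - 1
    t = (freq_hz - grid_hz[i]) / (grid_hz[i + 1] - grid_hz[i])
    return (1.0 - t) * v[i] + t * v[i + 1]


def split_band_energies(grid_hz, v, f0_hz, band_energies):
    """Split each harmonic band's energy into (voiced, unvoiced) parts.

    Band k (1-based) is centred at k * f0_hz. The mixture function itself
    does not depend on f0_hz; only the lookup positions do.
    """
    parts = []
    for k, energy in enumerate(band_energies, start=1):
        voiced_fraction = sample_mixture(grid_hz, v, k * f0_hz)
        parts.append((voiced_fraction * energy,
                      (1.0 - voiced_fraction) * energy))
    return parts


# Hypothetical mixture functions for the boundary frames of two units.
grid_hz = [0.0, 1000.0, 2000.0, 3000.0, 4000.0]
v_left = [1.0, 1.0, 0.5, 0.0, 0.0]
v_right = [1.0, 0.5, 0.0, 0.0, 0.0]

# Frame halfway through the transition, synthesized at its own pitch.
v_mid = interpolate_mixture(v_left, v_right, 0.5)
parts = split_band_energies(grid_hz, v_mid, 500.0, [2.0, 4.0])
```

Because both frames store the mixture function on the same grid, the blend is well defined even when the two frames have different fundamental frequencies, which is the property a hard per-harmonic V/UV decision lacks.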
