Skip to main navigation Skip to search Skip to main content

Transformation of Prosody in Voice Conversion

Research output: Chapters, Conference Papers, Creative and Literary WorksRGC 32 - Refereed conference paper (with host publication)peer-review

Abstract

Voice Conversion (VC) aims to convert one's voice to sound like that of another. So far, most of the voice conversion frameworks mainly focus only on the conversion of spectrum. We note that speaker identity is also characterized by the prosody features such as fundamental frequency (F0), energy contour and duration. Motivated by this, we propose a framework that can perform F0, energy contour and duration conversion. In the traditional exemplar-based sparse representation approach to voice conversion, a general source-target dictionary of exemplars is constructed to establish the correspondence between source and target speakers. In this work, we propose a Phonetically Aware Sparse Representation of fundamental frequency and energy contour by using Continuous Wavelet Transform (CWT). Our idea is motivated by the facts that CWT decompositions of F0 and energy contours describe prosody patterns in different temporal scales and allow for effective prosody manipulation in speech synthesis. Furthermore, phonetically aware exemplars lead to better estimation of activation matrix, therefore, possibly better conversion of prosody. We also propose a phonetically aware duration conversion framework which takes into account both phone-level and sentence-level speaking rates. We report that the proposed prosody conversion outperforms the traditional prosody conversion techniques in both objective and subjective evaluations.
Original languageEnglish
Title of host publicationProceedings - Ninth Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC 2017)
PublisherIEEE
Pages1537-1546
ISBN (Electronic)978-1-5386-1542-3
DOIs
Publication statusPublished - Dec 2017
Event9th Annual Summit and Conference of the Asia-Pacific-Signal-and-Information-Processing-Association (APSIPA ASC) - Kuala Lumpur, Malaysia
Duration: 12 Dec 201715 Dec 2017

Conference

Conference9th Annual Summit and Conference of the Asia-Pacific-Signal-and-Information-Processing-Association (APSIPA ASC)
PlaceMalaysia
CityKuala Lumpur
Period12/12/1715/12/17

Bibliographical note

Full text of this publication does not contain sufficient affiliation information. With consent from the author(s) concerned, the Research Unit(s) information for this record is based on the existing academic department affiliation of the author(s).

Fingerprint

Dive into the research topics of 'Transformation of Prosody in Voice Conversion'. Together they form a unique fingerprint.

Cite this