Sparse representation of phonetic features for voice conversion with and without parallel data

Berrak Sisman, Haizhou Li, Kay Chen Tan

Research output: Chapters, Conference Papers, Creative and Literary Works; RGC 32 - Refereed conference paper (with host publication); peer-review

37 Citations (Scopus)

Abstract

This paper presents a voice conversion framework that incorporates phonetic information into an exemplar-based voice conversion approach. The idea is motivated by the observation that phone-dependent exemplars lead to better estimation of the activation matrix and therefore, possibly, to better conversion. We propose to use the phone segmentation results from automatic speech recognition (ASR) to construct a sub-dictionary for each phone. The proposed framework can work with or without parallel training data. With parallel training data, we found that the phonetic sub-dictionary outperforms the state-of-the-art baseline in both objective and subjective evaluations. Without parallel training data, we use Phonetic PosteriorGrams (PPGs) as speaker-independent exemplars in the phonetic sub-dictionary, which serve as a bridge between speakers. We report that this technique achieves competitive performance without the need for parallel training data.
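
The phone-dependent activation estimate described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: it assumes the commonly used multiplicative update for a KL-divergence objective with an L1 sparsity penalty, paired source and target dictionaries whose columns are aligned exemplars, and one phone label per frame and per exemplar. The function names, the `sparsity` weight, and the iteration count are hypothetical choices made for illustration.

```python
import numpy as np

def estimate_activations(X, A, sparsity=0.1, n_iter=200, eps=1e-12):
    """Estimate nonnegative activations H so that X is approximated by A @ H.

    Assumed objective: KL(X || A H) + sparsity * sum(H), with the
    dictionary A held fixed (standard multiplicative update).
    """
    rng = np.random.default_rng(0)
    H = rng.random((A.shape[1], X.shape[1])) + eps
    # Denominator of the update: column sums of A plus the sparsity weight.
    denom = A.sum(axis=0)[:, None] + sparsity
    for _ in range(n_iter):
        H *= (A.T @ (X / (A @ H + eps))) / denom
    return H

def convert_with_subdictionaries(X, frame_phones, src_dict, tgt_dict,
                                 exemplar_phones, **kw):
    """Convert X frame by frame, using only the exemplars of each frame's phone.

    src_dict and tgt_dict are assumed to hold paired exemplars column by
    column; exemplar_phones labels each column, frame_phones each frame of X.
    Every phone that occurs in frame_phones is assumed to have exemplars.
    """
    Y = np.zeros((tgt_dict.shape[0], X.shape[1]))
    for phone in np.unique(frame_phones):
        frames = frame_phones == phone      # frames labelled with this phone
        atoms = exemplar_phones == phone    # the phone's sub-dictionary
        H = estimate_activations(X[:, frames], src_dict[:, atoms], **kw)
        # Reuse the source activations with the paired target exemplars.
        Y[:, frames] = tgt_dict[:, atoms] @ H
    return Y
```

Under these assumptions, the parallel-data case would use source-speaker spectral exemplars in `src_dict`. For the case without parallel data, one reading of the abstract is that `src_dict` holds PPG exemplars and `X` holds the PPGs of the input speech, so that the activations carry over to the target spectral exemplars in `tgt_dict`.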
Original language: English
Title of host publication: 2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) - Proceedings
Publisher: IEEE
Pages: 677-684
ISBN (Electronic): 978-1-5090-4788-8
Publication status: Published - Dec 2017
Event: 2017 IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2017 - Okinawa, Japan
Duration: 16 Dec 2017 to 20 Dec 2017

Conference

Conference: 2017 IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2017
Place: Japan
City: Okinawa
Period: 16/12/17 to 20/12/17

Bibliographical note

Full text of this publication does not contain sufficient affiliation information. With consent from the author(s) concerned, the Research Unit(s) information for this record is based on the existing academic department affiliation of the author(s).

Research Keywords

  • phonetic exemplars
  • Phonetic PosteriorGrams
  • sparse representation
  • Voice conversion
