Polyphonic song analysis and representation for query-by-singing systems

針對歌聲查詢系統的複音樂曲分析與表示

Student thesis: Master's Thesis

Author(s)

  • Tat Wan LEUNG

Related Research Unit(s)

Detail(s)

Award date: 3 Oct 2005

Abstract

Advances in storage and compression technologies have led to explosive growth in the amount of musical data available on the Internet. To support query-by-humming and query-by-singing interfaces for music search, effective techniques and tools for musical signal analysis and representation are in high demand. Currently, most approaches to music retrieval are based on monophonic signals (a single voice) in textual symbolic format (e.g., MIDI). In this thesis, we investigate techniques for analyzing polyphonic signals (multiple voices) in acoustic format, using popular songs as examples.

Compared with monophonic songs, polyphonic songs are much harder to analyze because the main theme must be extracted before songs can be matched. To date, no single solution to this difficult problem has emerged. We restrict our analysis to polyphonic popular songs because the singing voice, in general, carries the main theme of a song. The theme can be modeled statistically through training on data or through heuristic assumptions. One common assumption is that the singing voice is the only dominant signal in a song. Modeling and extracting singing voices makes it possible to match polyphonic songs against monophonic queries.

The issues investigated in this thesis include time-domain and frequency-domain singing voice extraction, and mid-level singing voice representation for song matching, indexing and retrieval. First, a polyphonic song is partitioned and classified in the time domain into segments of singing voice (SV) and instrument sound (IS) by a support vector machine. The "pure SV" is then extracted from the classified SV segments by independent component analysis in the frequency domain. Two mid-level melody representations are proposed to characterize the extracted SV. To match SV against a query, proportional transportation distance is employed for song matching, while a vantage point tree is used as the data structure to support song indexing.
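
To illustrate the indexing idea, here is a minimal sketch of a vantage point tree with nearest-neighbour search. It is not the thesis implementation. The names `VPNode`, `build_vp_tree` and `nearest` are hypothetical, and plain Euclidean distance on 2-D tuples stands in for proportional transportation distance. The pruning rule relies only on the triangle inequality, so any distance function that satisfies it could be passed in as `dist`.

```python
import math
import random


class VPNode:
    """One node of a vantage point tree (hypothetical helper class)."""

    def __init__(self, point, radius, inside, outside):
        self.point = point      # the vantage point stored at this node
        self.radius = radius    # median distance from the vantage point
        self.inside = inside    # subtree of items with distance <= radius
        self.outside = outside  # subtree of items with distance > radius


def build_vp_tree(points, dist, rng=None):
    """Recursively split the items around a randomly chosen vantage point."""
    rng = rng or random.Random(0)
    points = list(points)
    if not points:
        return None
    vp = points.pop(rng.randrange(len(points)))
    if not points:
        return VPNode(vp, 0.0, None, None)
    dists = [dist(vp, p) for p in points]
    radius = sorted(dists)[len(dists) // 2]
    inside = [p for p, d in zip(points, dists) if d <= radius]
    outside = [p for p, d in zip(points, dists) if d > radius]
    return VPNode(vp, radius,
                  build_vp_tree(inside, dist, rng),
                  build_vp_tree(outside, dist, rng))


def nearest(node, query, dist, best=None):
    """Return (distance, item) for the indexed item closest to the query."""
    if node is None:
        return best
    d = dist(query, node.point)
    if best is None or d < best[0]:
        best = (d, node.point)
    if d <= node.radius:
        # Search the likelier side first, then the other side only if the
        # triangle inequality cannot rule out a closer item there.
        best = nearest(node.inside, query, dist, best)
        if d + best[0] > node.radius:
            best = nearest(node.outside, query, dist, best)
    else:
        best = nearest(node.outside, query, dist, best)
        if d - best[0] <= node.radius:
            best = nearest(node.inside, query, dist, best)
    return best


# Toy "melody descriptors" as 2-D points; Euclidean distance is a stand-in
# for proportional transportation distance (an assumption for illustration).
songs = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
tree = build_vp_tree(songs, math.dist)
closest_distance, closest_song = nearest(tree, (0.9, 1.2), math.dist)
```

In this toy example `closest_song` is `(1.0, 1.0)`. The benefit of the tree over a linear scan is that whole subtrees can be skipped when the distance bound shows they cannot contain a closer match, which matters when each distance evaluation is expensive.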

    Research areas

  • Music, Information storage and retrieval systems, Multimedia systems, Computer sound processing