
Time series models for semantic music annotation

Emanuele Coviello, Antoni B. Chan, Gert Lanckriet

Research output: Journal Publications and Reviews › RGC 21 - Publication in refereed journal › peer-review

Abstract

Many state-of-the-art systems for automatic music tagging model music with bag-of-features representations, which give little or no account of temporal dynamics, a key characteristic of the audio signal. We describe a novel approach to automatic music annotation and retrieval that captures temporal (e.g., rhythmical) aspects as well as timbral content. The proposed approach leverages a recently proposed song model based on a generative time series model of the musical content: the dynamic texture mixture (DTM) model, which treats fragments of audio as the output of a linear dynamical system. To model characteristic temporal dynamics and timbral content at the tag level, a novel, efficient, hierarchical expectation-maximization (EM) algorithm for DTM (HEM-DTM) is used to summarize the common information shared by the DTMs modeling individual songs associated with a tag. Experiments show that learning the semantics of music benefits from modeling temporal dynamics. © 2010 IEEE.
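For context, a dynamic texture is conventionally formulated as a linear dynamical system with Gaussian noise; the following is a sketch of the standard state-space equations (notation is assumed here, not quoted from the paper):

\[
x_{t+1} = A x_t + v_t, \qquad v_t \sim \mathcal{N}(0, Q),
\]
\[
y_t = C x_t + w_t, \qquad w_t \sim \mathcal{N}(0, R),
\]

where \(x_t\) is a hidden state capturing the temporal dynamics, \(y_t\) is the observed audio feature vector at time \(t\), \(A\) is the state transition matrix, and \(C\) maps hidden states to observations. A DTM is a mixture of such components; as described in the abstract, HEM-DTM summarizes the song-level DTMs associated with a tag into a single tag-level DTM.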
Original language: English
Article number: 5613150
Pages (from-to): 1343-1359
Journal: IEEE Transactions on Audio, Speech and Language Processing
Volume: 19
Issue number: 5
DOIs
Publication status: Published - 2011

Research Keywords

  • Audio annotation and retrieval
  • dynamic texture model
  • music information retrieval
