TY - GEN
T1 - Genre classification and the invariance of MFCC features to key and tempo
AU - Li, Tom L.H.
AU - Chan, Antoni B.
PY - 2011
Y1 - 2011
N2 - Musical genre classification is a promising yet difficult task in the field of musical information retrieval. As a widely used feature in genre classification systems, MFCC is typically believed to encode timbral information, since it represents short-duration musical textures. In this paper, we investigate the invariance of MFCC to musical key and tempo, and show that MFCCs in fact encode both timbral and key information. We also show that musical genres, which should be independent of key, are in fact influenced by the fundamental keys of the instruments involved. As a result, genre classifiers based on the MFCC features will be influenced by the dominant keys of the genre, resulting in poor performance on songs in less common keys. We propose an approach to address this problem, which consists of augmenting classifier training and prediction with various key and tempo transformations of the songs. The resulting genre classifier is invariant to key, and thus more timbre-oriented, resulting in improved classification accuracy in our experiments. © 2011 Springer-Verlag Berlin Heidelberg.
UR - http://www.scopus.com/inward/record.url?scp=78751660647&partnerID=8YFLogxK
UR - https://www.scopus.com/record/pubmetrics.uri?eid=2-s2.0-78751660647&origin=recordpage
U2 - 10.1007/978-3-642-17832-0_30
DO - 10.1007/978-3-642-17832-0_30
M3 - RGC 32 - Refereed conference paper (with host publication)
SN - 3642178316
SN - 9783642178313
VL - 6523 LNCS
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 317
EP - 327
BT - Advances in Multimedia Modeling
PB - Springer-Verlag
T2 - 17th Multimedia Modeling Conference, MMM 2011
Y2 - 5 January 2011 through 7 January 2011
ER -