| Graduate Student: | 蔡宜澄 Tsai, Yi-Cheng |
|---|---|
| Thesis Title: | 多標籤方法於音樂情緒辨識上之實驗性分析 (Empirical Analysis of Multi-labeling Methods for Music Emotion Recognition) |
| Advisor: | 曾新穆 Tseng, Vincent S. |
| Degree: | Master |
| Department: | College of Electrical Engineering and Computer Science, Department of Computer Science and Information Engineering |
| Year of Publication: | 2013 |
| Graduating Academic Year: | 101 (ROC calendar) |
| Language: | English |
| Number of Pages: | 63 |
| Keywords (Chinese): | 音樂情緒 (music emotion), 多標籤 (multi-label), 標記方法 (annotation methods), 實驗分析 (empirical analysis) |
| Keywords (English): | Music emotion, Multi-labeling, Annotation, Tagging, Empirical analysis |
Music is deeply woven into our everyday lives because of its accessibility and emotional expressiveness. Since people spend a great deal of time listening to music, emotion is an important factor in choosing which pieces to play. How to extract the emotions expressed by music has therefore been a hot topic in music information retrieval over the past few decades, and a considerable number of multi-labeling studies have been conducted on tagging music with emotions. In this work, we conduct a comparative analysis of state-of-the-art multi-labeling methods for music emotion annotation through extensive experimental evaluations performed on the public CAL500 dataset. To make the evaluations solid, the main contributions of this work are: 1) we carry out comprehensive experimental comparisons, including evaluations under different prediction strategies, different emotion complexities, and single-instance versus multi-instance settings, along with analyses of execution time, feature sensitivity, and robustness; 2) our study compares more multi-labeling methods than previous studies did; 3) we summarize the observations drawn from these evaluations and, from a technical point of view, provide insightful guidance for researchers who study music emotion recognition or design novel emotion-aware music algorithms.
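To make the kind of comparison described above concrete, the sketch below shows one representative multi-labeling strategy, binary relevance (one independent binary classifier per emotion label), evaluated with common multi-label measures. It is a minimal illustration only: the library (scikit-learn), the base classifier, and the synthetic stand-in for CAL500 audio features and emotion tags are all assumptions for demonstration, not the thesis's actual experimental pipeline.

```python
# Minimal multi-label evaluation sketch (illustrative; not the thesis's pipeline).
from sklearn.datasets import make_multilabel_classification
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OneVsRestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import hamming_loss, f1_score

# Synthetic stand-in for CAL500-style data: 500 clips, 68-dim features,
# 18 emotion labels, about 3 labels active per clip (all hypothetical sizes).
X, Y = make_multilabel_classification(n_samples=500, n_features=68,
                                      n_classes=18, n_labels=3,
                                      random_state=0)
X_tr, X_te, Y_tr, Y_te = train_test_split(X, Y, test_size=0.3, random_state=0)

# Binary relevance: train one independent binary classifier per emotion label.
clf = OneVsRestClassifier(LogisticRegression(max_iter=1000))
clf.fit(X_tr, Y_tr)
Y_pred = clf.predict(X_te)

# Example-based and label-based measures commonly reported in multi-label studies.
print("Hamming loss:", hamming_loss(Y_te, Y_pred))
print("Micro-F1    :", f1_score(Y_te, Y_pred, average="micro", zero_division=0))
print("Macro-F1    :", f1_score(Y_te, Y_pred, average="macro", zero_division=0))
```

Other multi-labeling strategies of the kind compared in such studies (for example, label powerset or nearest-neighbour-based approaches) can be swapped in at the classifier step while the evaluation measures stay the same.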
On-campus access: full text available from 2018-08-29.