| Graduate student: | 呂政哲 Lu, Cheng-Che |
|---|---|
| Thesis title: | 音樂分類、推薦與擷取技術之研究 (A Study on Music Classification, Recommendation, and Retrieval Techniques) |
| Advisor: | 曾新穆 Tseng, Vincent S. |
| Degree: | Doctoral |
| Department: | College of Electrical Engineering and Computer Science - Department of Computer Science and Information Engineering |
| Year of publication: | 2009 |
| Academic year of graduation: | 98 |
| Language: | English |
| Number of pages: | 77 |
| Chinese keywords: | 音樂分類, 音樂推薦, 音樂擷取, 音樂表示法, 音樂特徵, 多媒體資料探勘 (music classification, music recommendation, music retrieval, music representation, music features, multimedia data mining) |
| English keywords: | music classification, music recommendation, music retrieval, music representation, emotion-based features, content-based features, multimedia data mining |
| Access counts: | Views: 147, Downloads: 8 |
With the development of computer and network technologies, digital music has been growing rapidly in volume and has become part of everyone's daily life. The music market is now moving toward the integration of symbolic music representations with multimedia objects, in applications such as music education, music libraries, and karaoke. To meet the needs of future music-related applications, this thesis proposes a series of novel methods for music classification, music recommendation, and music retrieval. First, for music classification, to suit applications built on symbolic music representations, the features extracted from the score can later be used for music annotation; moreover, the key to improving classification accuracy is how to extract useful music features and feed them into a suitable classifier. We propose a new framework that parses MusicXML files and extracts qualitative and quantitative music features; we also propose several features that improve classification accuracy and use them to build a music classifier. Experimental results show that, in terms of classification accuracy, the proposed method outperforms existing methods.
On the other hand, although shopping websites have developed recommender systems to encourage consumers to purchase music, two problems remain. First, a piece of music that has never been rated by anyone will never be recommended. Second, users are not necessarily interested in highly rated music. In this study, we propose a new method, called personalized hybrid music recommendation, which computes weights for the content-based, collaboration-based, and emotion-based recommendation approaches according to the user's preferences and then combines the three. In this way, our method achieves three goals: first, the system can recommend music that has not yet been rated by anyone; second, it avoids repeatedly recommending music the user dislikes; third, besides the music the user habitually listens to, it can also recommend music the user finds more interesting. Empirical evaluation shows that our method achieves a recommendation accuracy of 90%.
Finally, most existing research on music retrieval focuses on monophonic music, and such methods are impractical for polyphonic popular music; moreover, current retrieval methods cannot serve non-expert users who want to obtain music with a specified emotion. In addition, users may require the retrieved music to contain both a specified melody and a specified musical emotion. We therefore propose a new method, called Integrated Music Information Retrieval, that uses both content-based and emotion-based music features. To let the system retrieve music efficiently, we also propose first converting music into music representations, which are then indexed and stored. Experiments show that our method retrieves the music users need efficiently and accurately.
With the progress of computer and network technologies, the amount of digital music has been growing rapidly, and digital music has become part of everyone's life. Nowadays the music market is moving towards the integration of symbolic music representations and multimedia objects, with applications in music education, music libraries, and karaoke. To cater to future music-related applications, this thesis proposes a series of novel methods for music classification, music recommendation, and music retrieval. First, in terms of music classification, to serve applications built on symbolic music representations, the features obtained from the score are high-level features that may later be used to annotate music. Furthermore, the critical issue in improving classification accuracy is how to obtain useful features as input for classifiers. In this study, we present a new conceptual framework that automatically parses MusicXML files and extracts their qualitative and quantitative features. We also propose appropriate features to improve classification accuracy and build an effective classifier for automatic music classification. To assess the proposed approach, music features extracted from scores were used to evaluate classification accuracy. The experimental results show that the proposed approach outperforms existing methods in terms of classification accuracy.
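To make the feature-extraction step concrete, the following is a minimal sketch of parsing a MusicXML (score-partwise) file and computing a few quantitative features. The element names (note, pitch, step, octave, alter, duration, measure) follow standard MusicXML, but the specific features shown (pitch range, mean pitch, mean duration, note density) are illustrative assumptions and need not match the exact feature set used in the thesis.

```python
# Illustrative sketch: parse a MusicXML (score-partwise) file and derive a few
# quantitative features. The feature set here is hypothetical and only stands
# in for the qualitative/quantitative features described in the thesis.
import xml.etree.ElementTree as ET

STEP_TO_SEMITONE = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}

def midi_number(pitch):
    """Convert a MusicXML <pitch> element to a MIDI note number."""
    step = pitch.findtext("step")
    octave = int(pitch.findtext("octave"))
    alter = int(pitch.findtext("alter") or 0)
    return 12 * (octave + 1) + STEP_TO_SEMITONE[step] + alter

def extract_features(path):
    root = ET.parse(path).getroot()
    pitches, durations = [], []
    for note in root.iter("note"):
        if note.find("rest") is not None:   # skip rests
            continue
        pitch = note.find("pitch")
        if pitch is not None:
            pitches.append(midi_number(pitch))
        dur = note.findtext("duration")
        if dur is not None:
            durations.append(int(dur))
    n_measures = sum(1 for _ in root.iter("measure")) or 1
    return {
        "pitch_range": max(pitches) - min(pitches) if pitches else 0,
        "mean_pitch": sum(pitches) / len(pitches) if pitches else 0.0,
        "mean_duration": sum(durations) / len(durations) if durations else 0.0,
        "notes_per_measure": len(pitches) / n_measures,
    }
```

Feature vectors of this kind could then be fed to an off-the-shelf classifier, such as an SVM or a decision tree, to assign a genre label to each score.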
On the other hand, in terms of music recommendation, existing recommendation methods suffer from two problems: 1) they cannot recommend music that has not been rated by anyone, and 2) users may not be interested in highly rated music. In this study, we propose a novel method called personalized hybrid music recommendation, which combines the content-based, collaboration-based, and emotion-based methods by computing the weights of these methods according to users' interests. The proposed method achieves three goals: it can recommend music that has not been rated by anyone, it avoids repeatedly recommending disfavored music, and it recommends more interesting music beyond what users are accustomed to listening to. Experimental evaluation showed that the recommendation accuracy achieved by our method reaches 90%.
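As a rough illustration of how such a hybrid combination can work, the sketch below blends three per-item scores with per-user weights. The normalization of user preference values into weights is an assumption made for illustration, and `content_rec`, `collab_rec`, and `emotion_rec` are hypothetical scoring callbacks rather than the thesis's exact components.

```python
# Illustrative sketch of a weighted hybrid recommender: each base method
# returns a score in [0, 1] per music piece, and per-user weights decide how
# much each method contributes to the final ranking.
def hybrid_ranking(user, candidates, content_rec, collab_rec, emotion_rec, prefs):
    """prefs: dict with keys 'content', 'collab', 'emotion' giving the user's
    interest in each recommendation strategy (non-negative numbers)."""
    total = sum(prefs.values()) or 1.0
    w_content = prefs.get("content", 0) / total
    w_collab = prefs.get("collab", 0) / total
    w_emotion = prefs.get("emotion", 0) / total

    scores = {}
    for music_id in candidates:
        scores[music_id] = (
            w_content * content_rec(user, music_id)    # similarity to music the user liked
            + w_collab * collab_rec(user, music_id)     # ratings of users with similar taste
            + w_emotion * emotion_rec(user, music_id)   # match with the user's preferred emotions
        )
    return sorted(scores, key=scores.get, reverse=True)
```

Because the content-based and emotion-based terms do not depend on existing ratings, a piece that nobody has rated can still receive a non-zero score, which addresses the cold-start goal stated above.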
Finally, regarding music retrieval, although a number of studies have addressed this topic, most of them have focused only on monophonic MIDI music, and retrieval methods designed for monophonic music are impractical for polyphonic popular music. Furthermore, current music retrieval techniques are insufficient for users who want to obtain music pieces that match the emotions they prefer. In addition, users may specify both a musical segment and a musical emotion when searching for the music they need. In this study, we propose a novel method called Integrated Music Information Retrieval (IMIR) that utilizes both content-based and emotion-based features to match users' needs. To retrieve music from large collections more efficiently, we also propose transforming all music pieces into the proposed music representations and recording them in indexes. The experimental results show that the proposed method substantially outperforms existing methods in terms of efficiency for content-based music retrieval, and we also show that it is effective for emotion-based music retrieval.
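The following sketch shows one common way such an index-based melody lookup can be organized: each piece is reduced to a sequence of pitch intervals, and fixed-length n-grams of that sequence are stored in an inverted index, so a query segment can be matched without scanning every piece. The interval representation and the choice n = 3 are assumptions for illustration, not necessarily the representation or index structure proposed in the thesis.

```python
# Illustrative sketch of index-based melody retrieval using interval n-grams.
from collections import defaultdict

def intervals(pitches):
    """Represent a melody by successive pitch intervals (transposition-invariant)."""
    return tuple(b - a for a, b in zip(pitches, pitches[1:]))

def build_index(corpus, n=3):
    """corpus: {music_id: [MIDI pitch numbers]} -> inverted index of interval n-grams."""
    index = defaultdict(set)
    for music_id, pitches in corpus.items():
        seq = intervals(pitches)
        for i in range(len(seq) - n + 1):
            index[seq[i:i + n]].add(music_id)
    return index

def query(index, query_pitches, n=3):
    """Return candidate pieces sharing at least one interval n-gram with the query."""
    seq = intervals(query_pitches)
    hits = set()
    for i in range(len(seq) - n + 1):
        hits |= index.get(seq[i:i + n], set())
    return hits
```

Emotion-based constraints could then be applied as a filter on the candidate set, for example by keeping only pieces whose pre-computed emotion label matches the emotion the user specified.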