| Graduate student: | 徐陞瑋 Syu, Sheng-Wei | 
|---|---|
| Thesis title: | 混合音頻與歌詞之歌曲自動標籤方法 A Method of Music Auto-tagging Based on Audio and Lyric | 
| Advisor: | 王惠嘉 Wang, Hei-Chia | 
| Degree: | Master's | 
| Department: | College of Management, Institute of Information Management | 
| Year of publication: | 2019 | 
| Academic year of graduation: | 107 (ROC calendar, i.e., 2018-2019) | 
| Language: | Chinese | 
| Number of pages: | 94 | 
| Keywords: | Music Auto-tagging, Deep Learning, Multi-task Learning, Multi-label Classification | 
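With advances in the Internet and technology, online music platforms and music streaming have flourished, and the sheer volume of digital music confronts users with information overload. To address this problem, these platforms need to use user information and auxiliary data to build comprehensive recommendation systems that help users retrieve, search for, or discover new music. At present, keyword search is the most common way to look up music.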
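In keyword search, social tags are considered able to help recommendation systems make better recommendations. However, social tags suffer from tag sparsity and the cold-start problem, which limits how much they can help. To address these problems, an auto-tagging system is needed to supply the missing tags and thereby support the recommendation system. Most previous auto-tagging research analyzed audio alone, yet many studies have shown that lyrics give music classification systems additional information and improve classification accuracy. This study therefore adds lyrics to the classification system, extracting features from them alongside the audio, and proposes a music auto-tagging system that combines audio and lyrics.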
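The role auto-tagging plays here, filling in tags for sparsely tagged or brand-new songs, can be sketched as a small merge step. This is a minimal illustration under assumptions not taken from the thesis: the tagger is assumed to output one probability per tag, and the 0.5 threshold, the `augment_tags` helper, and its top-k fallback for completely untagged (cold-start) songs are all hypothetical choices.

```python
from typing import Dict, List, Set


def augment_tags(existing: Set[str], predicted: Dict[str, float],
                 threshold: float = 0.5, top_k: int = 3) -> List[str]:
    """Merge a song's existing social tags with auto-tagger predictions.

    Hypothetical policy: add every tag whose predicted probability reaches
    `threshold`; if the song has no social tags at all (cold start) and no
    prediction passes the threshold, fall back to the `top_k` highest-scoring
    tags so the song is never left untagged.
    """
    added = {tag for tag, p in predicted.items() if p >= threshold}
    if not existing and not added:
        ranked = sorted(predicted, key=predicted.get, reverse=True)
        added = set(ranked[:top_k])
    return sorted(existing | added)


# Sparse case: one social tag, and the auto-tagger adds a confident prediction.
print(augment_tags({"rock"}, {"rock": 0.9, "sad": 0.7, "jazz": 0.1}))  # ['rock', 'sad']
# Cold-start case: no social tags and no confident prediction.
print(augment_tags(set(), {"rock": 0.3, "sad": 0.2, "jazz": 0.1}, top_k=2))  # ['rock', 'sad']
```

A real system would pick the threshold per tag on validation data; the fixed value here only keeps the sketch short.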
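In recent years, with the development of neural networks, many researchers have used them to extract audio and text features, and their effectiveness has been demonstrated. For lyric feature extraction in particular, a number of studies indicate that taking the structure of the lyrics into account extracts lyric features more effectively for classification tasks. This study uses neural network architectures for both music feature extraction and auto-tagging. For lyric feature extraction, it combines a convolutional neural network (CNN) with a recurrent neural network (RNN) in order to capture the structural features of the lyrics.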
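A CNN+RNN lyric encoder of the kind described above can be sketched in plain Python. This is one plausible arrangement, not the thesis's architecture: it assumes each lyric line arrives as a list of word vectors (for example from a pretrained word embedding), lets a 1D convolution with max-pooling encode each line, and lets a simple tanh (Elman) RNN read the line vectors in order, so that the ordering of lines, i.e. the lyric's structure, shapes the final feature. The weights are random placeholders rather than trained values, and `encode_line`, `rnn_over_lines`, and `lyric_features` are hypothetical names.

```python
import math
import random
from typing import List

Vector = List[float]


def encode_line(words: List[Vector], kernels: List[Vector], width: int) -> Vector:
    """CNN step: slide each kernel over windows of `width` consecutive word
    vectors, apply ReLU, then max-pool over positions -> one vector per line."""
    pooled = [0.0] * len(kernels)  # ReLU outputs are >= 0, so 0.0 is a safe start
    for start in range(len(words) - width + 1):
        window = [x for vec in words[start:start + width] for x in vec]  # flatten
        for f, kernel in enumerate(kernels):
            activation = max(0.0, sum(w * x for w, x in zip(kernel, window)))
            pooled[f] = max(pooled[f], activation)
    return pooled


def rnn_over_lines(line_vecs: List[Vector], w_in: List[Vector],
                   w_rec: List[Vector]) -> Vector:
    """RNN step: a plain tanh (Elman) RNN reads the line vectors in order;
    its last hidden state summarizes the whole lyric. Biases omitted."""
    h = [0.0] * len(w_rec)
    for x in line_vecs:
        h = [math.tanh(sum(a * b for a, b in zip(w_in[j], x)) +
                       sum(a * b for a, b in zip(w_rec[j], h)))
             for j in range(len(w_rec))]
    return h


def lyric_features(lines: List[List[Vector]], width: int = 2, n_filters: int = 4,
                   hidden: int = 5, seed: int = 0) -> Vector:
    """Hypothetical CNN+RNN lyric encoder with random (untrained) weights."""
    dim = len(lines[0][0])
    rng = random.Random(seed)

    def rand_matrix(rows: int, cols: int) -> List[Vector]:
        return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

    kernels = rand_matrix(n_filters, width * dim)
    w_in = rand_matrix(hidden, n_filters)
    w_rec = rand_matrix(hidden, hidden)
    line_vecs = [encode_line(words, kernels, width) for words in lines]
    return rnn_over_lines(line_vecs, w_in, w_rec)


# Toy lyric: three lines of 4, 5 and 3 words, each word a 3-dim vector.
rng = random.Random(1)
lyric = [[[rng.uniform(-1, 1) for _ in range(3)] for _ in range(n)] for n in (4, 5, 3)]
print(lyric_features(lyric))  # a 5-dim lyric feature vector
```

In a trained model this final hidden state would be passed to the tag classifier, and gated cells such as GRU or LSTM units are the usual replacement for the plain tanh RNN used here for brevity.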
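In addition, prior studies have shown that multi-task learning can improve classification performance by learning the correlations between tags. This study applies multi-task learning to music auto-tagging for tag classification.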
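One common way to set up multi-task learning for tagging treats each tag as its own binary task. All tags share a hidden layer, each tag gets its own logistic output head, and the training objective is the sum of the per-tag losses. The sketch below covers only the forward pass and the loss. The tag names, features, and weights are hypothetical, and it is not the thesis's classifier.

```python
import math
from typing import Dict, List, Tuple

Head = Tuple[List[float], float]  # (weights over the shared layer, bias)


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def shared_layer(x: List[float], w_shared: List[List[float]]) -> List[float]:
    """Hidden representation shared by every tag (task)."""
    return [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in w_shared]


def predict(x: List[float], w_shared: List[List[float]],
            heads: Dict[str, Head]) -> Dict[str, float]:
    """Each tag is its own binary task with a logistic head on the shared layer."""
    h = shared_layer(x, w_shared)
    return {tag: sigmoid(sum(w * hi for w, hi in zip(weights, h)) + bias)
            for tag, (weights, bias) in heads.items()}


def joint_loss(probs: Dict[str, float], labels: Dict[str, int],
               eps: float = 1e-12) -> float:
    """Multi-task objective: the sum of per-tag binary cross-entropies."""
    loss = 0.0
    for tag, p in probs.items():
        y = labels[tag]
        loss -= y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
    return loss


# Hypothetical song features, shared weights and three tag heads.
x = [0.3, -0.2, 0.5]
w_shared = [[0.1, 0.4, -0.3], [0.2, -0.1, 0.6]]
heads = {"rock": ([1.5, -0.5], 0.0), "sad": ([-1.0, 2.0], -0.2), "love": ([0.3, 0.3], 0.1)}
probs = predict(x, w_shared, heads)
print(probs, joint_loss(probs, {"rock": 1, "sad": 0, "love": 1}))
```

Every head reads the same shared representation, so when the summed loss is minimized, gradients from all tags update `w_shared`. That shared update is how correlations between tags can be exploited. A single-task setup would instead train a separate model for each tag.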
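Experiments show that the proposed method, which combines audio and lyrics for music auto-tagging and completes the classification task with a multi-task tag classifier, achieves better classification performance than the audio-only, single-task methods of previous studies.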
With the development of the Internet and technology, online music platforms and music streaming services are booming, and the large volume of digital music leaves users facing information overload. To solve this problem, these platforms need to build comprehensive recommendation systems that use user information and metadata to help users search for, query, or discover new music.
Social tags are considered helpful for making better music recommendations. However, social tags suffer from tag sparsity and the cold-start problem, which limits their usefulness to recommendation systems. To solve these problems, missing tags need to be supplemented by a music auto-tagging system. Most past research on auto-tagging analyzed audio alone, yet many studies have shown that lyrics can give a music classification system additional information and improve its classification accuracy.
This study proposes a music auto-tagging method that analyzes both audio and lyrics. We also experimented with different tag-classification architectures; the results show that the architecture combining a late-fusion model with a multi-task classification method performs best.
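The difference between early and late fusion of the two modalities can be sketched as follows. This sketch assumes that the audio and lyric branches each output a per-tag probability and that late fusion combines them by a weighted average with a hypothetical weight `alpha`. That is one common reading of late fusion. The thesis's model may instead fuse higher-level branch outputs in another way.

```python
from typing import Dict, List


def early_fusion(audio_feat: List[float], lyric_feat: List[float]) -> List[float]:
    """Early fusion: concatenate the two modalities' features so that a single
    classifier sees both at once."""
    return audio_feat + lyric_feat


def late_fusion(audio_probs: Dict[str, float], lyric_probs: Dict[str, float],
                alpha: float = 0.5) -> Dict[str, float]:
    """Late fusion: each modality has its own tagger, and their per-tag
    probabilities are combined at the decision level (here a weighted average;
    `alpha` is the weight given to the audio branch)."""
    return {tag: alpha * audio_probs[tag] + (1 - alpha) * lyric_probs[tag]
            for tag in audio_probs}


def tags_above(probs: Dict[str, float], threshold: float = 0.5) -> List[str]:
    """Final tag decision: keep tags whose fused probability reaches the threshold."""
    return sorted(tag for tag, p in probs.items() if p >= threshold)


audio = {"rock": 0.8, "sad": 0.2}  # hypothetical audio-branch output
lyrics = {"rock": 0.4, "sad": 0.6}  # hypothetical lyric-branch output
print(tags_above(late_fusion(audio, lyrics)))  # ['rock']
```

With `alpha=0.5` both branches count equally, while a higher `alpha` trusts the audio branch more. Such a weight would normally be tuned on validation data rather than fixed.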