| Student: | 王柏鈞 Wang, Po-Chun |
|---|---|
| Title: | 使用音樂樹狀轉換器於節奏結構分析 (Rhythmic Structure Analysis using Music Tree Transformer) |
| Advisor: | 蘇文鈺 Su, Wen-Yu |
| Degree: | Master |
| Department: | College of Electrical Engineering and Computer Science, Department of Computer Science and Information Engineering |
| Year of Publication: | 2021 |
| Graduating Academic Year: | 109 (ROC calendar, i.e., 2020–2021) |
| Language: | English |
| Pages: | 45 |
| Chinese Keywords: | 音樂結構分析, 階層式音樂, 機器學習, 轉換器, 音樂樹狀轉換器, 非監督式學習 |
| English Keywords: | Music Structure Analysis, Hierarchical Music Structure, Machine Learning, Transformer, Music Tree Transformer, Unsupervised Learning |
Automatic music structure analysis by computer is useful for applications such as genre classification, style transfer, and music generation. For example, David Cope extended Heinrich Schenker's analytical model into the SPEAC theory, which analyzes musical structure in terms of five elements, namely Statement, Preparation, Extension, Antecedent, and Consequent, to capture the harmonic progression of tension and relaxation. The Generative Theory of Tonal Music (GTTM), in turn, builds the hierarchical structure of a piece from four kinds of musical descriptions covering duration, dynamics, pitch, and other aspects. These theories help develop grammar-like rules, analogous to those in language, for analyzing musical phrases.

In recent years, the music theories above have been combined with machine learning so that computers can analyze pieces automatically and apply the results. In this thesis, we stack Transformer models, widely used in natural language processing, to build a Music Tree Transformer (MTT) for analyzing musical scores. Using rhythm as the primary input, we train the model with unsupervised learning and ultimately produce a GTTM-like musical tree structure.

To verify the model's performance, we compare the structures produced by the MTT with those of GTTM and conduct a subjective listening test based on the MTT's output. We expect that incorporating pitch, chord, and other information in the future will yield better results.
Automatic music structure analysis by computer is important to applications such as genre classification, style transfer, music generation, and so on. For example, David Cope proposed a functional analytic system called SPEAC, which extends Heinrich Schenker's theory. SPEAC analyzes music in five different elements: Statement, Preparation, Extension, Antecedent, and Consequent, all of which relate to functional harmony. The Generative Theory of Tonal Music (GTTM) constructs the hierarchical structure of music from four kinds of rules involving duration, dynamics, pitch, and so on. These theories are very useful for developing grammar-like rules and for analyzing musical phrasing.
Recently, machine learning methods have been combined with the works above. In this work, a Music Tree Transformer (MTT) based on the Transformer widely used in NLP is proposed, with encouraging results. The MTT is built by stacking several encoders and is trained with unsupervised learning. Rhythm is the main input of this work, and in the inference stage the MTT builds a musical tree structure analogous to GTTM's time-span tree.
To verify the performance, the parsing tree of the MTT is compared with GTTM's hierarchical structure. Furthermore, a subjective listening test is conducted on renditions of the MTT parsing tree. The experimental results are encouraging. It is noted that in the future the MTT could also take information such as note pitches, chords, or even the GTTM rules as training data.
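To make the pipeline described above concrete, below is a minimal, hypothetical sketch in PyTorch: rhythm tokens are encoded by stacked Transformer encoders, and adjacent events are then greedily merged into a binary tree. The greedy cosine-similarity merge is only an illustrative stand-in for the constituent-attention mechanism of the Tree Transformer on which the thesis builds; all class names, the token vocabulary, and the hyperparameters are invented for illustration.

```python
# Hypothetical sketch only: stacked Transformer encoders over rhythm
# tokens, followed by greedy agglomeration of adjacent notes into a
# binary tree, loosely mirroring how a time-span tree groups events.
import torch
import torch.nn as nn

class TinyRhythmEncoder(nn.Module):
    """Toy stand-in for the MTT's stacked Transformer encoders."""
    def __init__(self, vocab_size=32, d_model=64, n_layers=2, n_heads=4):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)

    def forward(self, tokens):
        # tokens: (batch, seq_len) integer ids for quantized note durations
        return self.encoder(self.embed(tokens))  # (batch, seq_len, d_model)

def greedy_binary_tree(vecs):
    """Repeatedly merge the most similar adjacent pair into a subtree."""
    nodes = list(range(len(vecs)))                   # leaves = note indices
    vecs = [v for v in vecs]                         # per-note embeddings
    while len(nodes) > 1:
        sims = [float(torch.cosine_similarity(vecs[i], vecs[i + 1], dim=0))
                for i in range(len(vecs) - 1)]
        i = max(range(len(sims)), key=lambda k: sims[k])
        nodes[i:i + 2] = [(nodes[i], nodes[i + 1])]   # new parent node
        vecs[i:i + 2] = [(vecs[i] + vecs[i + 1]) / 2] # parent embedding
    return nodes[0]

model = TinyRhythmEncoder()
tokens = torch.randint(0, 32, (1, 8))    # one toy bar of 8 rhythm tokens
with torch.no_grad():
    emb = model(tokens)[0]               # (8, d_model)
print(greedy_binary_tree(emb))           # nested tuples, e.g. ((0, 1), (...))
```

In the actual thesis, the grouping decisions are learned jointly with the encoder in an unsupervised fashion rather than computed by a fixed post-hoc merge as in this sketch.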