| Author: | 蕭佑丞 Siao, You-Cheng |
|---|---|
| Thesis Title: | 使用演算法生成之樂譜偵測GTTM局部邊界規則 Detection of GTTM Local Boundary Rules by Using Algorithmically Generated Music Scores |
| Advisor: | 蘇文鈺 Su, Wen-Yu |
| Degree: | Master |
| Department: | Department of Computer Science and Information Engineering, College of Electrical Engineering and Computer Science |
| Year of Publication: | 2020 |
| Academic Year: | 108 |
| Language: | English |
| Pages: | 41 |
| Keywords (Chinese): | 自動化樂譜分析、人工數據生成、音樂邊界偵測、調性音樂生成理論、機器學習、雙向長短期記憶網路、序列對序列自編碼器 |
| Keywords (English): | Automated Symbolic Music Analysis, Synthetic Data Generation, Music Boundary Detection, A Generative Theory of Tonal Music, Machine Learning, Bidirectional Long Short-Term Memory Networks, Sequence-to-Sequence Autoencoder |
Local boundary rules are specific grouping preference rules (GPRs) in the generative theory of tonal music (GTTM) that can be used to find the local boundaries of a piece of music. Detecting the note intervals to which these local boundary rules apply is usually the first step in constructing the complete grouping structure in GTTM. This structure is not only considered the most basic component of musical understanding in GTTM, but also provides substantial information for applications such as musical phrasing and expressive music rendering.
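As a concrete illustration of such a rule, GPR2b (Attack-Point) marks the interval between two notes as a boundary candidate when its inter-onset interval (IOI) is strictly greater than both neighboring IOIs. The sketch below is a minimal rule-based detector for a monophonic melody given as a list of onset times; the function name and input format are illustrative assumptions, not the thesis's actual implementation.

```python
# Minimal sketch of GPR2b (Attack-Point): the transition between note i and
# note i+1 is a boundary candidate when its inter-onset interval (IOI) is
# strictly greater than both the preceding and following IOIs.

def gpr2b_boundaries(onsets):
    """Return indices i such that the interval between note i and note i+1
    (0-based) is a GPR2b boundary candidate."""
    iois = [b - a for a, b in zip(onsets, onsets[1:])]
    boundaries = []
    for i in range(1, len(iois) - 1):
        if iois[i] > iois[i - 1] and iois[i] > iois[i + 1]:
            boundaries.append(i)
    return boundaries

# Example: quarter-note onsets with one half-note gap in the middle.
onsets = [0.0, 1.0, 2.0, 4.0, 5.0, 6.0]
print(gpr2b_boundaries(onsets))  # [2] — the long IOI after the third note
```

The learned BLSTM detectors described in the abstract replace hand-written comparisons like this with models trained on labeled examples, which also lets them handle the more ambiguous rules.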
However, the lack of training data makes it difficult to detect these rules accurately with the power of machine learning. Therefore, in this thesis, we describe a procedure that algorithmically generates a large number of labeled music scores as training data, reducing the effort of data collection and manual labeling. Using these data, we apply supervised learning to train bidirectional long short-term memory (BLSTM) networks that detect GPR2 and GPR3. In addition, we apply unsupervised learning to train autoencoders that generate transposition-invariant distances between musical sequences, which can be used to detect GPR6.
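The thesis learns transposition invariance with sequence-to-sequence autoencoders; as a naive baseline illustrating the property itself, encoding a melody by its successive pitch intervals already yields a distance that is unchanged when the melody is transposed. The function names below are hypothetical and stand in for the learned embedding distance, not for the thesis's model.

```python
# Naive transposition-invariant distance: represent each melody by its
# successive pitch intervals, then compare interval sequences. Transposing
# a melody shifts every pitch by a constant, so the intervals are unchanged.

def interval_encoding(pitches):
    """Successive pitch intervals of a melody given as MIDI pitch numbers."""
    return [b - a for a, b in zip(pitches, pitches[1:])]

def distance(seq_a, seq_b):
    """Euclidean distance between equal-length interval encodings."""
    ia, ib = interval_encoding(seq_a), interval_encoding(seq_b)
    return sum((x - y) ** 2 for x, y in zip(ia, ib)) ** 0.5

motif = [60, 62, 64, 60]             # C-D-E-C
transposed = [p + 5 for p in motif]  # same motif, up a perfect fourth
print(distance(motif, transposed))   # 0.0 — transposition does not change it
```

A learned autoencoder embedding plays the same role for GPR6 (parallelism): musical fragments that repeat a motif, possibly transposed, should map to nearby points, so a small distance flags the parallelism the rule looks for.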
The experimental results show that BLSTM networks trained on our generated data perform better and are more robust than BLSTM networks trained only on the existing manually labeled data. Our trained autoencoders also provide reasonable transposition-invariant distances that help detect GPR6. Moreover, our method outperforms the existing models on manually labeled GPRs.