
Graduate Student: Liu, Yi-Chu (柳譯筑)
Thesis Title: Generative Music Performance Expression of Bach’s Sonatas and Partitas for Solo Violin (生成音樂演奏表情於巴哈無伴奏小提琴)
Advisor: Su, Wen-Yu (蘇文鈺)
Degree: Master
Department: Graduate Program of Artificial Intelligence, College of Electrical Engineering and Computer Science
Year of Publication: 2023
Academic Year of Graduation: 111 (ROC calendar)
Language: English
Number of Pages: 44
Keywords: machine learning, Bach solo violin, music generation, imitation of instrument playing, conditional variational autoencoder

    There are many similarities between music and language: both can be expressed through symbols, and both are transmitted through sound waves. With the success of machine learning in natural language processing, musicians and researchers have also begun to apply deep learning to music. Music creation has been a popular research topic in recent years and can be roughly divided into two problems: automatic music generation and imitation of instrument playing. Automatic music generation has seen significant advances recently, whereas directly imitating how a musician plays an instrument from non-expressive sheet music has received comparatively little attention and remains underdeveloped.
    Research on imitating real instrument playing is mainly divided into two types: score-to-audio and audio-to-audio. Most audio-related studies perform little music-specific processing; instead, they learn an end-to-end mapping, letting the model discover the relationship between the symbolic representation of music and the sound signal on its own.
    Some of Bach's works lack explicit playing instructions, so performers are free to interpret and express their personal understanding of the music. We use the Dataset of Bach Sonatas and Partitas for Solo Violin provided by previous research, which is built from recordings by Hilary Hahn and Nathan Milstein. In this study, we aim to explore methods for deriving the parameters used to synthesize violin music, and to capture and reproduce the emotional expression of human performers with a generative model.
    Compared with previous studies, we correct the temporal annotations and improve the volume features. Furthermore, we propose a method for mapping volume features from the audio waveform to MIDI parameters. Finally, we introduce an LSTM-CVAE generative model that generates performers' expression parameters related to both timing and volume.
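    To make the volume mapping concrete, the following is a minimal sketch, assuming librosa and NumPy, of how a frame-wise RMS loudness envelope could be rescaled onto a MIDI velocity range. It is not the exact procedure used in the thesis; the hop length, velocity bounds, and note time spans are illustrative assumptions.

```python
# Sketch: map per-note loudness extracted from a recording onto MIDI velocity.
# Assumes note onset/offset times (in seconds) are already known from alignment.
import numpy as np
import librosa

def rms_to_velocity(audio_path, note_times, sr=22050, hop=512, v_min=20, v_max=110):
    """note_times: list of (onset_sec, offset_sec) pairs, one per note."""
    y, sr = librosa.load(audio_path, sr=sr)
    rms = librosa.feature.rms(y=y, hop_length=hop)[0]       # frame-wise RMS energy
    rms_db = librosa.amplitude_to_db(rms, ref=np.max)       # convert to dB scale

    # Average the loudness over each note's time span.
    note_db = []
    for onset, offset in note_times:
        a = int(onset * sr / hop)
        b = max(a + 1, int(offset * sr / hop))
        note_db.append(rms_db[a:b].mean())
    note_db = np.asarray(note_db)

    # Linearly rescale the dB values onto the chosen MIDI velocity range.
    scaled = (note_db - note_db.min()) / (note_db.max() - note_db.min() + 1e-8)
    return np.round(v_min + scaled * (v_max - v_min)).astype(int)
```

    The generative model itself could be arranged along the following lines. This is a generic PyTorch sketch of a conditional VAE with an LSTM encoder and decoder, where the score features act as the condition and the per-note performance parameters (e.g., timing deviation and velocity) are the output; the layer sizes and feature dimensions are assumptions and may differ from the architecture described in the thesis.

```python
# Sketch: an LSTM-based conditional VAE over sequences of note features.
import torch
import torch.nn as nn

class LSTMCVAE(nn.Module):
    def __init__(self, score_dim=16, perf_dim=2, hidden=64, latent=16):
        super().__init__()
        self.encoder = nn.LSTM(score_dim + perf_dim, hidden, batch_first=True)
        self.to_mu = nn.Linear(hidden, latent)
        self.to_logvar = nn.Linear(hidden, latent)
        self.decoder = nn.LSTM(score_dim + latent, hidden, batch_first=True)
        self.out = nn.Linear(hidden, perf_dim)

    def forward(self, score, perf):
        # Encode (score, performance) feature pairs into a latent distribution.
        h, _ = self.encoder(torch.cat([score, perf], dim=-1))
        mu, logvar = self.to_mu(h), self.to_logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterize
        # Decode performance parameters conditioned on the score and the latent.
        d, _ = self.decoder(torch.cat([score, z], dim=-1))
        return self.out(d), mu, logvar

def cvae_loss(pred, target, mu, logvar):
    # Reconstruction term plus KL divergence to the standard-normal prior.
    recon = nn.functional.mse_loss(pred, target)
    kld = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
    return recon + kld
```

    At generation time, z would be sampled from the standard-normal prior and the decoder run on the score features alone, so new expression parameters can be produced for an unseen score.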
    We conduct subjective listening tests on a 7-point Likert scale to evaluate the generated results, and we objectively assess the performance of the proposed model using the Pearson correlation test.
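    As an illustration of the objective evaluation, the Pearson correlation between a generated parameter sequence and the corresponding human performance can be computed with scipy.stats.pearsonr; the numbers below are made up for demonstration only.

```python
# Sketch: Pearson correlation between generated and human velocity sequences.
import numpy as np
from scipy.stats import pearsonr

generated_velocity = np.array([64, 72, 80, 75, 60, 58, 70])  # model output (illustrative)
human_velocity = np.array([60, 70, 85, 78, 62, 55, 68])       # reference performance (illustrative)

r, p_value = pearsonr(generated_velocity, human_velocity)
print(f"Pearson r = {r:.3f}, p = {p_value:.3f}")
```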

    Abstract
    List of Tables
    List of Figures
    1 Introduction
    2 Related Works
      2.1 Dataset of Bach Solo Violin Works
      2.2 Score and Performance Features for Generative Expressive Music Performances
      2.3 Conditional VAE
      2.4 Long Short-Term Memory
    3 Research Methods
      3.1 Data Preprocessing
      3.2 Time
      3.3 Velocity
      3.4 Score and Performance Feature Encoding
      3.5 The Proposed Generated Model
    4 Experiments
      4.1 Experimental Setup
      4.2 Model Configuration
      4.3 Experiment Results
        4.3.1 Pearson Test
        4.3.2 Objective Evaluation
        4.3.3 Listening Test
    5 Conclusion and Future Works
      5.1 Conclusion
      5.2 Future Works
    References
