
Graduate Student: 楊智弘 (Yang, Chih-Hong)
Thesis Title: 基於機器學習之小提琴演奏表情合成及主觀聆聽評估
Machine Learning Based Expressive Violin Synthesis and its Subjective Listening Evaluation
Advisor: 謝孫源 (Hsieh, Sun-Yuan)
Co-advisor: 蘇文鈺 (Su, Wen-Yu)
Degree: Master
Department: Department of Computer Science and Information Engineering, College of Electrical Engineering and Computer Science
Year of Publication: 2016
Graduation Academic Year: 104 (ROC calendar)
Language: English
Number of Pages: 42
Chinese Keywords: musical expression analysis; musical expression and emotion; musical expression synthesis
English Keywords: Expressive Musical Term Analysis and Synthesis
Musicians continuously control interpretational characteristics during performance, such as tempo, dynamics, and vibrato, according to their experience, in order to convey different expressions. Although these characteristics and techniques vary from person to person, professional violinists share a certain understanding of and perspective on expressive performance. We are curious about how these interpretational nuances determine different expressions, and whether a machine can control these characteristics closely enough to approach human interpretation, or even perform as well as a musician.

This thesis combines several existing techniques into a preliminary synthesis system that automatically synthesizes different expressive versions from an expressionless (deadpan) performance by controlling expressive parameters. Based on an analysis of expressive musical terms, we derive several important expressive parameters as control parameters. The synthesized recordings are evaluated with a support vector machine classification task (machine, objective) and a listening test (human, subjective). A noteworthy objective result is that, for the samples of an amateur performer, the expressive characteristics of the machine-synthesized versions are more pronounced than those of the student's own interpretations. Furthermore, synthesizing with the energy-curve model estimated by our statistical method also improves the machine classification accuracy.

Machine judgment, however, is only a first step; we also hope that the expressive character of the synthesized sounds can be recognized by experts. We therefore conducted a listening test to examine whether, from a human point of view, the synthesized recordings allow different expressions to be distinguished. The listening-test results are almost consistent with the machine classification: our synthesized versions make it easier for listeners to distinguish expressions than the amateur student's interpretations do, but their expressive character still falls short of professional violinists' interpretations.

Our motivation is to understand how humans interpret musical expressions, so that machines can manipulate the various characteristics as freely as humans do when rendering different expressions. The long-term goal of this work is therefore to make the richness of the synthesized expressions approach that of human interpretation, and even to synthesize versions in the style of late, renowned performers for people to enjoy.

Musicians continuously manipulate interpretational factors such as tempo, dynamics, and vibrato, drawing on their experience, in order to convey different expressive intentions. Although these characteristics and skills differ from individual to individual, professional violinists share certain viewpoints and understandings of expressive performance. We are curious about how the nuances of human performers' interpretations determine distinct expressions, and whether a machine can control these interpretational factors closely enough to approach, or even match, human performance.
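To make the idea of machine-controlled interpretational factors concrete, the sketch below imposes a global tempo change and a simple dynamics envelope on a deadpan recording. This is a minimal illustration only, not the synthesis system described in this thesis: it assumes the librosa and soundfile Python libraries and a hypothetical input file deadpan.wav, and vibrato manipulation (which requires pitch-level processing) is not shown.

```python
# Minimal sketch: imposing two interpretational factors (tempo, dynamics)
# on a deadpan recording. File names are hypothetical placeholders.
import numpy as np
import librosa
import soundfile as sf

y, sr = librosa.load("deadpan.wav", sr=None)     # hypothetical deadpan violin take

# Tempo: phase-vocoder-based time stretching (rate > 1 plays faster).
y_fast = librosa.effects.time_stretch(y, rate=1.1)

# Dynamics: a simple crescendo-decrescendo gain envelope over the excerpt.
t = np.linspace(0.0, 1.0, num=len(y_fast))
envelope = 0.5 + 0.5 * np.sin(np.pi * t)          # rises to a peak, then falls
y_expressive = y_fast * envelope

sf.write("synthetic_expression.wav", y_expressive, sr)
```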
This thesis combines several existing techniques into a synthesis system that automatically synthesizes distinct expressions from a deadpan performance by controlling expressive factors. Following previous work on expressive musical term analysis, we derive a subset of essential features as control parameters. The synthetic sounds are evaluated with a support vector machine (SVM) classification task (machine, objective) and a listening test (human, subjective). The classification results show that, for the recordings of an amateur performer, the synthesized versions differ significantly from the original performances, with more clearly pronounced expressive characteristics. Moreover, using the energy-curve model estimated by our statistical method further increases classification accuracy.
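The objective evaluation can be pictured as a standard supervised classification experiment: can a classifier recover the intended expression from the expressive features of a recording? The following is a minimal sketch only, assuming scikit-learn and hypothetical pre-extracted feature vectors (e.g., vibrato, dynamics, and duration descriptors, as named in Chapter 3 of the table of contents below); the thesis's actual feature set, SVM configuration, and energy-curve model are not reproduced here.

```python
# Minimal sketch of an SVM-based objective evaluation.
# X and y are hypothetical placeholders for pre-extracted expressive
# descriptors (vibrato, dynamics, duration) and expression labels.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(120, 12))       # 120 excerpts x 12 expressive features (placeholder)
y = rng.integers(0, 4, size=120)     # 4 expression classes (placeholder labels)

clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0, gamma="scale"))
scores = cross_val_score(clf, X, y, cv=5)
print(f"mean cross-validated accuracy: {scores.mean():.3f}")
```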
Even though the classification accuracy is high, machine judgment is only the first step in evaluating the system: we still need to know whether human listeners recognize the expressivity of the synthesized sounds. We therefore conduct a listening test as a subjective evaluation. The listening-test results are largely consistent with the classification results: listeners distinguish the intended expressions of our synthetic versions more easily than those of the amateur performances, but the synthetic versions remain less expressive than the professional performances.
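For the subjective side, the comparison between conditions can be summarized by how often listeners identify the intended expression in each one. A minimal sketch, assuming a hypothetical list of (condition, intended, answered) listener responses rather than the thesis's actual test design:

```python
# Minimal sketch: per-condition identification accuracy from listening-test
# responses. The response tuples below are hypothetical placeholders.
from collections import defaultdict

responses = [
    # (condition, intended expression, listener's answer)
    ("professional", "tranquillo", "tranquillo"),
    ("synthetic",    "agitato",    "agitato"),
    ("amateur",      "agitato",    "tranquillo"),
]

correct = defaultdict(int)
total = defaultdict(int)
for condition, intended, answered in responses:
    total[condition] += 1
    correct[condition] += int(intended == answered)

for condition in sorted(total):
    print(f"{condition}: {correct[condition] / total[condition]:.2f}")
```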
Our motivation for this work is to understand how humans perform musical expressions and to let machines manipulate interpretational factors as human performers do. The long-term goal is for the synthesis system to be as expressive as a human performer, and eventually even to synthesize performances in the style of late violinists.

Contents
LIST OF TABLES VII
LIST OF FIGURES VIII
CHAPTER 1 INTRODUCTION 1
1.1 BACKGROUND 1
1.2 THIS WORK 4
1.3 ORGANIZATION 5
CHAPTER 2 RELATED WORK 6
2.1 EXPRESSIVE MUSICAL TERM ANALYSIS 6
2.2 ABOUT PHASE VOCODER 8
CHAPTER 3 METHOD 11
3.1 VIBRATO FEATURES 12
3.2 DYNAMIC FEATURES 15
3.3 DURATION FEATURES 21
3.4 CLASSIFICATION 23
CHAPTER 4 EVALUATION 24
4.1 SYNTHETIC RESULTS 24
4.2 OBJECTIVE EVALUATION 27
4.3 SUBJECTIVE EVALUATION 29
4.4 DISCUSSION 32
CHAPTER 5 CONCLUSION AND FUTURE WORK 34
REFERENCES 35
APPENDIX 41


Full-text access: on campus from 2021-01-01; off campus from 2021-01-01.