| Author: | Syue, Jia-Ling (薛佳綾) |
|---|---|
| Thesis title: | Accurate Audio-to-Score Alignment for Expressive Violin Recordings |
| Advisor: | Su, Wen-Yu (蘇文鈺) |
| Degree: | Master |
| Department: | Department of Computer Science and Information Engineering, College of Electrical Engineering and Computer Science |
| Year of publication: | 2017 |
| Graduation academic year: | 105 |
| Language: | English |
| Number of pages: | 46 |
| Chinese keywords: | audio-to-score alignment, onset time, musical expression, dynamic time warping |
| English keywords: | Audio-to-Score Alignment, Onset, expressive musical term, DTW |
Audio-to-score alignment is a research topic of long-standing interest in the field of music analysis, applicable to a wide range of playing styles and techniques. Its main goal is to map a music recording onto its score, from which note-level information such as onset times and durations is obtained. In our laboratory's previous research on violin expression analysis, we found that expression can be judged by analyzing interpretational factors including duration, energy, and vibrato, and that accurate note durations are essential for correct expression analysis. We therefore obtained exact onsets and offsets through manual annotation, but this approach consumes considerable human effort. In this thesis, we focus on expressive violin music and aim to obtain highly accurate onsets and offsets through audio-to-score alignment, a key step toward advanced research on automatic music expression analysis.
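The mapping from score to recording described above is conventionally computed with dynamic time warping over frame-wise features of both signals. As a minimal sketch, and not the system proposed in the thesis, a classic DTW over cosine distances between chroma-like feature frames might look like:

```python
import numpy as np

def dtw_align(score_feats, audio_feats):
    """Align two feature sequences with classic DTW.

    score_feats, audio_feats: (n, d) and (m, d) arrays of frame-wise
    features (e.g. chroma vectors). Returns the optimal warping path
    as a list of (score_frame, audio_frame) pairs.
    """
    n, m = len(score_feats), len(audio_feats)
    # Local cost: cosine distance between normalized feature frames.
    s = score_feats / (np.linalg.norm(score_feats, axis=1, keepdims=True) + 1e-9)
    a = audio_feats / (np.linalg.norm(audio_feats, axis=1, keepdims=True) + 1e-9)
    cost = 1.0 - s @ a.T

    # Accumulated cost with the standard step set {(1,0), (0,1), (1,1)}.
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i, j] = cost[i - 1, j - 1] + min(
                D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])

    # Backtrack from the end of both sequences to recover the path.
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        step = np.argmin([D[i - 1, j - 1], D[i - 1, j], D[i, j - 1]])
        if step == 0:
            i, j = i - 1, j - 1
        elif step == 1:
            i -= 1
        else:
            j -= 1
    return path[::-1]
```

Mapping the audio frame indices along the path back through the hop size then gives an estimated onset time for each score note; the thesis's refinements target exactly the cases where this baseline fails.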
In audio-to-score alignment of the violin, we face obstacles including silence, overlapped notes, and consecutive note sequences with the same pitch, and most of these characteristics vary with expression. In this thesis, we address the alignment problem for expressive violin performance with a complete alignment system. We first compare the basic parameters used in conventional audio-to-score alignment systems, such as the feature representation and the cost function of dynamic time warping (DTW), and find the optimal setting under factors such as energy ratios and timbre. Then, to better capture onsets in expressive performances of bowed-string instruments, we propose additional steps: modeling the behavior of the background noise, silence detection, simulating overlapped sustained notes in the reference signal, and note length estimation. If a recording has background accompaniment, we add extra processing to reduce its effect on alignment, including splitting the song into smaller segments and removing the frequency bands affected by the accompaniment.
In our experiments, we use a dataset of expressive violin recordings in which each piece is played with ten different expressive musical terms. Different alignment settings are compared and discussed for each expression. We also present alignment results for five recording fragments performed by two outstanding violinists, David Oistrakh and Jascha Heifetz. The results show that the proposed system notably improves upon conventional DTW-based alignment methods. With more precise onsets in hand, our next target is the detection of offsets, which are fuzzier than onsets. This work will greatly reduce the time spent annotating note information and hence yield more reliable results for automatic music expression analysis.