
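Graduate student: 蔡佩听 (Tsai, Pei-Yin)
Thesis title: 基於音符之音樂訊號對齊-使用樂譜引導之非負矩陣分解
Note-based Alignment Using Score-driven Non-negative Matrix Factorization
Advisor: 蘇文鈺 (Su, Wen-Yu)
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Computer Science and Information Engineering
Year of publication: 2011
Academic year of graduation: 99
Language: English
Number of pages: 66
Keywords (Chinese, translated): music signal alignment, non-negative matrix factorization
Keywords (English): music transcription, score alignment, non-negative matrix factorization, chroma, piano-roll, dynamic time warping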
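  • With the growth of computing power, polyphonic music analysis has drawn increasing attention in recent years. This thesis focuses on aligning music signals with scores. The goal is to use the score information and the alignment result to obtain information about each note in the music signal, such as pitch, onset time, timbre, and note duration. However, almost none of the existing audio-to-score alignment algorithms can handle one issue: notes that the score marks as played simultaneously actually occur at different time points in a real recording.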
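    In this thesis, we propose a note-based music signal alignment algorithm. The alignment is based on a feature format with time on the horizontal axis and pitch on the vertical axis, commonly called a piano roll. One of our main contributions is a method and workflow for converting a recorded music signal into a representation similar to a traditional piano roll. The core of this conversion is score-driven non-negative matrix factorization, which decomposes the spectrum into a series of harmonic structures and their energies, so that we obtain not only the energy of a given pitch at each time point but also timbre information. The second contribution is a pitch-based alignment method that aligns the energy sequence of each pitch separately, so that notes of different pitches each receive their own alignment result.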
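    The factorization step described above can be illustrated with a minimal NumPy sketch. It is not the thesis's implementation: it assumes the generic multiplicative updates for the Euclidean cost, and it models "score-driven" as (a) initializing the templates with harmonic combs for the pitches in the score and (b) zeroing activations wherever the score, widened by a tolerance, says a pitch is silent. The helper names, the Gaussian comb shape, the tolerance, and the synthetic data are all hypothetical.

```python
import numpy as np

def harmonic_template(f0_hz, freqs_hz, n_harmonics=8, width_hz=20.0):
    """Hypothetical harmonic comb: Gaussian bumps at integer multiples of f0."""
    template = np.zeros_like(freqs_hz, dtype=float)
    for h in range(1, n_harmonics + 1):
        template += np.exp(-0.5 * ((freqs_hz - h * f0_hz) / width_hz) ** 2) / h
    return template / template.sum()

def score_driven_nmf(V, W0, H_mask, n_iter=200, eps=1e-12):
    """Approximate V by W @ H using multiplicative updates (Euclidean cost).

    W0     : initial templates, one column per pitch in the score.
    H_mask : 1 where the score allows a pitch to sound, 0 elsewhere.
    Entries that start at zero stay zero under multiplicative updates,
    which is how the score constrains the activations in this sketch.
    """
    W = W0.astype(float).copy()
    H = H_mask.astype(float).copy()
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Synthetic "recording": two pitches with different onsets and intensities.
freqs = np.linspace(0.0, 4000.0, 401)
pitches_hz = [261.63, 329.63]  # C4 and E4
W_true = np.stack([harmonic_template(f, freqs) for f in pitches_hz], axis=1)

n_frames = 40
H_true = np.zeros((2, n_frames))
H_true[0, 5:20] = 1.0
H_true[1, 8:30] = 0.5
V = W_true @ H_true  # magnitude spectrogram, shape (401, 40)

# Score-derived mask: each note region widened by two frames on both sides.
H_mask = np.zeros((2, n_frames))
H_mask[0, 3:22] = 1.0
H_mask[1, 6:32] = 1.0

W, H = score_driven_nmf(V, W_true, H_mask)
rel_err = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
print("relative reconstruction error:", rel_err)
```

    In this sketch each row of H plays the role of one piano-roll row (the energy of one pitch over time), and each column of W holds the learned harmonic structure, which is where timbre information would come from.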
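    In our experiments on the MAPS database, whose recordings are played by an automatic piano, the proposed method brings the error between the detected onset time and the annotated time provided by the database below 50 ms for 88% of the notes. In addition, in experiments on manually annotated recordings of pieces performed by famous musicians, about 70% of the notes have an error below 50 ms. Overall, our method achieves better results than previously proposed algorithms on both kinds of database. With more accurate onset times and timbre information, further analysis of different musicians' performance styles and work on audio source separation can be expected.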

    Polyphonic music transcription has attracted growing interest in recent years. This thesis focuses on score alignment, the task of aligning an audio recording with its corresponding score. The alignment yields note-level information such as pitch, onset, and duration. However, most alignment methods cannot handle one issue: notes written as simultaneous in the score are often played asynchronously in the recording.
    In this thesis, we propose a note-based alignment method that operates on a pitch-by-time representation, the so-called piano-roll feature. One of our main contributions is an approach for converting an audio spectrogram into a piano-roll-like feature. Score-driven non-negative matrix factorization (NMF) plays the central role in this transformation and provides both the intensity of each harmonic structure and timbre information. Our second contribution is a pitch-wise alignment that treats each pitch sequence, i.e. each row of the piano roll, separately.
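    To make the idea of pitch-wise alignment concrete, the sketch below aligns each piano-roll row independently with textbook dynamic time warping (DTW). It is only an illustration under stated assumptions: plain DTW with an absolute-difference cost stands in for whatever the thesis uses in its second pass, and the function names, MIDI pitch labels, and toy piano rolls are hypothetical.

```python
import numpy as np

def dtw_path(x, y):
    """Textbook DTW between two 1-D sequences; returns a list of (i, j) pairs."""
    n, m = len(x), len(y)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(x[i - 1] - y[j - 1])
            D[i, j] = cost + min(D[i - 1, j - 1], D[i - 1, j], D[i, j - 1])
    # Backtrack from the end to the start along cheapest predecessors.
    i, j = n, m
    path = [(i - 1, j - 1)]
    while (i, j) != (1, 1):
        _, i, j = min(
            (D[i - 1, j - 1], i - 1, j - 1),
            (D[i - 1, j], i - 1, j),
            (D[i, j - 1], i, j - 1),
        )
        path.append((i - 1, j - 1))
    return path[::-1]

def map_frame(path, score_frame):
    """Earliest audio frame that the path pairs with a given score frame."""
    return min(j for i, j in path if i == score_frame)

# Toy piano rolls (rows = pitches, columns = frames).
# The score starts both notes at frame 2; in the "recording" the first
# note enters one frame late while the second is on time.
pitches = [60, 64]
score_roll = np.array([[0, 0, 1, 1, 1, 0, 0, 0],
                       [0, 0, 1, 1, 0, 0, 0, 0]], dtype=float)
audio_roll = np.array([[0, 0, 0, 1, 1, 1, 0, 0],
                       [0, 0, 1, 1, 0, 0, 0, 0]], dtype=float)

aligned_onsets = {}
for row, pitch in enumerate(pitches):
    path = dtw_path(score_roll[row], audio_roll[row])
    aligned_onsets[pitch] = map_frame(path, score_frame=2)

print(aligned_onsets)
```

    Because each row gets its own warping path, two notes that share one onset in the score can be mapped to different frames in the recording, which is the asynchrony that a single global alignment cannot express.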
    For evaluation, about 88% of notes have onsets that deviate from the ground truth by less than 50 ms on the MAPS database, whose recordings are played by a digital player piano. On the database of pieces performed by famous musicians, the accuracy is about 70% on average. Overall, the proposed method performs better than previous methods. With more precise onset and timbre information, interpretation analysis and source separation become promising next steps.
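    The accuracy figure quoted above is the fraction of notes whose detected onset lies within 50 ms of the reference. A minimal sketch of that measure follows, assuming detected and reference onsets are already paired note by note (as they would be after alignment to a score); the function name and the example values are hypothetical and are not taken from the thesis.

```python
import numpy as np

def onset_accuracy(detected_s, reference_s, tolerance_s=0.050):
    """Fraction of paired notes whose onset deviation is below the tolerance."""
    detected = np.asarray(detected_s, dtype=float)
    reference = np.asarray(reference_s, dtype=float)
    if detected.shape != reference.shape:
        raise ValueError("onset lists must be paired one to one")
    if detected.size == 0:
        return 0.0
    return float(np.mean(np.abs(detected - reference) < tolerance_s))

# Four paired onsets in seconds; the third deviates by 80 ms.
detected = [0.51, 1.02, 1.58, 2.00]
reference = [0.50, 1.00, 1.50, 2.04]
accuracy = onset_accuracy(detected, reference)
print(accuracy)  # 0.75
```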

    Chinese Abstract
    Abstract
    Acknowledgements
    List of Tables
    List of Figures
    Chapter 1 Introduction
        1.1 Motivation
        1.2 Related Work
        1.3 This Work
    Chapter 2 Background
        2.1 Chroma feature
        2.2 Dynamic Time Warping
        2.3 Peak Structure Distance
        2.4 Non-negative Matrix Factorization
    Chapter 3 Method and System of Note-Based Alignment
        3.1 Piano-roll Feature
        3.2 System Flow
        3.3 Zero-pass: Uniform Music Segmentation
        3.4 First-pass: Separation
        3.5 Second-pass: Pitch-wise alignment
    Chapter 4 Evaluation
        4.1 MAPS Database
        4.2 SCREAM Music Annotation Project (SMAP)
        4.3 Comparison with previous methods
        4.4 Discussion
    Chapter 5 Conclusion and Future Work
        5.1 Conclusion
        5.2 Future Work
    Reference
    Appendix


    On campus: public from 2016-09-02
    Off campus: public from 2016-09-02