
Graduate student: Wang, Tien-Ming (王添明)
Thesis title: Note-based Recursive Alignment System Using Score-Driven Complex Matrix Factorization (以樂譜驅動複數矩陣分解法之遞迴式對譜系統)
Advisor: Su, Wen-Yu (蘇文鈺)
Degree: Doctor
Department: College of Electrical Engineering and Computer Science, Department of Computer Science and Information Engineering
Year of publication: 2014
Academic year of graduation: 103 (ROC calendar)
Language: English
Number of pages: 87
Keywords: Complex Matrix Factorization, Dynamic Time Warping, Nonnegative Matrix Factorization, Piano-roll Feature, Score Alignment
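  • Abstract (translated from the Chinese): This dissertation discusses score alignment algorithms, which align a music recording with its score. Conventional algorithms have difficulty when notes that the score marks as simultaneous are played asynchronously. The author builds a score alignment system to handle this case and divides it into two components: transcription and separation. First, using a feature called the "piano roll", the author proposes a note-based score alignment algorithm as the transcription component. Building on the dynamic time warping algorithm, the author proposes a pitch-wise alignment algorithm in which each row of the piano-roll feature is aligned to the score separately, one pitch at a time. Second, to transcribe each note into the score precisely, the dissertation adopts a musical sound separation algorithm called "score-driven complex matrix factorization" as the separation component. The author refines this method by using score information to form a constrained complex matrix factorization that can separate a piece of music into audio for individual notes; this method serves as the separation component of the proposed system. In addition, the author observes that the outputs of the transcription and separation components can each serve as prior knowledge for the other. This finding leads to an iterative method that runs the two analysis components in turn to improve the performance of both. The dissertation also shows how these methods apply to single-channel signal separation and transcription, and compares them with existing methods.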

    This dissertation addresses score alignment: the task of aligning an audio recording with its corresponding score. Conventional methods have difficulty with this task because notes written as simultaneous in the score are often played asynchronously in the recording. We address this problem with an alignment system that has two parts: transcription and separation. First, we propose a note-based score alignment method that uses a pitch-by-time feature, also called the piano-roll feature, obtained by converting the audio spectrogram into a piano-roll-like representation. Building on the dynamic time warping algorithm, we propose a pitch-wise alignment algorithm that uses this feature to align each pitch sequence (i.e., each row of the piano roll) separately. Second, to transcribe each musical note precisely, we adopt a musical sound source separation algorithm called score-driven complex matrix factorization (CMF). We propose a CMF method constrained by the score information, which separates a musical piece into individual notes and serves as the separation part of the proposed system. Furthermore, we observe that the transcription and separation parts of the system provide a priori knowledge to each other. This finding leads to the proposed iterative approach, which performs the two analysis tasks alternately to improve the quality of both. We also show how these methods can be applied to single-channel source separation and transcription, and compare them with current state-of-the-art methods.
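    The pitch-wise idea in the abstract can be illustrated with a small sketch. This is not the dissertation's implementation: it assumes binary piano rolls stored as plain Python lists, an absolute-difference frame cost, and the textbook dynamic time warping recursion. The names `dtw_path`, `score_onsets`, and `align_onsets` are hypothetical helpers introduced only for this illustration.

```python
def dtw_path(score_row, audio_row):
    """Textbook DTW between two 1-D sequences (assumed cost: absolute difference).

    Returns (total_cost, path), where path is a list of
    (score_frame, audio_frame) index pairs from start to end.
    """
    n, m = len(score_row), len(audio_row)
    inf = float("inf")
    # acc[i][j]: minimal accumulated cost of aligning the first i score
    # frames with the first j audio frames.
    acc = [[inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(score_row[i - 1] - audio_row[j - 1])
            acc[i][j] = cost + min(acc[i - 1][j - 1], acc[i - 1][j], acc[i][j - 1])
    # Backtrack from the end, always moving to the cheapest predecessor.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        candidates = [
            (acc[i - 1][j - 1], i - 1, j - 1),
            (acc[i - 1][j], i - 1, j),
            (acc[i][j - 1], i, j - 1),
        ]
        _, i, j = min(candidates, key=lambda c: c[0])
    path.reverse()
    return acc[n][m], path


def score_onsets(row):
    """Frames where a note starts: active now, inactive in the previous frame."""
    return [i for i, v in enumerate(row) if v > 0 and (i == 0 or row[i - 1] == 0)]


def align_onsets(score_roll, audio_roll):
    """Align each pitch row on its own and map score onsets to audio frames.

    Both arguments are dicts {pitch: row}; the result is
    {pitch: [(score_onset_frame, audio_onset_frame), ...]}.
    """
    result = {}
    for pitch, s_row in score_roll.items():
        _, path = dtw_path(s_row, audio_roll[pitch])
        earliest = {}
        for s_idx, a_idx in path:
            earliest.setdefault(s_idx, a_idx)  # keep first audio frame per score frame
        result[pitch] = [(o, earliest[o]) for o in score_onsets(s_row)]
    return result


# Two notes written as simultaneous in the score, played slightly apart.
score = {60: [0, 1, 1, 0], 64: [0, 1, 1, 0]}
audio = {60: [0, 0, 1, 1, 1, 0], 64: [0, 0, 0, 1, 1, 0]}
print(align_onsets(score, audio))  # {60: [(1, 2)], 64: [(1, 3)]}
```

    Because each row is warped independently, the two notes that share score frame 1 are mapped to different audio frames (2 and 3), which a single warping path shared by all pitches could not express.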

    LIST OF TABLES
    LIST OF FIGURES
    CHAPTER 1 INTRODUCTION
      1.1 MOTIVATION
      1.2 CONTRIBUTION
      1.3 OUTLINE
    CHAPTER 2 RELATED WORKS
      2.1 MUSICAL TRANSCRIPTION
        2.1.1 The Chroma Representation and Other Features
        2.1.2 Dynamic Time Warping
      2.2 MUSICAL SOURCE SEPARATION
        2.2.1 Nonnegative Matrix Factorization
        2.2.2 Probabilistic Latent Component Analysis
        2.2.3 Complex Matrix Factorization
    CHAPTER 3 FACTORS FOR IMPROVING MUSIC INFORMATION RETRIEVAL
      3.1 CHROMAGRAM
      3.2 CONSTRAINED NMF
      3.3 CONSTRAINED CMF
    CHAPTER 4 SCORE-DRIVEN RECURSIVE ALIGNMENT SYSTEM
      4.1 PIANO-ROLL FEATURE
      4.2 SYSTEM FLOW
      4.3 PREPROCESSING
      4.4 SEPARATION
      4.5 PITCH-WISE ALIGNMENT
    CHAPTER 5 RESULTS AND DISCUSSIONS
      5.1 OVERVIEW
      5.2 CLIPS
      5.3 ALGORITHMS AND PARAMETERS
      5.4 EVALUATION METRICS
      5.5 RESULTS
        5.5.1 Separation using NMF and CMF
        5.5.2 Onset error and accuracy
        5.5.3 Iterations
    CHAPTER 6 SUMMARY
    REFERENCES


    Full text availability: on campus, public from 2018-01-07
    off campus, public from 2018-01-07