| Graduate student: | Lin, Tsung-Han (林宗翰) |
|---|---|
| Thesis title: | Bayesian NMF with Group Sparsity and Its Application for Music Source Separation (具群組稀疏性之貝氏非負矩陣分解應用於音樂訊號分離) |
| Advisor: | Chien, Jen-Tzung (簡仁宗) |
| Degree: | Master |
| Department: | College of Electrical Engineering and Computer Science, Department of Computer Science and Information Engineering |
| Publication year: | 2011 |
| Academic year of graduation: | 99 (ROC calendar) |
| Language: | Chinese |
| Number of pages: | 152 |
| Chinese keywords: | 貝氏, 非負矩陣分解, 群組稀疏性, 音樂訊號分離 |
| Foreign-language keywords: | Bayesian, NMF, Group Sparsity, Music Source Separation |
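Non-negative matrix factorization (NMF) algorithms have been widely developed and used in many practical multimedia systems. They represent or decompose data through a parts representation or through added sparsity constraints. However, conventional NMF lacks statistical model analysis and an interpretation of model sparsity, which limits its extensibility and its model regularization and highlights the importance of controlling the degree of sparsity in the factorization. This study proposes a group-sparse NMF based on Bayesian theory and applies it to music signal separation, decomposing a signal into rhythmic and harmonic audio sources. Our approach transforms the music signal into the logarithmic magnitude spectrum to form a non-negative matrix. On a Bayesian foundation, a sparse prior distribution built mainly on the Laplacian scale mixture distribution is used to capture the relevance among the basis vectors in the factorization, to achieve a sparse representation, and to address the model over-estimation problem. The basis matrix is grouped and divided into two groups: every music signal is jointly represented by one set of shared basis vectors and another set of individual basis vectors. The shared basis models the statistical characteristics common to different signals, and the individual basis vectors compensate for the residual information left outside the shared data representation. Highly relevant shared and individual bases are used to represent the music signal completely, which completes the Bayesian NMF with group sparsity (GS-BNMF), and the result is then transformed back to the time domain to obtain the separated time signals. In this study, we develop a Gibbs sampling algorithm that, through approximate inference and the model posterior distributions, iteratively samples the model parameters until convergence and thereby implements GS-BNMF. Finally, we apply the proposed method to single-channel signal separation of rhythmic music, separating the rhythmic drum sound from the other harmonic sources. Comparisons across different experiments verify the effectiveness of the method.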
Non-negative matrix factorization (NMF) has been well developed and applied in many practical multimedia systems. In general, NMF is a parts-based representation that factorizes a data matrix into the product of a basis matrix and a weight matrix. NMF is typically solved under a sparseness constraint so that the observed signals are robustly represented by a set of basis vectors and their corresponding sensing weights. However, conventional NMF lacks statistical modeling and interpretation, and its degree of sparseness is difficult to control. Its extension to model regularization is therefore limited, and controlling the sparseness of the data representation becomes a crucial research topic. In this thesis, we propose a Bayesian NMF with group sparsity (GS-BNMF) and apply it to music source separation, specifically the separation of a single-channel music signal into a rhythmic source signal and a harmonic source signal. We first transform the music signals within a time segment into the corresponding log-magnitude spectra and establish a non-negative data matrix. Following Bayesian theory, we introduce a Laplacian scale mixture distribution as a sparse prior to construct the GS-BNMF procedure. NMF is carried out by investigating the relevance of basis vectors for data representation and by tackling the over-estimation problem via sparse coding. We build two groups of basis vectors to represent music signals: the shared basis and the individual basis. The shared basis vectors are estimated to cover the statistics shared across music signals, and the individual basis vectors are calculated to compensate for the residual information that the shared basis vectors cannot characterize. Owing to the sparse prior, GS-BNMF identifies the relevant shared and individual basis vectors for data representation; the irrelevant basis vectors are not used.
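The idea of representing several signals with a shared basis plus individual bases can be illustrated with a small sketch. It is not the GS-BNMF procedure of the thesis: it swaps in classic multiplicative updates for the Euclidean NMF cost, with no prior and no sampling. The function names, the two-signal setup, and the trick of stacking the two matrices side by side and zeroing the weights that would let one signal use the other's individual basis are all assumptions made for illustration.

```python
import random

def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def sq_error(X, W, H):
    """Squared Euclidean reconstruction error ||X - WH||^2."""
    WH = matmul(W, H)
    return sum((x - y) ** 2 for rx, ry in zip(X, WH) for x, y in zip(rx, ry))

def scale(M, num, den, eps=1e-12):
    """Elementwise M * num / (den + eps); zeros in M stay zero."""
    return [[m * n / (d + eps) for m, n, d in zip(mr, nr, dr)]
            for mr, nr, dr in zip(M, num, den)]

def nmf_step(X, W, H):
    """One round of multiplicative updates for the Euclidean NMF cost."""
    Wt = transpose(W)
    H = scale(H, matmul(Wt, X), matmul(matmul(Wt, W), H))
    Ht = transpose(H)
    W = scale(W, matmul(X, Ht), matmul(W, matmul(H, Ht)))
    return W, H

def group_nmf(X1, X2, k_shared=2, k_ind=1, iters=200, seed=0):
    """Factorize two non-negative matrices with shared and individual bases.

    Hypothetical helper: columns of W are ordered as
    [shared | individual to X1 | individual to X2].
    """
    rng = random.Random(seed)
    t1, t2 = len(X1[0]), len(X2[0])
    X = [r1 + r2 for r1, r2 in zip(X1, X2)]  # stack the signals side by side
    k = k_shared + 2 * k_ind
    W = [[rng.random() + 0.1 for _ in range(k)] for _ in X]
    H = []
    for j in range(k):
        used_by_1 = j < k_shared + k_ind
        used_by_2 = j < k_shared or j >= k_shared + k_ind
        # Multiplying by False gives 0.0, which the updates never revive.
        H.append([(rng.random() + 0.1) * used_by_1 for _ in range(t1)] +
                 [(rng.random() + 0.1) * used_by_2 for _ in range(t2)])
    start = sq_error(X, W, H)
    for _ in range(iters):
        W, H = nmf_step(X, W, H)
    return W, H, start, sq_error(X, W, H)
```

Because multiplicative updates keep zero weights at zero, the block pattern in `H` is preserved throughout, so the first `k_shared` columns of `W` serve both signals while the remaining columns serve one signal each.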
Given the factorized matrices, the demixed signals in the log-magnitude spectrum domain are converted to the corresponding time signals via the inverse Fourier transform. In this study, we develop an approximate inference procedure based on Gibbs sampling and apply it to iteratively sample the model parameters from their posterior distributions; GS-BNMF is implemented accordingly. Experiments on single-channel separation of music signals into drum (rhythmic) and harmonic signals show the effectiveness of the proposed method.
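The mechanics of Gibbs sampling with a scale-mixture sparse prior can be sketched on a toy model. The model below is an assumption, not the sampler developed in the thesis: a group of non-negative weights shares one inverse scale `lam`, each weight is exponential (a one-sided Laplacian) given `lam`, and `lam` has a Gamma prior. The data likelihood is deliberately left out, so the chain targets the joint prior only; a full sampler would also condition each weight on the observed spectrogram.

```python
import random

def gibbs_group_scale(n_coeffs=4, alpha=5.0, beta=4.0, sweeps=20000, seed=0):
    """Alternate draws of group weights s and their shared inverse scale lam.

    Toy model (an assumption, not the thesis's sampler):
        s_i | lam ~ Exponential(rate=lam),  i = 1..n_coeffs
        lam       ~ Gamma(shape=alpha, rate=beta)
    """
    rng = random.Random(seed)
    lam = alpha / beta  # start at the prior mean of lam
    trace = []
    for _ in range(sweeps):
        # Weights given the scale: independent exponential draws.
        s = [rng.expovariate(lam) for _ in range(n_coeffs)]
        # Scale given the weights: conjugate Gamma update with
        # shape alpha + n and rate beta + sum(s).
        # random.gammavariate takes (shape, scale), so scale = 1 / rate.
        lam = rng.gammavariate(alpha + n_coeffs, 1.0 / (beta + sum(s)))
        trace.append((s, lam))
    return trace
```

Sharing one `lam` across the group is what ties the weights together: when the weights in a group are small, the conditional for `lam` shifts toward large values, which in turn pushes the whole group toward zero on the next sweep. With the default settings the long-run averages should approach beta / (alpha - 1) = 1 for the weights and alpha / beta = 1.25 for `lam`, since those are the prior means.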