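Graduate student: 李孟穎
Lee, Meng-Ying
Thesis title: 感知因素分析法應用於語音強化
Perceptual Factor Analysis for Speech Enhancement
Advisor: 簡仁宗
Chien, Jen-Tzung
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Computer Science and Information Engineering
Year of publication: 2005
Academic year of graduation: 93 (ROC calendar; 2004-2005)
Language: Chinese
Number of pages: 78
Keywords (Chinese): 因素分析 (factor analysis), 雜訊語音 (noisy speech), 噪音抑制 (noise suppression), 訊號子空間 (signal subspace), 語音強化 (speech enhancement)
Keywords (English): noise reduction, speech enhancement, noisy speech, factor analysis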
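  • This thesis addresses the drop in speech recognition accuracy caused by background noise. Because test environments differ, the mismatch between the test conditions and the speech recognition model is one of the main sources of recognition errors. Traditionally, spectral subtraction is often used to filter out noise. It is simple to implement, but it does not remove the noise cleanly. Another option is the signal subspace approach, which enhances speech by separating out the noise subspace and performs quite well. Its solution also exploits a perceptual property of human hearing: listeners are more sensitive to signal distortion than to residual noise, so the noise is suppressed as far as possible, until it is no longer perceptible to the human ear. This thesis therefore proposes a noisy speech enhancement technique based on factor analysis (FA). The principal factors of the signal are selected with the Bartlett test, and noise suppression is then applied to those principal factors. Importantly, we find that factor analysis and the signal subspace approach are theoretically related, and we develop a speech enhancement technique based on perceptual factor analysis. In the experiments, we enhance speech from the Aurora2 noisy speech database. The evaluation shows that speech enhanced by perceptual factor analysis effectively improves recognition rates under different kinds of noise interference.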
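The factor selection step mentioned in the abstract can be illustrated with a small sketch. The record does not give the exact statistic used in the thesis, so the version below is an assumption: the textbook Bartlett test for equality of the trailing covariance eigenvalues, with the chi-square critical value approximated by the Wilson-Hilferty formula. The function names (`chi2_quantile`, `bartlett_statistic`, `select_num_factors`) are hypothetical and chosen only for this illustration.

```python
import math
from statistics import NormalDist


def chi2_quantile(prob, df):
    """Wilson-Hilferty approximation to the chi-square quantile (an approximation, not exact)."""
    z = NormalDist().inv_cdf(prob)
    return df * (1.0 - 2.0 / (9.0 * df) + z * math.sqrt(2.0 / (9.0 * df))) ** 3


def bartlett_statistic(eigvals, k, n_samples):
    """Statistic and degrees of freedom for H0: the last p-k eigenvalues are equal."""
    p = len(eigvals)
    tail = eigvals[k:]
    q = len(tail)
    mean_tail = sum(tail) / q
    # Bartlett's small-sample correction of the sample size (textbook form, assumed here).
    n_eff = n_samples - (2.0 * p + 11.0) / 6.0
    stat = n_eff * (q * math.log(mean_tail) - sum(math.log(v) for v in tail))
    df = (q + 2) * (q - 1) / 2.0
    return stat, df


def select_num_factors(eigvals, n_samples, alpha=0.05):
    """Smallest k for which equality of the remaining eigenvalues is not rejected."""
    eigvals = sorted(eigvals, reverse=True)
    p = len(eigvals)
    for k in range(p - 1):  # at least two eigenvalues must remain to test equality
        stat, df = bartlett_statistic(eigvals, k, n_samples)
        if stat <= chi2_quantile(1.0 - alpha, df):
            return k
    return p - 1


if __name__ == "__main__":
    # Two dominant eigenvalues on top of a nearly flat noise floor (made-up numbers).
    eigenvalues = [9.0, 4.0, 1.05, 1.0, 0.95, 1.02, 0.98, 1.0]
    print(select_num_factors(eigenvalues, n_samples=200))
```

The loop stops at the first k for which the trailing eigenvalues look statistically flat, which matches the idea of keeping only the principal factors and treating the rest as noise.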

    This paper presents a new speech enhancement approach derived from the factor analysis (FA) framework. FA is a data analysis model in which the relevant common factors are extracted from observations. A factor loading matrix is estimated, and a model error term is introduced for each observation. Interestingly, FA is a subspace approach that properly represents noisy speech. It partitions the space of noisy speech into a principal subspace containing clean speech and a complementary (minor) subspace containing the residual speech and noise. We show that FA is a more general data model than the signal subspace approach. To perform FA speech enhancement, we present a perceptual optimization procedure that minimizes the signal distortion subject to the energies of residual speech and noise being kept below a specified level. Importantly, we present a hypothesis testing approach to optimally perform the subspace decomposition. In the experiments, we implement perceptual FA speech enhancement using the Aurora2 corpus. We find that the proposed approach achieves desirable speech recognition rates, especially when the signal-to-noise ratio is lower than 5 dB.
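The constrained optimization described above, minimizing signal distortion while keeping residual noise below a set level, has a well-known closed form in the classical signal subspace method of Ephraim and Van Trees (1995) for white noise. The sketch below shows that classical time-domain-constrained gain, not the perceptual FA estimator of the thesis, whose formulas are not given in this record. The function name `tdc_gains` and the example numbers are hypothetical.

```python
def tdc_gains(noisy_eigvals, noise_var, mu=1.0):
    """Per-direction gains lam_x / (lam_x + mu * noise_var) for white noise.

    noisy_eigvals: eigenvalues of the noisy-speech covariance matrix.
    noise_var:     white-noise variance (must be positive).
    mu:            Lagrange multiplier; larger values suppress more noise
                   at the cost of more signal distortion.
    """
    if noise_var <= 0.0:
        raise ValueError("noise_var must be positive")
    gains = []
    for lam_y in noisy_eigvals:
        # Clean-speech eigenvalue estimate; directions at or below the noise
        # floor fall in the noise subspace and get zero gain.
        lam_x = max(lam_y - noise_var, 0.0)
        gains.append(lam_x / (lam_x + mu * noise_var))
    return gains


if __name__ == "__main__":
    eigenvalues = [9.0, 4.0, 1.0, 1.0]  # made-up values with a noise floor of 1.0
    print(tdc_gains(eigenvalues, noise_var=1.0, mu=1.0))
    print(tdc_gains(eigenvalues, noise_var=1.0, mu=3.0))
```

In a full enhancer these gains would scale the components of each noisy frame along the covariance eigenvectors. Raising `mu` lowers every nonzero gain, which is the trade-off between distortion and residual noise that the abstract refers to.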

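    Chapter 1 Introduction
    1.1 Preface
    1.2 Research Motivation
    1.3 Overview of the Research Method
    1.4 Chapter Outline
    Chapter 2 Literature Review
    2.1 Speech Recognition Systems
    2.2 Signal Subspace
    2.2.1 Time-Domain Constrained Estimation
    2.2.2 Spectral-Domain Constrained Estimation
    2.3 Factor Analysis
    2.4 Estimating the Factor Analysis Model Using Principal Components
    2.5 Applications of Factor Analysis
    Chapter 3 Perceptual Factor Analysis
    3.1 Comparison of Factor Analysis and the Signal Subspace Approach
    3.2 Factor Analysis for Speech Enhancement
    3.3 Factor Analysis with Auditory Perceptual Constraints
    3.4 Prediction and Evaluation of the Number of Factors
    Chapter 4 Experiments
    4.1 Experimental Setup
    4.2 Experimental Results
    4.2.1 Evaluation of Waveforms and Spectrograms
    4.2.2 Signal-to-Noise Ratio Analysis
    4.2.3 Speech Recognition Rate Analysis
    4.3 Discussion
    4.4 System Demonstration
    Chapter 5 Conclusions and Future Work
    5.1 Conclusions
    5.2 Future Work
    Appendix
    References
    Author Biography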

    Full text, on campus: available from 2007-07-28
    Full text, off campus: available from 2007-07-28