
Graduate Student: Yeh, Jiun-Yi (葉俊宜)
Thesis Title: Application of Human Auditory Filters to the Robustness for Speech Recognition System (人耳聽覺濾波器應用於強健性語音辨識系統)
Advisor: Lei, Sheau-Fang (雷曉方)
Degree: Master
Department: Department of Electrical Engineering, College of Electrical Engineering and Computer Science
Publication Year: 2016
Graduation Academic Year: 105
Language: Chinese
Number of Pages: 101
Keywords (Chinese): human basilar-membrane filter, speech recognition, cepstral coefficients, human auditory model
Keywords (English): Gammachirp Filterbank, Speech Recognition, Cepstral Coefficients, Auditory Modeling
Usage: 70 views, 5 downloads
Abstract (Chinese): In this thesis, an auditory filter based on the characteristics of human hearing is proposed and applied to feature extraction, yielding a feature-extraction algorithm for robust speech recognition. The speech signal is characterized by a new feature, the Gammachirp Frequency Cepstral Coefficient (GcFCC), which is compared with the widely used Mel Frequency Cepstral Coefficient (MFCC) and with the earlier Gammatone Frequency Cepstral Coefficient (GFCC). MFCC and GFCC apply a Fourier transform and then pass the spectrum through their respective filterbanks, the Mel triangular filterbank and the Gammatone filterbank, whereas GcFCC uses a filterbank modeled on the basilar membrane of the human ear (the Gammachirp filterbank). Because of the different transfer characteristics of these filterbanks, the spectrum produced by GcFCC more accurately mimics the characteristics of human hearing and is less affected by noise. The HTK toolkit is used to build Hidden Markov Models (HMMs) for training and testing, with the AURORA 2.0 corpus as both the training and test database; test set A of AURORA 2.0 is used for evaluation, with subway, babble, car, and exhibition-hall noise. The recognition results show that, averaged over the four noise types at SNRs from -5 dB to 20 dB, the proposed GcFCC improves the speech recognition rate by about 6% relative to MFCC and by about 5% relative to GFCC.
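The pipeline described above parallels the standard MFCC front end, with the Mel triangular filterbank replaced by a gammachirp filterbank. The following is a minimal Python sketch of such a GcFCC-style front end (frame, FFT power spectrum, auditory filterbank, log, DCT); the filter order, chirp parameter c, number of channels, and ERB-based channel spacing are illustrative assumptions, not the exact parameters used in Chapter 3 of the thesis.

    import numpy as np
    from scipy.fftpack import dct

    def erb(f_hz):
        # Equivalent Rectangular Bandwidth (Glasberg & Moore): 24.7 * (4.37*f/1000 + 1) Hz
        return 24.7 * (4.37 * f_hz / 1000.0 + 1.0)

    def erb_scale_centers(f_lo, f_hi, n):
        # n center frequencies equally spaced on the ERB-number scale,
        # ERB-number(f) = 21.4 * log10(4.37*f/1000 + 1)
        e_lo, e_hi = [21.4 * np.log10(4.37 * f / 1000.0 + 1.0) for f in (f_lo, f_hi)]
        e = np.linspace(e_lo, e_hi, n)
        return (10.0 ** (e / 21.4) - 1.0) / 4.37 * 1000.0

    def gammachirp_weights(freqs, fc, c=-1.0, order=4, b_coef=1.019):
        # Approximate magnitude response of one gammachirp channel centred at fc:
        # a gammatone-like magnitude [1 + ((f-fc)/b)^2]^(-order/2) multiplied by the
        # asymmetry factor exp(c * arctan((f-fc)/b)); c = 0 gives a symmetric,
        # gammatone-like response.  c, order, and b_coef are placeholder values.
        b = b_coef * erb(fc)
        theta = np.arctan((freqs - fc) / b)
        w = (1.0 + ((freqs - fc) / b) ** 2) ** (-order / 2.0) * np.exp(c * theta)
        return w / (w.sum() + 1e-12)   # normalise each channel's weights

    def gcfcc(signal, fs, n_filters=32, n_ceps=13, frame_len=0.025, frame_step=0.010, nfft=512):
        # 1) Pre-emphasis, framing, Hamming window.
        sig = np.append(signal[0], signal[1:] - 0.97 * signal[:-1])
        flen, fstep = int(frame_len * fs), int(frame_step * fs)
        n_frames = 1 + (len(sig) - flen) // fstep
        idx = np.arange(flen)[None, :] + fstep * np.arange(n_frames)[:, None]
        frames = sig[idx] * np.hamming(flen)
        # 2) Power spectrum via FFT.
        pow_spec = np.abs(np.fft.rfft(frames, nfft)) ** 2
        freqs = np.fft.rfftfreq(nfft, 1.0 / fs)
        # 3) Channel energies from an ERB-spaced, gammachirp-like filterbank.
        cfs = erb_scale_centers(50.0, fs / 2.0, n_filters)
        fbank = np.stack([gammachirp_weights(freqs, fc) for fc in cfs])
        energies = pow_spec @ fbank.T
        # 4) Log compression and DCT, as in the MFCC pipeline.
        return dct(np.log(energies + 1e-12), type=2, axis=1, norm='ortho')[:, :n_ceps]

    # Example: GcFCC-style coefficients for a one-second 440 Hz tone at 8 kHz
    # (AURORA 2.0 speech is sampled at 8 kHz).
    fs = 8000
    t = np.arange(fs) / fs
    features = gcfcc(np.sin(2 * np.pi * 440 * t), fs)   # shape: (n_frames, 13)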

Abstract (English): An auditory filter based on the characteristics of human hearing is proposed. It is used to extract features and is then applied to improve the robustness of automatic speech recognition. In the proposed approach, the speech signal is characterized using a new feature referred to as the Gammachirp Frequency Cepstral Coefficient (GcFCC). In contrast to the conventional Mel Frequency Cepstral Coefficient (MFCC) and Gammatone Frequency Cepstral Coefficient (GFCC) methods, which pass a Fourier-based spectrogram through their respective filterbanks, the proposed GcFCC method uses an auditory spectrogram based on a gammachirp filterbank in order to mimic the auditory response of the human ear more accurately and to improve noise immunity. In addition, a Hidden Markov Model (HMM) is used for both training and testing. Evaluation results obtained on the AURORA 2 noisy speech database show that, for speech samples with Signal-to-Noise Ratios (SNRs) ranging from -5 to 20 dB, the proposed scheme improves the average speech recognition rate by 6% over the MFCC method and by 5% over the GFCC method.
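For reference, the gammachirp filter referred to in both abstracts is commonly written, following Irino and Patterson, as the impulse response below; the specific parameter values (amplitude a, order n, bandwidth coefficient b, chirp parameter c) used in the thesis are defined in Chapter 3 and are not reproduced here.

    g_c(t) = a\, t^{\,n-1} \exp\!\bigl(-2\pi b\,\mathrm{ERB}(f_r)\, t\bigr)\, \cos\!\bigl(2\pi f_r t + c \ln t + \phi\bigr), \qquad t > 0,

    \mathrm{ERB}(f) = 24.7\left(4.37\, f/1000 + 1\right) \ \text{Hz}.

Setting c = 0 reduces g_c(t) to the gammatone filter; the c ln t chirp term introduces the high/low-frequency asymmetry of the magnitude response, and making c depend on the stimulus level gives the level dependence discussed in Sections 3.2.2 and 3.2.3 of the thesis.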

Table of Contents:
Chinese Abstract  I
Extended Abstract  III
Acknowledgements  XII
Table of Contents  XIII
List of Tables  XV
List of Figures  XVI
Chapter 1  Introduction  18
  1.1  Introduction to Noise Signals  18
    1.1.1  Types of Noise Signals  19
    1.1.2  Effects of Noise Signals  22
  1.2  Research Motivation and Objectives  24
  1.3  Speech Robustness  25
    1.3.1  Speech Enhancement  26
    1.3.2  Robust Speech Features  26
    1.3.3  Auditory Filterbanks  27
  1.4  Thesis Organization  27
Chapter 2  Literature Review and Analysis  28
  2.1  Characteristics of Human Hearing  28
    2.1.1  The Cochlea and Critical Bands  28
    2.1.2  Masking Effects  31
    2.1.3  Masking Characteristics  34
  2.2  Feature Extraction Based on Auditory Filterbanks  36
    2.2.1  Pre-processing  38
    2.2.2  Discrete Fourier Transform  40
    2.2.3  Auditory Filterbanks  41
    2.2.4  Log Energy & Discrete Cosine Transform  46
  2.3  The Role of Features in Speech Recognition  46
    2.3.1  Dynamic Feature Parameters  47
    2.3.2  GMM and HMM Acoustic Models  48
    2.3.3  The Viterbi Algorithm  54
Chapter 3  Auditory Filters Based on Masking Characteristics  56
  3.1  The Gammatone Filter  56
  3.2  The Gammachirp Filter  65
    3.2.1  The Gammachirp Filter  66
    3.2.2  High/Low-Frequency Asymmetry  69
    3.2.3  Level Dependence  72
  3.3  Feature Extraction Algorithm  73
Chapter 4  Analysis, Comparison, and Results  78
  4.1  MATLAB Simulations  78
    4.1.1  SNR Simulations  78
    4.1.2  Feature Coefficient Simulations  88
  4.2  Experimental Simulations  90
    4.2.1  Recognition System Flow  90
    4.2.2  Speech Corpus  91
    4.2.3  The HTK Toolkit  92
    4.2.4  Simulation Results  94
Chapter 5  Conclusions and Future Work  99
References  100


Full Text Availability: On campus: open access from 2021-08-01; Off campus: open access from 2021-08-01