
Author: Juang, Fang-Chen (莊芳甄)
Title: Single-Channel Speech Enhancement: Using Spectral Subtraction with Recurrent Neuro-Fuzzy-Network-Based Word Boundary Detection
Advisor: Wang, Jeen-Shing (王振興)
Degree: Master
Department: Department of Electrical Engineering, College of Electrical Engineering and Computer Science
Year of Publication: 2005
Academic Year of Graduation: 93
Language: English
Pages: 45
Keywords: speech enhancement, neuro-fuzzy network

     This thesis investigates the effectiveness of noise elimination using a single-channel speech enhancement system based on each of three spectral subtraction algorithms: standard spectral subtraction, spectral oversubtraction, and nonlinear spectral subtraction. First, we detected the boundaries of the speech signal within a noisy speech signal using a recurrent self-adaptive neuro-fuzzy inference system (R-SANFIS). The speech enhancement system then eliminated the noise from the detected speech segments.
     The boundaries of the speech signal were obtained by a word boundary detection algorithm that extracts frequency-energy parameters from the noisy speech signal. These parameters were smoothed and normalized to obtain a time-frequency (TF) parameter, which the R-SANFIS then used to identify the speech segments. To eliminate noise, we used a minimum-frequency-energy (MFE) parameter to characterize the background noise in the speech segments, and then applied each of the three spectral subtraction algorithms and evaluated their enhancement performance.
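The thesis does not reproduce its feature-extraction code here, but the smoothing-and-normalization step described above can be sketched roughly as follows. The function name `tf_parameter`, the 3-point moving average, and the min-max normalization are illustrative assumptions, not the exact R-SANFIS input pipeline:

```python
import math

def tf_parameter(frames, eps=1e-12):
    """Illustrative TF-style parameter: per-frame log band energy,
    smoothed with a 3-point moving average, then normalized to [0, 1].

    `frames` is a list of per-frame band-energy lists. This is a sketch
    of the general idea, not the thesis's exact formulation.
    """
    # Log of the total band energy in each frame.
    log_e = [math.log(sum(bands) + eps) for bands in frames]

    # 3-point moving-average smoothing (edges use available neighbours).
    smoothed = []
    for i in range(len(log_e)):
        lo, hi = max(0, i - 1), min(len(log_e), i + 2)
        smoothed.append(sum(log_e[lo:hi]) / (hi - lo))

    # Min-max normalization so the parameter lies in [0, 1].
    mn, mx = min(smoothed), max(smoothed)
    span = (mx - mn) or 1.0
    return [(v - mn) / span for v in smoothed]
```

Frames with high normalized values would then be candidate speech regions for the boundary detector.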
     In our simulations, the proposed approach effectively reduced noise in a variety of noisy environments. Among the three algorithms, the enhancement system with nonlinear spectral subtraction outperformed the other two.
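To make the family of subtraction rules concrete, the following minimal sketch applies subtraction to one frame of magnitude spectra. The function name, the oversubtraction factor `alpha`, and the spectral floor `beta` are illustrative assumptions; with `alpha = 1` this behaves like standard spectral subtraction, while `alpha > 1` gives spectral oversubtraction:

```python
def spectral_subtract(noisy_mag, noise_mag, alpha=1.0, beta=0.01):
    """Magnitude-domain spectral subtraction for one frame (a sketch).

    noisy_mag: magnitude spectrum of the noisy frame.
    noise_mag: estimated noise magnitude spectrum (e.g. from an MFE-style
               background-noise estimate).
    alpha:     oversubtraction factor (alpha > 1 subtracts more noise).
    beta:      spectral floor, keeping a small residual instead of
               clamping negative magnitudes to zero.
    """
    out = []
    for y, n in zip(noisy_mag, noise_mag):
        s = y - alpha * n      # subtract the scaled noise estimate
        floor = beta * y       # floor prevents negative magnitudes
        out.append(s if s > floor else floor)
    return out
```

In a full enhancement system, the enhanced magnitudes would be recombined with the noisy signal's phase before inverse transformation; nonlinear spectral subtraction goes further by making the subtraction factor depend on the local signal-to-noise ratio rather than keeping `alpha` fixed.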

    TABLE OF CONTENTS
    CHINESE ABSTRACT
    ABSTRACT
    LIST OF TABLES
    LIST OF FIGURES
    1 Introduction
      1.1 Motivation
      1.2 Literature Survey
      1.3 Purpose of the Study
      1.4 Organization of the Thesis
    2 Feature Extraction and Word Boundary Detection
      2.1 Feature Extraction of the Speech Signal
        2.1.1 Speech Energy of Mel-Scale Filter Bank
        2.1.2 Effect of Additive Noise and Estimation of Background Noise Level
        2.1.3 Zero-Crossing Rate
      2.2 Word Boundary Detection
        2.2.1 Structure of Recurrent SANFIS
        2.2.2 Word Boundary Detection by Using R-SANFIS
    3 Spectral Subtraction for Speech Enhancement
      3.1 Generalities of Spectral Subtraction
      3.2 Spectral Subtraction Variations and Generalizations
      3.3 Three Spectral Subtraction Algorithms
        3.3.1 Standard Spectral Subtraction Algorithm
        3.3.2 Spectral Oversubtraction Algorithm
        3.3.3 Nonlinear Spectral Subtraction Algorithm
      3.4 Reduction of Spectral Error
        3.4.1 Magnitude Averaging
        3.4.2 Half-Wave Rectification
        3.4.3 Residual Noise Reduction
      3.5 Scheme of the Speech Enhancement
    4 Simulations
      4.1 Noise Data
      4.2 The Speech Signal
      4.3 The Enhanced Speech Signal by Three Spectral Subtraction Algorithms
    5 Conclusions and Future Works
    References


    Full-text availability: on campus 2008-08-05; off campus 2008-08-05