
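Graduate student: Tung, Ying-Kai (董英凱)
Thesis title: Wavelet-based Speech Enhancement System (基於小波轉換之語音增強系統)
Advisor: Lei, Sheau-Fang (雷曉方)
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Electrical Engineering
Year of publication: 2005
Academic year of graduation: 93 (ROC calendar)
Language: English
Number of pages: 85
Keywords (translated from Chinese): wavelet shrinkage, wavelet packet transform, speech pause segments, nonstationary background noise
Keywords (English): speech pause detection, wavelet packet transform, wavelet shrinkage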
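    The main goal of this thesis is to design a wavelet-based speech enhancement system and to implement and verify it. The thesis has two main parts: the system algorithm and the hardware implementation.
    For the system algorithm, we propose a speech enhancement system based on the wavelet packet transform and adaptive noise estimation. The computed wavelet threshold is adjusted over time according to changes in SNR, and the SNR is obtained from the adaptive noise estimation algorithm. The adaptive noise estimation algorithm is computationally simple and reliable, and it needs no complicated mechanism for detecting speech pause segments, so the proposed method can suppress noise effectively while reducing speech distortion. We evaluate the method with a variety of nonstationary background noises. Experimental results show that it works effectively in a range of noise environments and outperforms conventional methods and other wavelet-based speech enhancement methods.
    For hardware implementation and verification, we propose an efficient hardware architecture for this speech enhancement algorithm and compare the advantages and disadvantages of the serial-input and parallel-input architectures. The proposed circuit architecture is described in Verilog HDL, then synthesized, placed and routed on an FPGA, and verified. Logic synthesis is then performed with Synopsys® Design Compiler. Finally, automatic place-and-route software is used to plan the chip layout in a 0.18 µm 1P6M process.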
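    The wavelet thresholding step mentioned in the abstract can be sketched as follows. This is only an illustration under stated assumptions: the record does not give the thesis's time-adaptive threshold formula, so the Donoho-Johnstone universal threshold stands in for it here, and the function names `soft_threshold` and `universal_threshold` are hypothetical.

```python
import math


def soft_threshold(coeffs, threshold):
    """Shrink each wavelet coefficient toward zero by `threshold`."""
    out = []
    for c in coeffs:
        magnitude = abs(c) - threshold
        # Coefficients whose magnitude does not exceed the threshold become zero;
        # the rest keep their sign and lose `threshold` in magnitude.
        out.append(math.copysign(magnitude, c) if magnitude > 0 else 0.0)
    return out


def universal_threshold(noise_power, n_coeffs):
    """Universal threshold sigma * sqrt(2 ln N).

    Assumption: used here as a stand-in for the thesis's time-adaptive
    threshold, which this record does not specify.
    """
    sigma = math.sqrt(noise_power)
    return sigma * math.sqrt(2.0 * math.log(n_coeffs))


if __name__ == "__main__":
    frame = [0.05, -0.4, 1.2, -0.02]           # made-up subband coefficients
    lam = universal_threshold(noise_power=0.01, n_coeffs=len(frame))
    print(soft_threshold(frame, lam))
```

    Recomputing the threshold for every frame from the current noise power estimate is what would make the shrinkage follow SNR changes over time.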

    In this thesis, we focus on the design of a wavelet-based speech enhancement system. The thesis consists of two parts: the development of the system algorithm and its hardware implementation.
    For the system algorithm, we propose a speech enhancement algorithm based on the wavelet packet transform and adaptive noise estimation. The wavelet threshold in this algorithm is adapted over time to SNR variations, and the SNR is calculated from the adaptive noise estimate. Adaptive noise estimation is computationally simple and reliably estimates the noise level from the noisy signal itself, without complicated speech pause detection. The proposed algorithm can therefore suppress noise efficiently while reducing speech distortion. Speech signals corrupted by nonstationary noises are used to evaluate the proposed algorithm. Experimental results show that it outperforms conventional spectral subtraction and other wavelet-based denoising approaches to speech enhancement.
    For hardware implementation and verification, we present an efficient VLSI architecture for the adaptive noise estimation algorithm and compare the advantages and disadvantages of the serial-in and parallel-in architectures. The proposed VLSI architecture is described in Verilog HDL, synthesized for an FPGA and verified through post-place-and-route simulation, and then synthesized with Synopsys® Design Compiler. The chip is designed in a 0.18 µm 1P6M CMOS technology.
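    As a rough illustration of noise estimation without speech pause detection, the sketch below uses an SNR-dependent recursive average whose smoothing parameter comes from a sigmoid of the a posteriori SNR. The update rule is an assumed, commonly used form rather than the thesis's stated equations. The parameter names `a` and `T` follow the table of contents, but their default values and the function names are placeholders.

```python
import math


def update_noise_estimate(prev_noise, frame_power, a=4.0, T=2.0):
    """One step of SNR-dependent recursive noise averaging (assumed form).

    prev_noise  : noise power estimate from the previous frame (> 0)
    frame_power : average power of the current noisy frame in one subband
    a, T        : sigmoid slope and offset (placeholder values)
    """
    post_snr = frame_power / prev_noise  # a posteriori SNR
    # alpha near 1 at high SNR (speech likely): hold the old noise estimate.
    # alpha near 0 at low SNR (noise only): follow the current frame power.
    alpha = 1.0 / (1.0 + math.exp(-a * (post_snr - T)))
    return alpha * prev_noise + (1.0 - alpha) * frame_power


def track_noise(frame_powers, initial_noise):
    """Run the update over a sequence of frame powers."""
    noise = initial_noise
    history = []
    for power in frame_powers:
        noise = update_noise_estimate(noise, power)
        history.append(noise)
    return history


if __name__ == "__main__":
    # Made-up frame powers: a loud, speech-like frame sits between quiet ones.
    print(track_noise([1.0, 1.0, 50.0, 1.0], initial_noise=1.0))
```

    Because the sigmoid makes a soft decision for every frame, no explicit speech/pause classifier is needed; the estimate simply stops adapting while the a posteriori SNR is high.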

    Abstract (in Chinese)
    Abstract
    Acknowledgment
    List of Tables
    List of Figures
    Chapter 1 Introduction
      1.1 Background
      1.2 Relative Topics
        1.2.1 Spectral Subtraction
        1.2.2 Wiener Filtering
        1.2.3 Wavelet Shrinkage
        1.2.4 Signal Subspace
      1.3 Motivation
      1.4 Organization of Thesis
    Chapter 2 Overview of Wavelet Transform
      2.1 Introduction
      2.2 The Multiresolution Pyramid
      2.3 Subband Coding Schemes
      2.4 Discrete Wavelet Transform
      2.5 Discrete Wavelet Packet Transform
        2.5.1 Wavelet Packet Transform for Noisy Speech Signals
      2.6 Implementation of Discrete Wavelet Transform
        2.6.1 Biorthogonal Wavelets
        2.6.2 Lifting-based DWT
      2.7 Boundary Effect and Compensation Method
        2.7.1 Boundary Extension for Orthogonal Wavelets
        2.7.2 Boundary Extension for Odd Symmetric Orthogonal Wavelets
    Chapter 3 The Proposed Speech Enhancement Algorithm
      3.1 Speech Enhancement System
      3.2 Perceptual Filterbank
      3.3 Adaptive Noise Estimation
      3.4 The Relationship between Sigmoid Function and Adaptive Noise Estimate
      3.5 Time Adaptive Threshold
      3.6 Soft Thresholding
    Chapter 4 Hardware Design and Implementation of Adaptive Noise Estimation Algorithm
      4.1 Issue of System Architecture
        4.1.1 Serial Architecture
        4.1.2 Parallel-Serial Architecture
      4.2 Architecture of the Adaptive Noise Estimation Algorithm
        4.2.1 Average Signal Power Unit
        4.2.2 Implementation of Posteriori SNR and Sigmoid Function
        4.2.3 Noise Estimation and Smoothing Unit
        4.2.4 Noise Storage and Frame Average Noise Estimation Unit
        4.2.5 Precision Analysis
      4.3 FPGA Implementation of Parallel-Serial Architecture
        4.3.1 Design Flow
        4.3.2 Implementation and Verification
      4.4 ASIC Implementation
        4.4.1 Design Flow
        4.4.2 Area Analysis
        4.4.3 Gate-Level Simulation
        4.4.4 Layout View of Chip
    Chapter 5 Experimental Results and Performance Evaluation
      5.1 Experimental setup
      5.2 Noise Database Description
      5.3 Experimental and Performance Evaluation
        5.3.1 Proposed Algorithm Evaluation
        5.3.2 Different a and T Value for Sigmoid Function Testing
      5.4 Comparisons of Speech Enhancement Method
    Chapter 6 Conclusions and Future Works
    References

    List of Tables
      Table 3-1 The characteristics of critical bands under 4kHz
      Table 4-1 Computation complexity analysis
      Table 4-2 Comparison of various architecture for proposed method
      Table 4-3 Area report of each unit for serial architecture
      Table 4-4 Area report of each unit for parallel-serial architecture
      Table 5-1 The average SegSNR results of T=2
      Table 5-2 The average SegSNR results of T=3
      Table 5-3 Speech enhancement system characteristic comparison
      Table 5-4 Comparison of average SegSNR results for the enhanced speech in various noises

    List of Figures
      Figure 1-1 Subtractive-type speech enhancement based on masking properties
      Figure 1-2 Speech enhancement using DWT and TEO
      Figure 2-1 Corresponding basis functions and time-frequency resolutions of the (a) short-time Fourier transform (STFT), and (b) wavelet transform (WT)
      Figure 2-2 Subband coding scheme
      Figure 2-3 A three-octave filter bank for DWT
      Figure 2-4 The full binary tree for the two-scale wavelet packet transform
      Figure 2-5 The lifting scheme - forward transform
      Figure 2-6 Boundary extension
      Figure 2-7 Analysis processes of an even-length signal segment using orthogonal wavelet transforms
      Figure 2-8 Analysis processes of an even-length signal segment using biorthogonal wavelet transforms
      Figure 3-1 System block diagram for speech enhancement
      Figure 3-2 Adaptive noise estimation and thresholding of wavelet coefficients
      Figure 3-3 The tree structure of the perceptual wavelet packet transform
      Figure 3-4 (a) Bark scale as a function of center frequency, (b) Critical bandwidth as a function of center frequency
      Figure 3-5 Subband: 1000Hz~1250Hz
      Figure 3-6 The enhanced speech signal
      Figure 3-7 Plot of against the a posteriori SNR
      Figure 4-1 The block diagram of the proposed method for adaptive noise estimation
      Figure 4-2 The serial architecture
      Figure 4-3 The whole parallel architecture
      Figure 4-4 The parallel-serial architecture
      Figure 4-5 initial / normal state
      Figure 4-6 The average signal power unit: (a) for serial architecture (b) for parallel architecture
      Figure 4-7 Architecture of posteriori SNR and Sigmoid function
      Figure 4-8 Architecture of noise estimation unit: (a) type 1 (b) type 2
      Figure 4-9 (a) Noise storage and frame average noise estimation unit (2 frames), and (b) takes 4 frames
      Figure 4-10 Data path widths of system
      Figure 4-11 Design flow
      Figure 4-12 Summary report
      Figure 4-13 Timing summary
      Figure 4-14 The result of post place-and-route simulation
      Figure 4-15 Critical path view by Design Vision
      Figure 4-16 (a) simulation results of 1st to 3rd frame (b) The time spent for one frame (c) simulation results of speech-dominated frame
      Figure 4-17 The gate-level simulation results for parallel-serial architecture
      Figure 4-18 Layout view of chip for serial architecture
      Figure 5-1 Various noise signals
      Figure 5-2 (a) clean speech, (b) noisy speech
      Figure 5-3 Smoothing parameter
      Figure 5-4 The estimated noise power
      Figure 5-5 posteriori SNR
      Figure 5-6 Voiced frame
      Figure 5-7 Unvoiced frame
      Figure 5-8 (a) thresholded wavelet coefficients for voiced frame, and (b) for unvoiced frame
      Figure 5-9 The enhanced speech signal with an average segmental SNR of 11.86 dB
      Figure 5-10 Speech enhancement results of a speech signal corrupted by additive white noise
      Figure 5-11 Spectrogram of clean speech (from top to bottom), speech corrupted by additive white noise, and enhanced speech
      Figure 5-12 Speech enhancement results of a speech signal corrupted by additive destroyer engine noise
      Figure 5-13 Spectrogram of clean speech (from top to bottom), speech corrupted by additive destroyer engine noise, and enhanced speech
      Figure 5-14 Speech enhancement results of a speech signal corrupted by additive Volvo car noise
      Figure 5-15 Spectrogram of clean speech (from top to bottom), speech corrupted by additive Volvo car noise, and enhanced speech
      Figure 5-16 Speech enhancement results of a speech signal corrupted by additive white noise
      Figure 5-17 Spectrogram of Fig. 5-16
      Figure 5-18 Speech enhancement results of a speech signal corrupted by additive F16 cockpit noise
      Figure 5-19 Spectrogram of Fig. 5-18
      Figure 5-20 Speech enhancement results of a speech signal corrupted by additive destroyer engine noise
      Figure 5-21 Spectrogram of Fig. 5-20
      Figure 5-22 Speech enhancement results of a speech signal corrupted by additive Volvo car noise
      Figure 5-23 Spectrogram of Fig. 5-22


    Full-text availability: on campus, open from 2008-09-05
    off campus, open from 2008-09-05