
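Author: 楊宗憲 (Yang, Chung-Hsien)
Thesis title: Algorithms and VLSI Architecture Design of Wavelet Related Signal Processing (小波訊號處理相關之演算法與VLSI硬體架構設計)
Advisor: 王駿發 (Wang, Jhing-Fa)
Degree: Doctoral
Department: College of Electrical Engineering and Computer Science - Department of Electrical Engineering
Year of publication: 2008
Academic year of graduation: 96 (ROC calendar)
Language: English
Number of pages: 95
Keywords (Chinese): signal subspace (訊號子空間), wavelet (小波), auditory model (聽覺模型), speech enhancement (語音強化), system-on-chip design (單晶片系統設計)
Keywords (English): psycho-acoustic model, signal subspace, wavelet, speech enhancement, VLSI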
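  • This dissertation studies the application of the wavelet transform combined with signal subspace analysis to speech enhancement, covering both the proposed software algorithms and the hardware architecture design.
    On the algorithm side, we present three speech enhancement techniques based on signal subspace analysis: (1) signal subspace speech enhancement using the wavelet packet transform, (2) speech enhancement based on auditory critical bands and subspace tracking, and (3) speech enhancement that uses the signal-to-noise ratio (SNR) together with a multiband analysis of auditory masking. Signal subspace decomposition is computationally very expensive, so the first method uses wavelet packets to perform an approximate Karhunen-Loeve transform, that is, a fast eigendecomposition, to reduce the computation, and applies it to signal subspace speech enhancement. The second method combines a wavelet-based auditory band analysis with signal subspace decomposition. The auditory band split uses the wavelet packet transform to approximate the critical bands of human hearing, and the subspace decomposition is carried out by subspace tracking. Experiments on the TAICAR database show that the proposed method is better suited to in-car noise removal than conventional signal subspace enhancement and suppresses low-frequency noise more effectively. The third method builds on the multiband design of the second and concentrates on the gain adjustment in each subband. This gain adjustment determines the noise suppression performance, and our design takes both the SNR and auditory masking into account. In experiments, this method outperforms conventional signal subspace enhancement and spectral subtraction.
    On the hardware side, we present a hardware architecture for the 2D wavelet transform and a system-on-chip architecture for signal subspace speech enhancement. 2D wavelet transform architectures fall into two classes: line-based and block-based. Line-based designs have low complexity, but in 2D applications they require a large amount of buffer memory and have long output latency. In this dissertation we propose a block-based 2D wavelet transform design that supports the JPEG2000 standard and, compared with line-based designs, offers smaller buffers, shorter latency, and nearly 100% hardware utilization. The system has been implemented on an ALTERA EPXA10 development board, running at a clock rate of 44.33 MHz. In addition, for real-time speech processing, we propose a system-on-chip architecture that contains a VLSI design of a pipelined subspace tracking algorithm. This system has also been implemented on an ALTERA EPXA10 development board, running at 9.7 MHz, and performance analysis shows that it meets real-time requirements.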

    This dissertation presents research on the wavelet transform and signal subspace speech enhancement, from algorithms to hardware implementations.
    In the design of algorithms, we present three subspace-based approaches to speech enhancement: signal subspace speech enhancement using wavelet packet expansion (SSUW), speech enhancement using critical band and subspace tracking (CBST), and an SNR and auditory masking aware technique for multiband speech enhancement (SAMA). Decomposing the signal subspace is time-consuming. SSUW addresses this by using wavelet packets to perform an approximate Karhunen-Loeve transform, i.e., a fast eigendecomposition, for signal subspace speech enhancement. CBST combines a perceptual wavelet filterbank, derived from a psycho-acoustic model, with signal subspace processing. The projection approximation subspace tracking deflation (PASTd) algorithm is used to track the signal subspace. Experimental results on the TAICAR database show that this approach outperforms conventional subspace methods: low-frequency noise in noisy car environments is suppressed efficiently after applying the perceptual filterbank and subspace processing. SAMA focuses on the gain adaptation of each critical band, which plays a crucial role in critical band signal estimation. An attenuation factor based on auditory masking and the prior SNR of each critical band is presented to adjust the estimator's gain. In the experimental evaluation, our method achieved better enhancement performance than conventional subspace and spectral subtraction methods.
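To make the subspace tracking step more concrete, the following is a minimal sketch of the standard real-valued PASTd recursion (project, update the energy estimate, correct the basis vector, deflate). It is not the dissertation's pipelined variant or its CBST front end. The function names, the forgetting factor `beta = 0.99`, the identity-like initialization, and the synthetic test data are all assumptions made for illustration.

```python
import math
import random


def pastd_update(x, W, d, beta=0.99):
    """One PASTd step: update the basis vectors W and energies d in place."""
    x = list(x)  # working copy, deflated after each component
    ys = []
    for i in range(len(W)):
        w = W[i]
        y = sum(wk * xk for wk, xk in zip(w, x))   # projection onto w_i
        d[i] = beta * d[i] + y * y                 # exponentially weighted energy
        gain = y / d[i]
        # (x - w*y) is the projection error; move w_i along it
        W[i] = [wk + (xk - wk * y) * gain for wk, xk in zip(w, x)]
        # deflation: remove this component before tracking the next one
        x = [xk - wk * y for wk, xk in zip(W[i], x)]
        ys.append(y)
    return ys


def track(samples, dim, rank, beta=0.99):
    """Run PASTd over a sequence of vectors (hypothetical helper)."""
    W = [[1.0 if k == i else 0.0 for k in range(dim)] for i in range(rank)]
    d = [1.0] * rank
    for x in samples:
        pastd_update(x, W, d, beta)
    return W, d


def demo(n=3000, seed=0):
    """Synthetic check: a strong direction u, a weaker direction v, small noise."""
    rng = random.Random(seed)
    u = [1 / math.sqrt(2), 1 / math.sqrt(2), 0.0, 0.0]
    v = [0.0, 0.0, 1.0, 0.0]
    samples = []
    for _ in range(n):
        a, b = rng.gauss(0, 3.0), rng.gauss(0, 1.0)
        samples.append([a * uk + b * vk + rng.gauss(0, 0.1)
                        for uk, vk in zip(u, v)])
    W, _ = track(samples, dim=4, rank=2)
    w1 = W[0]
    norm = math.sqrt(sum(p * p for p in w1))
    # absolute cosine between the first tracked vector and the true direction
    return abs(sum(p * q for p, q in zip(w1, u)) / norm)
```

Calling `demo()` returns the absolute cosine between the first tracked vector and the dominant synthetic direction; a value close to 1 indicates that the tracker has locked onto it. Each update costs on the order of `dim * rank` multiplications, which is why a recursion of this kind is attractive compared with a full eigendecomposition per frame.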
    In the design of VLSI architecture, we present a hardware design for the 2D discrete wavelet transform (DWT) and a system-on-a-programmable-chip (SoPC) architecture for subspace-based speech enhancement. 2D DWT architectures can be classified as line-based or block-based. Line-based architectures are simple, with low complexity, and are efficient for 1D applications. For 2D transforms, however, they suffer from two inherent problems: memory requirements and latency. In this dissertation, a novel block-based architecture for computing lifting-based 2D DWT coefficients is presented. Compared with line-based architectures, it significantly reduces the buffer size and speeds up the calculation of 2D wavelet coefficients, reducing the latency from N^2 to 3N. In addition, the proposed architecture supports the JPEG2000 default filters, and hardware utilization is nearly 100%. The architecture has been realized on an ARM-based ALTERA EPXA10 development board at a frequency of 44.33 MHz. For real-time speech enhancement, an SoPC architecture and a VLSI design of the PASTd algorithm are proposed. To enable pipelined computation, we present a pipelined PASTd algorithm without data-dependent hazards. The maximum clock rate is 9.7 MHz, and the typical clock rate that meets the real-time requirement is 4.6 MHz. The corresponding architecture was also experimentally verified on an ALTERA EPXA10 development board.
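As a software reference for what a lifting-based DWT computes, the sketch below implements one level of the reversible 5/3 integer lifting transform, one of the JPEG2000 default filters, in 1D and as a separable 2D transform (rows, then columns). It says nothing about the block-based scheduling, buffering, or latency of the proposed hardware. The function names, the restriction to even-length inputs, and the subband naming (`hl` meaning high-pass along rows, low-pass along columns) are assumptions made for illustration.

```python
def fwd53(x):
    """Forward 5/3 integer lifting on an even-length list; returns (low, high)."""
    n = len(x) // 2
    even, odd = x[0::2], x[1::2]
    # predict: detail = odd sample minus the floor-mean of its even neighbours
    # (the right edge uses symmetric extension)
    d = [odd[i] - (even[i] + even[min(i + 1, n - 1)]) // 2 for i in range(n)]
    # update: smooth = even sample plus a rounded quarter of neighbouring details
    # (the left edge uses symmetric extension)
    s = [even[i] + (d[max(i - 1, 0)] + d[i] + 2) // 4 for i in range(n)]
    return s, d


def inv53(s, d):
    """Inverse of fwd53: undo the update step, then the predict step."""
    n = len(s)
    even = [s[i] - (d[max(i - 1, 0)] + d[i] + 2) // 4 for i in range(n)]
    odd = [d[i] + (even[i] + even[min(i + 1, n - 1)]) // 2 for i in range(n)]
    x = [0] * (2 * n)
    x[0::2], x[1::2] = even, odd
    return x


def transpose(m):
    return [list(r) for r in zip(*m)]


def fwd53_2d(img):
    """One decomposition level: rows first, then columns. Returns 4 subbands."""
    row_pairs = [fwd53(r) for r in img]
    low = transpose([s for s, _ in row_pairs])    # columns of the row-lowpass half
    high = transpose([d for _, d in row_pairs])   # columns of the row-highpass half

    def columns(half):
        pairs = [fwd53(c) for c in half]
        return (transpose([s for s, _ in pairs]),
                transpose([d for _, d in pairs]))

    ll, lh = columns(low)
    hl, hh = columns(high)
    return ll, hl, lh, hh


def inv53_2d(ll, hl, lh, hh):
    """Inverse of fwd53_2d: undo the column pass, then the row pass."""
    def columns(lo, hi):
        cols = [inv53(s, d) for s, d in zip(transpose(lo), transpose(hi))]
        return transpose(cols)

    low, high = columns(ll, lh), columns(hl, hh)
    return [inv53(s, d) for s, d in zip(low, high)]
```

Because every lifting step uses only integer additions and floor divisions, the inverse reproduces the input exactly, for example `inv53_2d(*fwd53_2d(img)) == img` for any image with even width and height. Each step also touches only a few neighbouring samples, which is the property that lifting-based hardware architectures exploit.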

    Abstract (Chinese)
    Abstract (English)
    Acknowledgement
    Contents
    Table Captions
    Figure Captions
    Chapter 1 Introduction
        1.1 Background and Related Work
            1.1.1 Signal Subspace Speech Enhancement
            1.1.2 VLSI Design for 2D Wavelet Transform
        1.2 Contribution of the Dissertation
        1.3 Organization of the Dissertation
    Chapter 2 Signal Subspace Speech Enhancement Using Wavelet Packet Expansion (SSUW)
        2.1 The Approximate KLT
        2.2 Wavelet Packet
        2.3 The Cost Function for Search Algorithm
        2.4 Finding the Approximate KL-basis
        2.5 The Linear Estimator
        2.6 Performance Evaluation of SSUW
    Chapter 3 Speech Enhancement Using Critical Band and Subspace Tracking (CBST)
        3.1 Overview
        3.2 Perceptual Filterbank
        3.3 Subspace Tracking Algorithm
        3.4 Auditory Gain Adaptation
        3.5 Performance Evaluation of CBST
            3.5.1 TAICAR Database
            3.5.2 Experimental Results
    Chapter 4 SNR and Auditory Masking Aware Technique for Multiband Speech Enhancement (SAMA)
        4.1 Overview
        4.2 Prior SNR Aware Gain Estimation in Perceptual Filterbank
        4.3 Incorporation of the Human Auditory Masking Properties
        4.4 Performance Evaluation of SAMA
        4.5 Combining SAMA with SVMs for Speaker Identification and Verification
    Chapter 5 Hardware Design for Wavelet Transform and Subspace Tracking
        5.1 Block-based Architecture for Lifting Scheme Wavelet Transform
            5.1.1 Lifting Scheme Wavelet Transform
            5.1.2 Precision Analysis
            5.1.3 Block-based VLSI Architecture
                5.1.3.1 Proposed Data Flow Diagram
                5.1.3.2 Proposed Architectures
                5.1.3.3 Block Controller Modules
                5.1.3.4 Processing Elements (PE) Modules
                5.1.3.5 Memory Modules
                5.1.3.6 Scheduling
            5.1.4 FPGA Implementation
        5.2 Hardware/Software Codesign for Critical Band and Subspace Tracking
            5.2.1 SoPC Architecture
            5.2.2 Hardware Design of Subspace Tracking Algorithm
                5.2.2.1 Pipelined PASTd Algorithm
                5.2.2.2 VLSI Architecture for Pipelined PASTd Algorithm
                5.2.2.3 Convergence of the Pipelined PASTd Algorithm
                5.2.2.4 Real-Time Issues
                5.2.2.5 Chip Features
            5.2.3 Software Procedure
    Chapter 6 Conclusions
    References
    Author's Biographical Notes
    Publications


    Full text availability: on campus, public from 2009-01-02
    off campus, public from 2010-01-02