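Author: 陳璽煌
Chen, Shi-Huang
Thesis title: 應用小波轉換於語音信號處理之研究
A Study on Speech Signal Processing Using Wavelet Transforms
Advisor: 王駿發
Wang, Jhing-Fa
Degree: Doctor
Department: College of Electrical Engineering and Computer Science - Department of Electrical Engineering
Year of publication: 2002
Academic year of graduation: 90
Language: English
Number of pages: 138
Keywords (translated from Chinese): wavelet transform, pitch period extraction, speech denoising, consonant/vowel segmentation, speech segment detection
Keywords (English): pitch information extraction, wavelet transform, voice active detection, speech denoising, C/V segmentation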
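  • Abstract (translated from the Chinese): Wavelet transforms and their theory have been among the most popular research topics of recent years, and wavelet transforms are now widely applied in signal processing, image and audio processing, communication systems, and applied mathematics. Because the wavelet transform provides excellent time-frequency analysis and multi-resolution properties, it is well suited to highly time-varying speech signals. This thesis studies the application of wavelet transforms to speech signal processing and proposes a basic wavelet-based processing framework for speech signals. Under this framework, the thesis develops four new algorithms: pitch period extraction, consonant/vowel segmentation, speech noise removal, and speech signal detection. In addition, the thesis proposes an aliasing compensation algorithm to overcome the aliasing interference that arises in the filter bank structure of the wavelet transform.
    First, for pitch period extraction, to strengthen the noise robustness of conventional algorithms, this thesis uses the aliasing-compensated wavelet transform and a spatial correlation function to improve the accuracy of pitch period detection in noisy environments; experimental results show that the proposed algorithm performs best in both clean and noisy environments. Second, for consonant/vowel segmentation, the thesis proposes a novel algorithm that combines the wavelet transform with a product function; compared with conventional methods, it requires neither pitch period extraction nor backward search, and it improves segmentation accuracy. For speech noise removal, the thesis exploits the characteristics of human hearing to develop a wavelet decomposition structure that approximates the Bark spectrum, combined with a time-varying threshold algorithm for denoising; the proposed method corrects the over-suppression of speech that is common in conventional wavelet denoising systems and improves the quality of the denoised speech. Finally, the thesis applies the time-varying threshold algorithm developed for noise removal to speech signal detection; experimental results show that the proposed method still achieves a very high detection rate in high-noise environments.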

    Wavelet transforms and their theory are among the most exciting developments of the last decade. The wavelet transform has been developed independently in various fields, such as signal processing, image processing, audio processing, communication, and applied mathematics. Because the wavelet representation offers efficient time-frequency localization and multi-resolution analysis, wavelet transforms are well suited to processing non-stationary signals such as speech. This thesis therefore focuses on wavelet-based speech signal processing and proposes a framework for speech signal processing using the wavelet transform. Based on the proposed framework, this thesis develops four new wavelet-based speech signal processing algorithms: pitch detection, consonant/vowel (C/V) segmentation, speech enhancement, and voice activity detection (VAD). Furthermore, to cancel the aliasing distortion that arises in the filter bank structure of wavelet transforms, this thesis also proposes an aliasing compensation algorithm.
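The filter bank structure and multi-resolution analysis mentioned above can be illustrated with the simplest case. The following is a minimal sketch, assuming a Haar filter pair (the abstract does not say which wavelet the thesis uses); all function names are hypothetical. It shows one analysis/synthesis level and a multi-level split of the approximation band. When both subbands are recombined, the signal is recovered exactly; a single downsampled subband taken on its own retains aliasing, which is the kind of distortion an aliasing compensation step would target.

```python
import math

_S = 1.0 / math.sqrt(2.0)  # orthonormal Haar scaling factor


def haar_analysis(x):
    """One level of a two-channel Haar filter bank: filter, then downsample by 2."""
    if len(x) % 2:
        raise ValueError("signal length must be even")
    approx = [_S * (x[i] + x[i + 1]) for i in range(0, len(x), 2)]  # low-pass band
    detail = [_S * (x[i] - x[i + 1]) for i in range(0, len(x), 2)]  # high-pass band
    return approx, detail


def haar_synthesis(approx, detail):
    """Inverse of haar_analysis: recombine both subbands into the signal."""
    x = []
    for a, d in zip(approx, detail):
        x.append(_S * (a + d))
        x.append(_S * (a - d))
    return x


def haar_multilevel(x, levels):
    """Multi-resolution decomposition: repeatedly split the approximation band."""
    approx = [float(v) for v in x]
    details = []
    for _ in range(levels):
        approx, d = haar_analysis(approx)
        details.append(d)
    return approx, details
```

For an 8-sample signal, `haar_multilevel(x, 3)` returns one approximation coefficient and detail bands of length 4, 2, and 1, so the finest band has the best time resolution and the coarsest the best frequency resolution.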
    The first part of this thesis presents the wavelet-based pitch detection algorithm. It applies the aliasing-compensated wavelet transform and a modified spatial correlation function to improve the robustness of conventional pitch detection algorithms in noisy environments. Experimental results show that the proposed pitch detection algorithm outperforms conventional algorithms in both clean and noisy environments.
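To make the idea of a spatial correlation function concrete, the sketch below multiplies wavelet detail coefficients at adjacent scales, sample by sample. This is the plain, unmodified form of the technique and is not the thesis's modified function, whose definition the abstract does not give. The undecimated Haar-style transform and the function names are assumptions chosen so that coefficients at different scales stay time-aligned. A sharp transient (for example a glottal closure) produces coefficients of consistent sign across scales and survives the product, while uncorrelated noise tends to be suppressed.

```python
def undecimated_haar(x, levels):
    """Return detail signals for scales 1..levels, each as long as x (no downsampling)."""
    approx = [float(v) for v in x]
    details = []
    for j in range(levels):
        shift = 2 ** j  # filter taps are spread 2^j samples apart at scale j+1
        # Delayed copy of the current approximation; the left edge is clamped.
        prev = [approx[max(n - shift, 0)] for n in range(len(approx))]
        details.append([0.5 * (a - p) for a, p in zip(approx, prev)])
        approx = [0.5 * (a + p) for a, p in zip(approx, prev)]
    return details


def spatial_correlation(details):
    """Pointwise product of the detail signals across all supplied scales."""
    corr = [1.0] * len(details[0])
    for d in details:
        corr = [c * v for c, v in zip(corr, d)]
    return corr


def peak_index(values):
    """Index of the largest-magnitude sample."""
    return max(range(len(values)), key=lambda n: abs(values[n]))
```

In a pitch detector built on this idea, the spacing between successive peaks of the correlation signal would give a pitch period estimate; the peak-picking rule itself is left out here because the source does not describe it.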
    The second part of this thesis presents the wavelet-based C/V segmentation algorithm. This novel algorithm detects the C/V segmentation point directly by using the product function and its energy profile. In contrast to conventional C/V segmentation algorithms, the proposed algorithm needs neither a pitch detector nor backward processing. As a consequence, the proposed C/V segmentation algorithm is substantially more accurate than conventional approaches.
    In the third part, this thesis proposes a wavelet-based speech enhancement method based on perceptual wavelet packet decomposition (PWPD) and time-adapted thresholding (TAT) in order to increase the perceptual speech quality after enhancement. With these techniques, the over-thresholding of speech segments that usually occurs in conventional speech enhancement schemes can be avoided. In addition, the method requires neither a complicated estimation of the noise level nor any knowledge of the SNR. In experiments with both additive and real-environment noise, the proposed speech enhancement method outperforms conventional noise cancellation schemes.
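The thesis outline lists the Teager energy operator and soft thresholding as building blocks of the thresholding stage. Both have standard textbook definitions, sketched below. How the thesis derives its time-adapted threshold is not stated in this record, so the per-coefficient `thresholds` argument is a hypothetical interface that only shows where a time-varying threshold would plug in; the edge handling in `teager_energy` is likewise an assumption.

```python
import math


def teager_energy(x):
    """Discrete Teager energy operator: psi[n] = x[n]^2 - x[n-1] * x[n+1]."""
    n = len(x)
    if n < 3:
        return [0.0] * n
    psi = [x[i] * x[i] - x[i - 1] * x[i + 1] for i in range(1, n - 1)]
    # Assumption: replicate the first and last interior values at the edges.
    return [psi[0]] + psi + [psi[-1]]


def soft_threshold(coeffs, thresholds):
    """Shrink each coefficient toward zero by its own (possibly time-varying) threshold."""
    out = []
    for c, t in zip(coeffs, thresholds):
        mag = abs(c) - t
        out.append(math.copysign(mag, c) if mag > 0 else 0.0)
    return out
```

For a pure tone `A * cos(w * n)`, the operator returns the constant `A**2 * sin(w)**2`, so it tracks amplitude and frequency jointly. A smaller threshold in speech-dominated regions would shrink those coefficients less, which is one way to avoid the over-thresholding described above.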
    Finally, this thesis applies the TAT algorithm developed in the third part to VAD. This new wavelet-based VAD method likewise requires neither a complicated estimation of the noise level nor any knowledge of the SNR. Experimental results show that it detects speech accurately even when the speech signal is seriously contaminated by background noise.

    Abstract (Chinese)
    ABSTRACT
    ACKNOWLEDGMENT
    CONTENTS
    LIST OF FIGURES
    LIST OF TABLES
    1 Introduction
      1-1 Motivations
      1-2 Reviews of Wavelet-based Speech Signal Processing
        1-2-1 Feature extraction of speech
        1-2-2 Speech coding
        1-2-3 Speech synthesis
        1-2-4 Speaker verification and identification
        1-2-5 Speech recognition
        1-2-6 Speech enhancement
      1-3 The Framework of Speech Signal Processing Using Wavelet
      1-4 Dissertation Focus
      1-5 Dissertation Outline
    2 Review of Wavelet Transform
      2-1 History of Wavelet
        2-1-1 Advances before 1930
        2-1-2 Advances in the 1930s
        2-1-3 Advances during 1930 - 1980
        2-1-4 Advances in the post-1980
      2-2 Theory of Wavelet
        2-2-1 Introduction to wavelet
        2-2-2 Multi-resolution analysis of wavelet
        2-2-3 Comparison of wavelet and Fourier transforms
      2-3 Implementation of Wavelet Transform
        2-3-1 Implementation of wavelet transforms using filter banks
        2-3-2 Implementation of wavelet transforms using lifting
    3 Advanced Developments on Wavelet Transform for Speech Signal Processing
      3-1 The Aliasing Compensation Algorithm
      3-2 The Perceptual Wavelet Packet Transform
        3-2-1 Wavelet packet transform
        3-2-2 Perceptual wavelet packet decomposition
    4 Application of Wavelet Transforms for C/V Segmentation on Mandarin Speech Signal
      4-1 Introduction
      4-2 Mandarin Speech Production Model
      4-3 Implementation of the Wavelet-Based C/V Segmentation Algorithm
      4-4 Experimental Results
        4-4-1 Performance of the proposed C/V segmentation algorithm
        4-4-2 Comparison with the conventional C/V segmentation method
      4-5 Summary
    5 A Noise-Robust Pitch Detection Method Using Wavelet Transform with Aliasing Compensation
      5-1 Introduction
      5-2 The Proposed Pitch Detection Method
      5-3 Performance Evaluation and Experimental Results
        5-3-1 Illustration of the method for the synthetic speech data
        5-3-2 Illustration of the method for the natural speech data
        5-3-3 Robustness of the method
      5-4 Summary
    6 Speech Enhancement Using Perceptual Wavelet Packet Transform
      6-1 Introduction
      6-2 Time-Adaptive Thresholding Using Teager Energy Operator
        6-2-1 Teager energy operator
        6-2-2 Temporal masking construction
        6-2-3 Time adaptive threshold computation
        6-2-4 Soft thresholding
      6-3 Experimental Results
        6-3-1 On choosing of wavelet filter
        6-3-2 Performance of the method for additive Gaussian white noise
        6-3-3 Performance of the method for real environment noise
      6-4 Summary
    7 Voice Activity Detection Using Perceptual Wavelet Packet Transform
      7-1 Introduction
      7-2 Implementation of The VAD Algorithm
        7-2-1 Band selection
        7-2-2 Masks construction
        7-2-3 Calculation of voice activity shape (VAS)
        7-2-4 VAD decision
      7-3 Experimental Results
      7-4 Summary
    8 Conclusions
      8-1 Contributions of This Paper
      8-2 Future Research Work
    REFERENCES
    VITA
    PUBLICATION LIST


    Full text available on campus from 2003-06-18
    Full text available off campus from 2003-06-18