| Graduate student: | 毛成一 Mao, Cheng-yi |
|---|---|
| Thesis title: | 利用自調節門檻值與追蹤能量封包動態於複合式語音活動偵測演算法 Using a Self-Regulatory Threshold and a Tracking Power Envelope Dynamics for an Integrated Voice Activity Detection Algorithm |
| Advisor: | 王振興 Wang, Jeen-shing |
| Degree: | Master |
| Department: | College of Electrical Engineering and Computer Science - Department of Electrical Engineering |
| Year of publication: | 2008 |
| Academic year of graduation: | 96 (ROC calendar) |
| Language: | English |
| Number of pages: | 56 |
| Chinese keywords: | 自調節門檻值 (self-regulatory threshold), 語音活動偵測 (voice activity detection) |
| English keywords: | voice activity detection, self-regulatory threshold |
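This thesis describes an integrated voice activity detection (VAD) algorithm that combines a self-regulatory threshold with tracking of power envelope dynamics to classify speech and non-speech. It improves on conventional VAD algorithms that require the threshold to be fixed in advance, and it uses the noise spectrum to compute automatically the key parameter needed for threshold estimation, improving on the way earlier algorithms set this parameter. To meet the practical requirement of avoiding loss of speech data, the self-regulatory threshold is combined with the power-envelope tracking algorithm, with the aim of raising the accuracy of noise detection and thereby lowering the speech detection error rate. Because the parameters of both algorithms are computed from spectral energy and both computations are non-causal, the two can be suitably combined into an integrated architecture. Finally, the past detection values and the degree of trust placed in each method when the two algorithms give opposite results are taken into account, each multiplied by a different weight, to obtain the final output. The detection accuracy of the algorithm was verified on the AURORA corpus: at an acceptable speech detection error rate, it also achieves a higher non-speech detection rate and comes close to real-time processing.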
This thesis presents an integrated voice activity detection (VAD) algorithm composed of an adaptive VAD with a self-regulatory threshold-setting mechanism and an energy-based VAD that tracks power envelope dynamics. To ease the parameter-setting problem of conventional adaptive-threshold schemes, we develop an automatic scheme that computes a crucial parameter from a noise spectrum estimate. Power envelope tracking is integrated for its superior noise detection, which is used to suppress the speech false-alarm rate as practical applications require. The final VAD outputs are determined by the proposed integrated scheme, which considers the past decisions and, when the two algorithms give opposite detections, the confidence level placed in each algorithm. The effectiveness of the proposed scheme has been validated on the AURORA database. According to the experimental results, the proposed scheme achieves an acceptable speech false-alarm rate and a higher non-speech hit rate in real-time processing than some existing VAD algorithms.
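The decision-fusion step described above can be illustrated with a minimal sketch. It is not the thesis's actual rule: the function name `fuse_vad_decisions`, the three weights, and the 0.5 decision level are all hypothetical values chosen for illustration, since the abstract does not state them. The sketch assumes each detector emits one binary decision per frame (1 = speech, 0 = non-speech), accepts the decision when the two detectors agree, and otherwise takes a weighted vote that also includes the previous fused decision.

```python
from typing import List, Sequence


def fuse_vad_decisions(
    threshold_vad: Sequence[int],
    envelope_vad: Sequence[int],
    w_threshold: float = 0.3,  # hypothetical trust in the adaptive-threshold VAD
    w_envelope: float = 0.6,   # hypothetical trust in the envelope-tracking VAD
    w_past: float = 0.3,       # hypothetical weight of the previous fused decision
    decision_level: float = 0.5,  # hypothetical cut-off for declaring speech
) -> List[int]:
    """Fuse two per-frame binary VAD decisions (1 = speech, 0 = non-speech)."""
    if len(threshold_vad) != len(envelope_vad):
        raise ValueError("both detectors must cover the same number of frames")

    fused: List[int] = []
    previous = 0  # assumption: the recording starts in non-speech
    for a, b in zip(threshold_vad, envelope_vad):
        if a == b:
            # The detectors agree, so their common decision is accepted.
            current = a
        else:
            # Opposite detections: weighted vote including the past decision.
            score = w_threshold * a + w_envelope * b + w_past * previous
            current = 1 if score >= decision_level else 0
        fused.append(current)
        previous = current
    return fused


if __name__ == "__main__":
    adaptive = [0, 1, 1, 1, 0, 0]
    envelope = [0, 0, 1, 0, 1, 0]
    print(fuse_vad_decisions(adaptive, envelope))  # prints [0, 0, 1, 1, 1, 0]
```

With these illustrative weights, a lone speech flag from the adaptive-threshold detector is rejected unless the previous frame was already speech, which mirrors the stated goal of leaning on the envelope tracker's noise detection to limit speech false alarms.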