| Graduate student: | 吳典家 Wu, Dian-Jia |
|---|---|
| Thesis title: | 雜訊環境之語音/音樂信號分辨器演算法及超大型積體電路設計 VLSI and Algorithm Design for Speech/Music Discrimination under Noisy Environment |
| Advisor: | 王駿發 Wang, Jhing-Fa |
| Degree: | Master |
| Department: | College of Electrical Engineering and Computer Science - Department of Electrical Engineering |
| Year of publication: | 2004 |
| Academic year of graduation: | 92 |
| Language: | English |
| Number of pages: | 78 |
| Keywords (Chinese): | 聲音分類、語音/音樂信號分辨、聲音檢測、超大型積體電路設計 |
| Keywords (English): | Audio Classification, Speech/Music Signal Discrimination, Audio Activity Detection, VLSI Design |
  Distinguishing speech from music in an audio stream has become increasingly important in multimedia applications, and many methods have been proposed for it. Most of them, however, need a large amount of training data to give acceptable results, and they generally do not consider audio captured at a low signal-to-noise ratio (SNR). In this thesis, we propose a robust speech/music discrimination system that still works well in noisy environments. The system first applies a statistical model-based audio activity detection method to separate background-noise segments from noisy audio segments. Each detected noisy audio segment is then classified as speech or music using three features: the low short-time energy ratio (LSTER), the spectrum flux (SF), and the likelihood ratio crossing rate (LRCR). In our experiments, the proposed system achieves a classification accuracy of about 90% in noisy environments. Finally, a VLSI architecture for the speech/music discriminator is proposed and implemented; the discriminator can serve as a silicon intellectual property (IP) core for integration into multimedia SoCs.
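To make two of the features named in the abstract concrete, the following is a minimal Python sketch of LSTER and spectrum flux. It assumes definitions that are common in the audio-classification literature: LSTER as the fraction of frames whose short-time energy is below half the window average, and spectrum flux as the mean squared change of the log-magnitude spectrum between adjacent frames. The abstract does not give the thesis's exact formulas, frame sizes, or thresholds, so the function names, the 0.5 factor, the non-overlapping framing, and the `delta` offset are all illustrative assumptions. LRCR is left out because the abstract does not define it.

```python
import cmath
import math


def frame_energies(samples, frame_len):
    """Short-time energy of each non-overlapping frame (assumed framing)."""
    return [
        sum(x * x for x in samples[i:i + frame_len])
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]


def lster(energies):
    """Fraction of frames whose energy is below half the window average."""
    if not energies:
        return 0.0
    threshold = 0.5 * sum(energies) / len(energies)  # 0.5 is an assumed factor
    return sum(1 for e in energies if e < threshold) / len(energies)


def magnitude_spectrum(frame):
    """Naive DFT magnitudes for bins 0..N/2 (adequate for short demo frames)."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                for t, x in enumerate(frame)))
        for k in range(n // 2 + 1)
    ]


def spectrum_flux(samples, frame_len, delta=1e-6):
    """Mean squared change of the log-magnitude spectrum between frames."""
    spectra = [
        [math.log(m + delta)
         for m in magnitude_spectrum(samples[i:i + frame_len])]
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]
    if len(spectra) < 2:
        return 0.0
    diffs = [
        sum((a - b) ** 2 for a, b in zip(cur, prev)) / len(cur)
        for prev, cur in zip(spectra, spectra[1:])
    ]
    return sum(diffs) / len(diffs)


# Hypothetical per-frame energies: loud syllables separated by pauses
# versus a steadier signal.
print(lster([9, 0.1, 8, 0.2, 9, 0.1]))  # 0.5
print(lster([5, 6, 5, 6, 5, 6]))        # 0.0
```

In this sketch, a signal that alternates between energetic frames and near-silent pauses yields a higher LSTER than a signal with steady energy, which is the kind of contrast such a feature is meant to capture. The naive DFT keeps the example free of external dependencies; a real implementation would normally use an FFT.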