
Student: Wu, Dian-Jia (吳典家)
Thesis title: VLSI and Algorithm Design for Speech/Music Discrimination under Noisy Environment
Advisor: Wang, Jhing-Fa (王駿發)
Degree: Master
Department: Department of Electrical Engineering, College of Electrical Engineering and Computer Science
Year of publication: 2004
Academic year of graduation: 92 (ROC calendar)
Language: English
Pages: 78
Keywords: Audio Classification, Speech/Music Signal Discrimination, Audio Activity Detection, VLSI Design
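  •   In multimedia applications, the problem of distinguishing speech from music within an audio stream has become increasingly important. Many approaches to this problem have been proposed, but most of them require large amounts of training data to achieve satisfactory results, and they usually do not consider signals with a low signal-to-noise ratio (SNR). In this thesis, we therefore propose a more robust speech/music discrimination system that still achieves a satisfactory discrimination rate in noisy environments. Our system first applies a statistical model-based audio activity detection method to remove background noise and keep the useful audio signal. Each detected audio segment is then classified as speech or music using three features: the low short-time energy ratio, the spectrum flux, and the crossing rate of the likelihood-ratio waveform. In our experiments, the system reaches about 90% accuracy in noisy environments. Finally, we propose and implement a hardware architecture for the discriminator, which can serve as a silicon intellectual property (IP) core for integration into various multimedia system-on-chips (SoCs).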
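The audio activity detection step described above can be illustrated with a minimal sketch of a statistical model-based detector in the style of Sohn, Kim and Sung: each DFT bin is modeled as complex Gaussian under both the noise-only and the noisy-audio hypotheses, and a frame is flagged as active when the mean per-bin log likelihood ratio exceeds a threshold. This is not the thesis's implementation. The following are assumptions: the first frames are noise-only, the noise spectrum is held fixed (the thesis adapts it), the a priori SNR uses the simple ML estimate, and the frame sizes and the threshold `1.0` are illustrative. The names `frame_signal`, `frame_log_lr` and `detect_activity` are hypothetical.

```python
import numpy as np


def frame_signal(x, frame_len=256, hop=128):
    """Split a 1-D signal into overlapping frames, one frame per row."""
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]


def frame_log_lr(x, frame_len=256, hop=128, noise_frames=10):
    """Mean per-bin log likelihood ratio for each frame (Gaussian model).

    Assumptions (not from the thesis): the first `noise_frames` frames are
    noise-only, the noise spectrum is held fixed (no adaptation), and the
    a priori SNR uses the ML estimate xi = max(gamma - 1, 0).
    """
    frames = frame_signal(x, frame_len, hop) * np.hanning(frame_len)
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    noise_psd = power[:noise_frames].mean(axis=0) + 1e-12  # lambda_N(k)
    gamma = power / noise_psd                              # a posteriori SNR
    xi = np.maximum(gamma - 1.0, 0.0)                      # a priori SNR (ML)
    log_lr = gamma * xi / (1.0 + xi) - np.log1p(xi)        # per-bin log Lambda_k
    return log_lr.mean(axis=1)


def detect_activity(x, threshold=1.0, **kwargs):
    """Flag frames whose mean log LR exceeds a (hypothetical) fixed threshold."""
    return frame_log_lr(x, **kwargs) > threshold


# Usage: 1 s of weak noise at 16 kHz with a 0.25 s tone burst in the middle.
rng = np.random.default_rng(0)
fs = 16000
x = 0.01 * rng.standard_normal(fs)
t = np.arange(4000) / fs
x[8000:12000] += np.sin(2 * np.pi * 1000 * t)
active = detect_activity(x)
```

With the ML a priori SNR, each per-bin term reduces to gamma - 1 - ln(gamma) when gamma > 1 and to 0 otherwise, so the score is non-negative and grows quickly once a bin rises above the noise floor.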

      The problem of distinguishing speech from music in audio signals has become increasingly important in multimedia applications, and many approaches to it have recently been proposed. Nevertheless, most of these techniques need a large amount of training data to provide acceptable results. Moreover, none of them considers the classification of audio signals in low-SNR noisy environments. In this thesis, we propose a robust speech/music discrimination system that works well in noisy environments. In our system, a statistical model-based audio activity detection method segments the audio signal into noise segments and noisy audio segments. Each noisy audio segment is then classified as speech or music using three features: the low short-time energy ratio (LSTER), the spectrum flux (SF), and the likelihood ratio crossing rate (LRCR). In our experiments, the proposed system achieves a classification accuracy of about 90%. Finally, a VLSI architecture for the speech/music discriminator is proposed and implemented. The discriminator can serve as a useful IP core for integration into multimedia SoCs.
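The three segment-level features named above can be sketched as follows. This is a minimal sketch, not the thesis's implementation. LSTER and spectrum flux follow the formulations commonly attributed to Lu, Zhang and Jiang (2002). For LRCR, the crossing level (by default the segment's mean likelihood ratio) is a hypothetical choice, since the abstract only says the crossing rate of the likelihood-ratio waveform is used. The windowing, the small constant `delta`, and the helper `segment_features` are also assumptions.

```python
import numpy as np


def lster(frames):
    """Low short-time energy ratio: fraction of frames whose energy is below
    half of the segment's mean frame energy. Speech, with its short pauses,
    tends to score higher than music."""
    ste = np.sum(np.asarray(frames, dtype=float) ** 2, axis=1)
    return float(np.mean(ste < 0.5 * ste.mean()))


def spectrum_flux(frames, delta=1e-6):
    """Mean squared difference between log-magnitude spectra of adjacent
    frames. `delta` avoids log(0) on silent frames (assumed value)."""
    frames = np.asarray(frames, dtype=float)
    mag = np.abs(np.fft.rfft(frames * np.hanning(frames.shape[1]), axis=1))
    log_mag = np.log(mag + delta)
    return float(np.mean(np.diff(log_mag, axis=0) ** 2))


def lr_crossing_rate(lr, level=None):
    """Rate at which a frame-level likelihood-ratio curve crosses `level`.

    Hypothetical choice: by default the level is the segment mean.
    """
    lr = np.asarray(lr, dtype=float)
    if lr.size < 2:
        return 0.0
    level = lr.mean() if level is None else level
    above = lr > level
    return float(np.count_nonzero(above[1:] != above[:-1]) / (lr.size - 1))


def segment_features(frames, lr):
    """Feature vector (LSTER, SF, LRCR) for one detected audio segment."""
    return np.array([lster(frames), spectrum_flux(frames), lr_crossing_rate(lr)])
```

In a complete system, one such vector per detected audio segment would be passed to a classifier; the table of contents lists a K-nearest-neighbor classifier followed by a refinement stage for this step.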

    CONTENTS
    ABSTRACT
    ACKNOWLEDGEMENT
    LIST OF FIGURES
    LIST OF TABLES
    Chapter 1 Introduction
        1.1 Background
        1.2 Previous Works
        1.3 Motivations
        1.4 Characteristics of the Proposed System
        1.5 System Overview
        1.6 Thesis Organization
    Chapter 2 Audio Activity Detection
        2.1 History of Audio Activity Detection
        2.2 Statistical Model-Based Audio Activity Detection
            2.2.1 Overview
            2.2.2 Likelihood Estimation
            2.2.3 Noise Spectrum Adaptation
            2.2.4 Threshold Estimation
            2.2.5 Merging Scheme
    Chapter 3 Speech/Music Discrimination Features
        3.1 Introduction to Speech/Music Discrimination Features
        3.2 The Robust Speech/Music Discrimination Features
            3.2.1 Low Short Time Energy Rate Ratio
            3.2.2 Spectrum Flux
            3.2.3 Likelihood Ratio Crossing Rate
    Chapter 4 Speech/Music Classification
        4.1 The K-Nearest Neighbor Classifier
        4.2 Refinement
            4.2.1 The Refined Rules
            4.2.2 Refinement Using Convolution
    Chapter 5 Algorithm Performance Evaluation & Discussions
        5.1 Performance Evaluation of Audio Activity Detection
        5.2 Performance Evaluation under Various Types of Noisy Environment Using K-NN
        5.3 MATLAB Demonstration
    Chapter 6 VLSI Design for Speech/Music Discriminator
        6.1 System Overview
        6.2 The CORDIC Theory
            6.2.1 CORDIC Division Mode
            6.2.2 CORDIC Logarithm Mode
        6.3 Architectures of the Speech/Music Discriminator
        6.4 Experimental Results
            6.4.1 Design Flow and Strategy
            6.4.2 FPGA Implementation and Simulation Results
    Chapter 7 Conclusions and Future Works
    References

    Full text publicly available on campus from 2005-08-20 and off campus from 2005-08-20.