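Graduate student: Huang, Tze-Hsuan (黃子軒)
Thesis title: VLSI Architectures for Home Environmental Sound Recognition Based on MPEG-7 Features (以MPEG-7特徵為基礎的居家環境聲音辨識器之超大型積體電路架構設計)
Advisor: Wang, Jhing-Fa (王駿發)
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Electrical Engineering
Year of publication: 2003
Academic year of graduation: 91
Language: English
Number of pages: 57
Keywords (Chinese): 聲音辨識 (sound recognition), 隱藏馬可夫模型 (hidden Markov model), 頻心 (spectral centroid), 超大型積體電路 (VLSI)
Keywords (English): MPEG-7, HMM, centroid, spread, flatness, VLSI, sound recognition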

    In this thesis, an environmental sound recognition system based on MPEG-7 features (centroid, spread, and flatness [1]) and its corresponding VLSI architectures are proposed. Traditional sound recognizers use decision-tree-based methods, whose parameters do not generalize well [2~5]. An HMM-based sound recognizer was introduced in [8] to overcome this drawback. However, it adopts spectrum parameters, which results in high-dimensional feature vectors. This thesis addresses that shortcoming by applying basis extraction. The recognition rate is about 82% when only the spectrogram is used as the parameter, and it improves to about 95% when the three MPEG-7 audio features mentioned above are used as the parameters of our environmental sound recognizer.
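As a rough illustration of the three features named above, the following Python sketch computes a centroid, spread, and flatness for a single power-spectrum frame. It uses simplified textbook definitions on a linear frequency axis; the MPEG-7 standard [1] defines these descriptors with its own frequency scaling and band layout, which are not reproduced here. The function name `spectral_features` and its arguments are hypothetical and do not come from the thesis.

```python
import math


def spectral_features(power, freqs):
    """Return (centroid, spread, flatness) for one power-spectrum frame.

    Simplified definitions for illustration only. Assumes every power
    value is strictly positive so that the logarithm is defined.
    """
    n = len(power)
    total = sum(power)

    # Centroid: power-weighted mean frequency (needs one division).
    centroid = sum(f * p for f, p in zip(freqs, power)) / total

    # Spread: power-weighted RMS deviation from the centroid.
    variance = sum((f - centroid) ** 2 * p for f, p in zip(freqs, power)) / total
    spread = math.sqrt(variance)

    # Flatness: geometric mean divided by arithmetic mean.
    # The geometric mean is the nth root of the product of the n values,
    # computed here through logarithms to avoid overflow.
    geometric_mean = math.exp(sum(math.log(p) for p in power) / n)
    flatness = geometric_mean / (total / n)

    return centroid, spread, flatness
```

A perfectly flat spectrum gives a flatness of 1, while a spectrum dominated by a single bin gives a value close to 0. The division and the nth root appear in the centroid, spread, and flatness computations.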
    Moreover, VLSI architectures for this sound recognition system are also proposed. The first is the feature extraction module, whose most complicated computations are the division and nth-root operations. We use the CORDIC method to devise a divider. For the nth-root operation, a dedicated circuit is designed according to the Brahmagupta iteration algorithm. A dedicated hardware architecture is also presented for the Viterbi algorithm. It is based on the 4-step Viterbi algorithm, and its speed-up is attributed to a fully pipelined systolic array architecture.
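To make the CORDIC-based divider more concrete, here is a minimal software sketch of linear-mode (vectoring) CORDIC division. It is a generic textbook formulation, not the circuit designed in the thesis; the function name `cordic_divide`, the iteration count, and the input range restriction are assumptions made for this example.

```python
def cordic_divide(y, x, iterations=24):
    """Approximate y / x using linear-mode CORDIC (vectoring).

    Assumes x > 0 and |y / x| < 2. Each iteration drives the residual y
    toward zero by adding or subtracting x * 2**-i, and accumulates the
    matching +/- 2**-i in the quotient z.
    """
    z = 0.0
    step = 1.0  # 2**-i, starting at i = 0
    for _ in range(iterations):
        if y >= 0:
            y -= x * step
            z += step
        else:
            y += x * step
            z -= step
        step *= 0.5  # in hardware this is a one-bit right shift
    return z
```

Each iteration needs only a shift, an add or subtract, and a sign test, which is why the method suits hardware. The quotient error shrinks by roughly one bit per iteration.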

    CONTENTS
    ABSTRACT
    ACKNOWLEDGEMENT
    CONTENTS
    LIST OF TABLES
    LIST OF FIGURES
    Chapter 1. Introduction
        1.1 Motivation
        1.2 Origin and Range of MPEG-7
        1.3 Thesis Organization
    Chapter 2. HMM-Based Environmental Sound Recognition (ESR)
        2.1 Concept of Hidden Markov Model
        2.2 ESR by HMM
        2.3 Parameter Extraction
        2.4 Prior Entropy Training
        2.5 The Viterbi Algorithm
    Chapter 3. Feature Extraction
        3.1 Audio Spectrum Envelope
        3.2 Audio Spectrum Centroid
        3.3 Audio Spectrum Spread
        3.4 Audio Spectrum Flatness
    Chapter 4. VLSI Architectures of ESR
        4.1 System Block Diagram
        4.2 Parameters Extraction
            4.2.1 Division Architecture
            4.2.2 Nth-Rooting Architecture
        4.3 Design of Viterbi Algorithm
            4.3.1 The Algebra Formulation of Viterbi Algorithm
            4.3.2 4-Step Viterbi Algorithm Architecture
    Chapter 5. Experiments
        5.1 Training Data Preparation
        5.2 Testing Procedure
        5.3 Experimental Results
    Chapter 6. Conclusion
    References
    Author's Biography

    [1] ISO/IEC FDIS 15938-4:2001(E), Information Technology - Multimedia Content Description Interface - Part 4: Audio

    [2] Guojun Lu; Hankinson, T.
    “A technique towards automatic audio classification and retrieval”
    1998 Fourth International Conference on Signal Processing Proceedings (ICSP '98), vol. 2, 12-16 Oct. 1998

    [3] Zhang, T.; Jay Kuo, C.-C.
    “Audio content analysis for online audiovisual data segmentation and classification”
    IEEE Transactions on Speech and Audio Processing, vol. 9, no. 4, May 2001

    [4] Tong Zhang; Kuo, C.-C.J.
    “Classification and retrieval of sound effects in audiovisual data management”
    Conference Record of the Thirty-Third Asilomar Conference on Signals, Systems, and Computers, vol. 1, 24-27 Oct. 1999

    [5] Tong Zhang; Kuo, C.-C.J.
    “Hierarchical classification of audio data for archiving and retrieving”
    Proceedings of the 1999 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '99), vol. 6, 15-19 March 1999

    [6] Wold, E.; Blum, T.; Keislar, D.; Wheaten, J.
    “Content-based classification, search, and retrieval of audio”
    IEEE Multimedia, vol. 3, no. 3, Fall 1996

    [7] Tzanetakis, G.; Cook, P.
    “Musical genre classification of audio signals”
    IEEE Transactions on Speech and Audio Processing, vol. 10, no. 5, July 2002

    [8] Casey, M.
    “MPEG-7 sound-recognition tools”
    IEEE Transactions on Circuits and Systems for Video Technology, vol. 11, no. 6, June 2001

    [9] Goldhor, R.S.
    “Recognition of environmental sounds”
    1993 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP-93), vol. 1, 27-30 April 1993

    [10] Brand, M.
    “Structure and parameter learning via entropy minimization, with applications to mixture and hidden Markov models”
    Proceedings of the 1999 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '99), vol. 3, 15-19 March 1999

    [11] Brand, M.
    “Structure discovery in conditional probability models via an entropic prior and parameter extinction”
    Neural Comput., vol. 11, no. 5, pp. 1155-1183, 1999

    [12] Kak, S.C.; Barbir, A.O.
    “The Brahmagupta algorithm for square rooting”
    Proceedings of the Twenty-First Southeastern Symposium on System Theory, 26-28 March 1989

    [13] Chen-Jen Huang; Jer-Min Jou
    “Efficient Rapid Hardware Prototyping, Analyzing and Design of an HMM-Based Speech Recognition Engine”
    Department of Electrical Engineering, National Cheng Kung University, Tainan, Taiwan, R.O.C., June 2000

    [14] L.R. Rabiner
    “A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition”
    Proc. IEEE, vol. 77, no. 2, pp. 257-286, February 1989

    Full-text availability: on campus, immediately; off campus, from 2003-09-15