
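Graduate student: Sen, Eddyson (王聲庸)
Thesis title: Quantized Consonant Analysis (子音量化分析)
Advisor: Chung, Kao-Chi (鍾高基)
Degree: Master
Department: College of Engineering - Institute of Biomedical Engineering
Year of publication: 2008
Academic year of graduation: 96 (ROC calendar)
Language: English
Pages: 76
Keywords (Chinese): 量化分析子音 (quantitative analysis of consonants)
Keywords (English): consonant, quantized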
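  • Speech and text are the most common means of human communication.
    Most previous research has used digital signal processing and computer
    technology to build and apply speech and language models for speech
    coding, analysis, synthesis, recognition, assistance for people with
    communication disorders, and other purposes. Current digital signal
    processing techniques are well developed and can handle vowel signals,
    for example by finding formants through Linear Predictive Coding (LPC).
    Consonant analysis, however, is difficult, because consonant signals
    resemble noise in both the time domain and the frequency domain.
    The main purpose of this study is to quantify Mandarin consonant
    signals with different parameters. The study targets the palatal
    consonants (ㄐ, ㄑ, ㄒ), which are voiceless fricatives and affricates.
    The study is divided into four phases. The first phase extracts
    Mandarin phonemes from the TCC300 corpus: all TCC300 phoneme signals
    are recognized with HTK, and the phonemes are taken from the results.
    The second phase extracts the stable waveform region of each consonant
    using normalization, endpoint detection, detrending, and windowing.
    The third phase quantitatively analyzes the consonants with different
    parameters. The spectral density of each consonant is divided into
    three parts. Below 2 kHz, the F1 formant is computed as the first
    parameter. Between 2 and 6 kHz, the spectral peak is computed as the
    second parameter. Above 2 kHz, the spectral flatness is computed as
    the third parameter. The fourth phase converts each consonant waveform
    into 39 MFCCs with HTK. Discriminant analysis is then used to
    characterize the parameters by their classification scores. The
    results show that the classification scores of the parameters were
    17.8% (original grouped cases) and 11.7% (cross-validated grouped
    cases), lower than the MFCC scores.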

    Speech and text are the most frequently used means of human
    communication. Over the past three decades, many studies have developed
    and applied speech and language models with digital signal processing
    and computer technology for speech coding, analysis, synthesis,
    recognition, aids for the handicapped, and other uses. Present digital
    signal processing techniques are well developed for characterizing
    vowel signals, for example by finding formants through Linear
    Predictive Coding (LPC). The characterization of consonants, however,
    has remained a difficult problem because consonant waveforms may be
    noise-like in the time or frequency domain.
    The purpose of this research is to develop a signal processing
    technique for quantitative parametric representation of Mandarin
    consonant speech signals. This pilot study focuses on the palatal
    fricative and affricate consonants (ㄐ/j/, ㄑ/q/, and ㄒ/x/), which
    have long unvoiced speech signals. The study is divided into four
    phases. The first phase extracts Mandarin phonemes from the Mandarin
    spontaneous speech corpus TCC300: all phoneme signals in the corpus are
    recognized with HTK (Hidden Markov Model Toolkit), and the phonemes are
    extracted from the recognition results. The second phase extracts the
    stable waveform region of each consonant by applying normalization,
    end-point detection, detrending, and windowing. The third phase
    quantitatively analyzes the parameters of the three consonants. The
    spectral density of each consonant is divided into three portions:
    below 2 kHz, the F1 formant is calculated as the first parameter;
    between 2 kHz and 6 kHz, the spectral peak is calculated as the second
    parameter; and above 2 kHz, the spectral flatness is calculated as the
    third parameter. The fourth phase transforms each consonant waveform
    into 39 MFCCs (Mel-Frequency Cepstral Coefficients) through HTK.
    Discriminant analysis is then used to characterize the parameters by
    their classification scores. The results show that the classification
    scores of the parameters are 17.8% (original grouped cases) and 11.7%
    (cross-validated grouped cases) below the MFCC scores.
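
    The second and third phases can be illustrated with a minimal Python
    sketch. It is not the thesis implementation: the function names, the
    NumPy-based periodogram, the least-squares detrend, and the Hamming
    window are all assumptions made for illustration, and end-point
    detection and F1 estimation are omitted. The band edges (2 kHz to 6 kHz
    for the spectral peak, above 2 kHz for spectral flatness) follow the
    abstract. Spectral flatness is taken here as the ratio of the geometric
    mean to the arithmetic mean of the power spectrum, which is one common
    definition and may differ from the one used in the thesis.

```python
import numpy as np


def preprocess(x):
    """Normalize, detrend, and window one consonant segment (illustrative)."""
    x = np.asarray(x, dtype=float)
    x = x / (np.max(np.abs(x)) + 1e-12)       # amplitude normalization
    n = np.arange(len(x))
    slope, intercept = np.polyfit(n, x, 1)    # least-squares linear trend
    x = x - (slope * n + intercept)           # detrend
    return x * np.hamming(len(x))             # windowing (Hamming is assumed)


def power_spectrum(frame, fs):
    """Periodogram of an already windowed frame; absolute scale is arbitrary."""
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / fs)
    power = np.abs(np.fft.rfft(frame)) ** 2 / (fs * len(frame))
    return freqs, power


def spectral_peak(freqs, power, lo=2000.0, hi=6000.0):
    """Frequency (Hz) of the largest power value between lo and hi."""
    band = (freqs >= lo) & (freqs <= hi)
    return freqs[band][np.argmax(power[band])]


def spectral_flatness(freqs, power, lo=2000.0):
    """Geometric mean over arithmetic mean of the power above lo (0 to 1)."""
    p = power[freqs >= lo] + 1e-20            # small floor avoids log(0)
    return np.exp(np.mean(np.log(p))) / np.mean(p)


if __name__ == "__main__":
    fs = 16000                                # assumed sampling rate
    rng = np.random.default_rng(0)
    segment = rng.standard_normal(1024)       # noise stands in for a fricative
    freqs, power = power_spectrum(preprocess(segment), fs)
    print("spectral peak (Hz):", spectral_peak(freqs, power))
    print("spectral flatness:", spectral_flatness(freqs, power))
```

    A noise-like segment gives a flatness value well above that of a
    tonal segment, which is why flatness is a plausible descriptor for
    unvoiced fricatives and affricates.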

    Chinese Abstract
    Abstract
    List of Figures
    List of Tables
    Chapter 1 Introduction
      1.1. The Process of Speech Production
        1.1.1. Anatomy and Physiology of Speech Organs
          Lungs and Thorax
          Larynx and Vocal Folds
          Vocal Tract
          Articulators
        1.1.2. Articulatory Phonetics
          Phonemes
          Manners of Articulation
          Place of Articulation
          English Phonemes
          Mandarin Consonants
          Mandarin Vowels
      1.2. Acoustic Theory of Speech Production
        1.2.1. Vocal Tract Model for Speech Production Process
        1.2.2. Discrete Signal Processing for Speech Production
        1.2.3. Speech Signal Analysis and Acoustic Features
          Short Time Speech Analysis
          Spectrogram
          Short-Time Energy Function
          Zero-Crossing Rate
          Endpoint Detection
          Power Spectrum Density (PSD)
          Linear Prediction Coding Model (LPC)
          Complex Cepstrum
          Mel-frequency Cepstrum Coefficients (MFCC)
          Hidden Markov Model (HMM)
        1.2.4. Acoustic Features of Phonetics
          Vowels
          Diphthongs
          Glides and Liquids
          Nasals
          Fricatives
          Affricate
          Stops
      1.3. Motivation and Objectives
    Chapter 2 Materials and Methods
      2.1. Phoneme Extraction from Mandarin Speech Corpus
        Preparation of Data
        Recognition by HTK
      2.2. Consonants ㄐ/j/, ㄑ/q/, ㄒ/x/ Stable Region Extraction
      2.3. Parametric Representation for ㄐ/j/, ㄑ/q/, ㄒ/x/ Signals
        2.3.1. Formant Analysis
        2.3.2. Spectral Peak
        2.3.3. Spectral Flatness
      2.4. MFCC
      2.5. Statistical Analysis and Characterization
        2.5.1. Statistical Analysis
        2.5.2. Discriminant Analysis
    Chapter 3 Results and Discussion
      3.1. Consonants ㄐ/j/, ㄑ/q/, ㄒ/x/ Stable Region Extraction and Filtering Results
      3.2. F1 Formant, Spectral Peak and Spectral Flatness
        3.2.1. The F1 for Spectral Below 2.5 kHz
        3.2.2. Spectral Peak of PSD on Spectral Between 2 kHz and 6 kHz
        3.2.3. Spectral Flatness of PSD on Spectral Above 2 kHz
      3.3. Characterization of Parameters F1, Spectral Peak, Spectral Flatness and MFCC through Discriminant Analysis
        3.3.1. Discriminant Analysis of F1, Spectral Peak and Spectral Flatness
        3.3.2. Discriminant Analysis of MFCC
    Chapter 4 Conclusions and Future Study
    References


    Full-text release: on campus 2011-09-12; off campus 2011-09-12