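| Graduate student: | 王聲庸 Sen, Eddyson |
|---|---|
| Thesis title: | 子音量化分析 Quantized Consonant Analysis |
| Advisor: | 鍾高基 Chung, Kao-Chi |
| Degree: | 碩士 Master |
| Department: | 工學院 - 醫學工程研究所 College of Engineering - Institute of Biomedical Engineering |
| Year of publication: | 2008 |
| Academic year of graduation: | 96 |
| Language: | English |
| Number of pages: | 76 |
| Chinese keywords: | 量化分析 (quantitative analysis), 子音 (consonant) |
| English keywords: | consonant, quantized |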
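Speech and text are the modes of communication that humans use most
often. Most previous research has used digital signal processing and
computer technology to build and apply models of speech and language,
for example in speech coding, analysis, synthesis, recognition,
assistance for people with communication disorders, and other areas.
Current digital signal processing techniques are well developed and can
handle vowel signals, for example by finding formants through linear
predictive coding (LPC). Consonant analysis, however, is difficult,
because consonant signals resemble noise in both the time domain and
the frequency domain.
The main purpose of this study is to establish quantified parameters
for Mandarin consonant signals. The study targets the palatal
consonants (ㄐ, ㄑ, ㄒ), which are unvoiced fricatives and affricates.
It is divided into four phases. In the first phase, Mandarin phonemes
are extracted from the TCC300 corpus: all TCC300 phoneme signals are
recognized with HTK, and the results are collected. In the second
phase, the stable waveform region of each consonant is extracted using
normalization, endpoint detection, detrending, and windowing. In the
third phase, the consonants are analyzed quantitatively with respect to
different parameters. The spectral density of a consonant is divided
into three parts. Below 2 kHz, the F1 formant is computed as the first
parameter. For the spectrum from 2 to 6 kHz, the spectral peak is
computed as the second parameter. For the spectrum above 2 kHz, the
spectral flatness is computed as the third parameter. In the fourth
phase, each consonant waveform is converted into 39 MFCCs with HTK.
Discriminant analysis is then used to characterize the parameters by
their classification scores. The results show that the classification
scores of the parameters are 17.8% (original grouped cases) and 11.7%
(cross-validated grouped cases), which are lower than the MFCC scores.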
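The second-phase preprocessing named in the abstract (normalization, endpoint detection, detrending, windowing) can be sketched roughly as follows. This is a minimal pure-Python illustration under stated assumptions, not the thesis's implementation: the energy-threshold endpoint detector, the Hamming window, the 80-sample frame length, the 0.1 threshold, and all function names are hypothetical choices made only for the example.

```python
import math


def normalize(x):
    """Scale samples so the largest absolute value becomes 1.0."""
    peak = max(abs(v) for v in x)
    return [v / peak for v in x] if peak > 0 else list(x)


def find_endpoints(x, frame_len=80, threshold=0.1):
    """Return (start, end) sample indices spanning the frames whose
    short-time energy exceeds threshold * (maximum frame energy).
    Assumed energy-based detector; falls back to the whole signal."""
    energies = [sum(v * v for v in x[i:i + frame_len])
                for i in range(0, len(x) - frame_len + 1, frame_len)]
    if not energies or max(energies) == 0:
        return 0, len(x)
    limit = threshold * max(energies)
    active = [i for i, e in enumerate(energies) if e > limit]
    return active[0] * frame_len, (active[-1] + 1) * frame_len


def detrend(x):
    """Subtract the least-squares straight line from the samples."""
    n = len(x)
    if n == 0:
        return []
    t_mean = (n - 1) / 2.0
    x_mean = sum(x) / n
    denom = sum((t - t_mean) ** 2 for t in range(n))
    slope = (sum((t - t_mean) * (v - x_mean) for t, v in enumerate(x)) / denom
             if denom else 0.0)
    return [v - (x_mean + slope * (t - t_mean)) for t, v in enumerate(x)]


def hamming(n):
    """Hamming window of length n (assumed window type)."""
    if n == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2 * math.pi * k / (n - 1)) for k in range(n)]


def preprocess(x, frame_len=80, threshold=0.1):
    """Normalize, trim to the detected endpoints, detrend, then window."""
    x = normalize(x)
    start, end = find_endpoints(x, frame_len, threshold)
    segment = detrend(x[start:end])
    return [s * w for s, w in zip(segment, hamming(len(segment)))]
```

Calling `preprocess(samples)` on a list of floating-point samples returns the windowed segment between the detected endpoints. The steps are applied in the order the abstract lists them; a different order or detector may suit real recordings better.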
Speech and text are the most frequently used modes of human
communication. Over the past three decades, many studies have developed
and applied speech and language models using digital signal processing
and computer technology for speech coding, analysis, synthesis,
recognition, aids for the handicapped, and other applications. Present
digital signal processing techniques are well developed for
characterizing vowel signals, for example finding formants through
Linear Predictive Coding (LPC). The characterization of consonants,
however, has remained a difficult problem because consonant waveforms
can be noise-like in either the time or the frequency domain.
The purpose of this research is to develop a signal processing
technique for the quantitative parametric representation of Mandarin
consonant speech signals. This pilot study focuses on the palatal
fricative and affricate consonants (ㄐ /j/, ㄑ /q/, and ㄒ /x/), which
have long unvoiced speech signals. The study is divided into four
phases. The first phase extracts Mandarin phonemes from the Mandarin
spontaneous speech corpus TCC300. All phoneme signals of TCC300 are
recognized through HTK (Hidden Markov Model Toolkit), and the phonemes
are extracted from the recognition results. The second phase extracts
the stable waveform region of each consonant: normalization, endpoint
detection, detrending, and windowing are applied to obtain the stable
region. The third phase quantitatively analyzes parameters for the
three consonants. The spectral density of each consonant is divided
into three portions. Below 2 kHz, the F1 formant is calculated as the
first parameter. Between 2 kHz and 6 kHz, the spectral peak is
calculated as the second parameter. Above 2 kHz, the spectral flatness
is calculated as the third parameter. The fourth phase transforms each
consonant waveform into 39 MFCCs (Mel-Frequency Cepstral Coefficients)
through HTK. Discriminant analysis is then used to characterize the
parameters by means of classification scores. The results show that the
classification score of the parameters is 17.8% (original grouped
cases) and 11.7% (cross-validated grouped cases) below the MFCC scores.
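Two of the third-phase parameters, the spectral peak between 2 kHz and 6 kHz and the spectral flatness above 2 kHz, can be illustrated with a short sketch. It assumes the common definition of spectral flatness (geometric mean of the power spectrum divided by its arithmetic mean) and takes the peak as the frequency of the strongest bin; the thesis may define either quantity differently. The function names, the direct DFT, and the 16 kHz sampling rate in the usage note are assumptions made for the example.

```python
import cmath
import math


def power_spectrum(frame, fs):
    """Return (freqs_hz, power) for bins 0..N/2 using a direct DFT.
    O(N^2), so only suitable for short frames."""
    n = len(frame)
    freqs, power = [], []
    for k in range(n // 2 + 1):
        acc = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n)
                  for t in range(n))
        freqs.append(k * fs / n)
        power.append(abs(acc) ** 2)
    return freqs, power


def spectral_peak(freqs, power, lo=2000.0, hi=6000.0):
    """Frequency (Hz) of the strongest bin within [lo, hi]."""
    band = [(p, f) for f, p in zip(freqs, power) if lo <= f <= hi]
    return max(band)[1]


def spectral_flatness(freqs, power, lo=2000.0, eps=1e-12):
    """Geometric mean / arithmetic mean of the power at or above lo.
    Near 1 for a noise-like (flat) spectrum, near 0 for a tonal one."""
    band = [p + eps for f, p in zip(freqs, power) if f >= lo]
    log_mean = sum(math.log(p) for p in band) / len(band)
    return math.exp(log_mean) / (sum(band) / len(band))
```

For a 256-sample frame at an assumed sampling rate of 16 kHz, `freqs, power = power_spectrum(frame, 16000)` followed by `spectral_peak(freqs, power)` and `spectral_flatness(freqs, power)` yields the two values. A noise-like fricative frame would be expected to give a flatness closer to 1 than a voiced, harmonic frame.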