| Graduate student: | 康育楷 Kang, Yu-Kai |
|---|---|
| Thesis title: | 自發性語音辨識中音節合併現象之偵測與修正 Detection and Correction of Syllable Contraction in Spontaneous Speech Recognition |
| Advisor: | 吳宗憲 Wu, Chung-Hsien |
| Degree: | Master |
| Department: | College of Electrical Engineering and Computer Science - Department of Computer Science and Information Engineering |
| Year of publication: | 2008 |
| Academic year of graduation: | 96 (ROC calendar) |
| Language: | Chinese |
| Number of pages: | 80 |
| Keywords: | spontaneous speech recognition, duration information, syllable contraction |
In recent years, automatic speech recognition (ASR) technology for read speech has reached a high level of maturity. In spontaneous conversation, however, words are not articulated carefully, and ASR performance degrades substantially. One of the many contributing factors is a faster speaking rate, which shortens syllable durations relative to read speech and can even cause adjacent syllables to merge, a phenomenon known as syllable contraction (SC). The resulting recognition errors can make the ASR output unreadable.

The goal of this thesis is to detect and correct recognition errors caused by SC. For detection, we relax the recognized boundaries, readjusting syllable boundaries, and select suitable SC candidates according to their frequency of occurrence and rank; these candidates are then used to detect syllable contraction. SC words are added to the word graph built from the recognition results at the positions where SC is likely to have occurred. Within a graphical-model framework, all probable paths are then rescored by recomputing word posterior probabilities using an acoustic model, a language model, and a syllable contraction duration model (SCDM). The SCDM comprises SC duration information (SCDI), which captures the distribution of frame-to-state matching lengths in the acoustic model under contraction, and the SC conditional probability, which models the variants that a contraction may produce. After rescoring, the corrected result is obtained by finding the best path in the word graph.

The proposed approach was evaluated on the Mandarin Conversational Dialogue Corpus (MCDC), which was collected and annotated by Academia Sinica. The recall rate on SC word correction improved by about 22% using the SCDM alone and by about 41% when the SCDM was combined with a syllable pair acoustic model (SPAM). Syllable and word recognition rates improved by 1.7% and 2.3%, respectively, using the SCDM alone, and by 2.9% and 3.6%, respectively, with the SCDM and SPAM combined. These results indicate that the SCDM helps correct recognition errors caused by syllable contraction in spontaneous speech and improves the final recognition rate.
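The rescoring idea described in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation: it assumes a single Gaussian over edge duration in frames as a stand-in for the SCDI, adds a fixed log conditional probability to contracted edges, and picks the best path by dynamic programming instead of recomputing word posteriors. All names (`best_path`, `EDGES`, `DUR_STATS`, the `jiang_SC` token) and all scores are hypothetical values chosen for illustration.

```python
import math


def log_gaussian(x, mean, std):
    """Log density of a 1-D Gaussian (assumed stand-in for a duration model)."""
    var = std * std
    return -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)


def edge_score(edge, dur_stats, lm_weight, dur_weight):
    """Combine acoustic, language, and duration log scores for one edge."""
    mean, std = dur_stats[edge["word"]]
    duration = edge["end"] - edge["start"]  # edge length in frames
    score = edge["am"] + lm_weight * edge["lm"]
    score += dur_weight * log_gaussian(duration, mean, std)
    # Contracted edges carry an extra log P(contraction | full form).
    score += edge.get("sc_logprob", 0.0)
    return score


def best_path(edges, start, goal, dur_stats, lm_weight=1.0, dur_weight=1.0):
    """Return (score, words) of the highest-scoring path from start to goal.

    Nodes are frame indices and every edge runs forward in time, so
    visiting edges sorted by start frame is a valid topological order.
    """
    best = {start: (0.0, [])}
    for edge in sorted(edges, key=lambda e: e["start"]):
        if edge["start"] not in best:
            continue  # edge is not reachable from the start node
        prev_score, prev_words = best[edge["start"]]
        score = prev_score + edge_score(edge, dur_stats, lm_weight, dur_weight)
        if edge["end"] not in best or score > best[edge["end"]][0]:
            best[edge["end"]] = (score, prev_words + [edge["word"]])
    if goal not in best:
        raise ValueError("goal node is not reachable")
    return best[goal]


# Hypothetical word graph: "zhe yang" competes with a contracted "jiang_SC".
EDGES = [
    {"word": "zhe", "start": 0, "end": 8, "am": -20.0, "lm": -1.0},
    {"word": "yang", "start": 8, "end": 16, "am": -20.0, "lm": -1.0},
    {"word": "jiang_SC", "start": 0, "end": 16, "am": -42.0, "lm": -1.5,
     "sc_logprob": math.log(0.3)},
    {"word": "zi", "start": 16, "end": 30, "am": -30.0, "lm": -1.0},
]

# Hypothetical (mean, std) of duration in frames for each word.
DUR_STATS = {
    "zhe": (20.0, 5.0),
    "yang": (20.0, 5.0),
    "jiang_SC": (16.0, 4.0),
    "zi": (14.0, 4.0),
}

print(best_path(EDGES, 0, 30, DUR_STATS, dur_weight=0.0)[1])
print(best_path(EDGES, 0, 30, DUR_STATS, dur_weight=1.0)[1])
```

In this toy graph the uncontracted path wins when the duration term is switched off (`dur_weight=0.0`), because its acoustic and language scores are slightly better. With the duration term enabled, the two 8-frame syllables are penalized for being much shorter than their assumed mean, and the contracted edge is selected instead.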