
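Graduate student: 吳維彥 (Wu, Wei-Yen)
Thesis title: 應用不定長度特徵之條件隨機域於口語不流暢語流修正模型
Disfluency Correction of Spontaneous Speech using Conditional Random Fields with Variable Length Features
Advisor: 吳宗憲 (Wu, Chung-Hsien)
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Computer Science and Information Engineering
Year of publication: 2006
Academic year of graduation: 94 (ROC calendar)
Language: Chinese
Number of pages: 69
Chinese keywords: 口語化語音 (spontaneous speech), 條件隨機域 (conditional random fields)
English keywords: disfluency, spontaneous, conditional random fields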
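  • In recent years, speech recognition technology has matured. For practical applications, however, the disfluency that occurs in spontaneous speech must be taken into account. Most existing work on disfluency correction is limited to a single disfluency type or cannot integrate multiple knowledge sources. How to integrate multiple knowledge sources while correcting the three main disfluency types at once is therefore a topic worth studying.
    This thesis proposes a conditional random field statistical model with variable-length features, which corrects disfluencies using state transition feature functions, observation feature functions, and their corresponding parameters. The observation feature functions can integrate multiple knowledge sources, including context-related features, disfluency-related features, and pattern-matching features. For the states, we use variable-length units of three kinds: words, chunks, and sentences. Chunks are found with the Apriori algorithm according to word frequency and co-occurrence frequency, and a sentence is taken to consist of a verb together with its necessary arguments. Finally, the parameters are estimated with the IIS algorithm.
    To evaluate the proposed method, we use the Mandarin Conversational Dialogue Corpus provided by Academia Sinica as training and test data. The editing word error rate is 17.3%, which is 11.7%, 8.7%, 8%, and 3.9% lower than that of the DF-gram, HMM, maximum entropy, and hybrid N-gram plus alignment model approaches, respectively. When the interruption points are given, the editing word error rate is 6.1%. The experiments show that the proposed model outperforms the other methods and can effectively detect and correct disfluencies in spoken language.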
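
    For orientation, the standard linear-chain conditional random field combines state transition feature functions and observation feature functions as shown below. This is the textbook form, not the thesis's variable-length variant, whose exact definition is not given in this record. The symbols f_k, g_l, λ_k, and μ_l are notation assumed here for illustration.

```latex
p(\mathbf{y}\mid\mathbf{x}) = \frac{1}{Z(\mathbf{x})}
  \exp\Bigl( \sum_{t}\sum_{k} \lambda_k\, f_k(y_{t-1}, y_t, \mathbf{x}, t)
           + \sum_{t}\sum_{l} \mu_l\, g_l(y_t, \mathbf{x}, t) \Bigr)

Z(\mathbf{x}) = \sum_{\mathbf{y}'}
  \exp\Bigl( \sum_{t}\sum_{k} \lambda_k\, f_k(y'_{t-1}, y'_t, \mathbf{x}, t)
           + \sum_{t}\sum_{l} \mu_l\, g_l(y'_t, \mathbf{x}, t) \Bigr)
```

    Here f_k are transition feature functions over adjacent states, g_l are observation feature functions over a state and the input, and λ_k and μ_l are the weights that a procedure such as IIS would estimate.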

    Speech recognition technology has recently approached maturity. For practical applications, however, edit disfluency in spontaneous speech remains an important issue. Most research on edit disfluency either focuses on a specific disfluency type or does not integrate multiple knowledge sources jointly. The central question is therefore how to detect and correct the three categories of edit disfluency using multiple knowledge sources.
    In this thesis we propose a conditional random field model with variable-length features to detect and correct edit disfluency. The model is composed of state transition feature functions and observation feature functions. The observation feature functions comprise context-related, disfluency-related, and pattern-related features. Three variable-length units (word, chunk, and sentence) are employed as the states of the state transition feature functions. Chunks are extracted by the Apriori algorithm according to word co-occurrence and term frequency. Sentences are identified according to a verb and its corresponding necessary arguments. Finally, the improved iterative scaling (IIS) algorithm is adopted to estimate the weights.
    For evaluation, the Mandarin Conversational Dialogue Corpus (MCDC) is used as the spontaneous speech corpus. The detection error rate for edit words is 17.3%. Compared with DF-gram, maximum entropy, and the approach combining a language model with an alignment model, the proposed approach achieves improvements of 11.7%, 8%, and 3.9%, respectively. The experimental results show that the proposed model outperforms the other methods and effectively detects and corrects edit disfluency in spontaneous speech.
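
    The abstract does not spell out how chunks are extracted, so the following is only a minimal sketch of an Apriori-style, level-wise search for frequent contiguous word sequences. The function name `frequent_chunks`, the `min_support` and `max_len` parameters, the use of raw counts as the only criterion, and the toy English sentences are all assumptions made for illustration and are not taken from the thesis.

```python
from collections import Counter


def frequent_chunks(sentences, min_support=2, max_len=3):
    """Apriori-style search for frequent contiguous word sequences.

    Hypothetical helper: a sequence of length k is counted only if both
    of its length k-1 sub-sequences were already found to be frequent.
    """
    # Level 1: count single words and keep the frequent ones.
    counts = Counter((w,) for s in sentences for w in s)
    frequent = {c: n for c, n in counts.items() if n >= min_support}
    result = dict(frequent)
    k = 1
    while frequent and k < max_len:
        k += 1
        counts = Counter()
        for s in sentences:
            for i in range(len(s) - k + 1):
                cand = tuple(s[i:i + k])
                # Apriori pruning on the two contiguous sub-sequences.
                if cand[:-1] in frequent and cand[1:] in frequent:
                    counts[cand] += 1
        frequent = {c: n for c, n in counts.items() if n >= min_support}
        result.update(frequent)
    return result


# Toy, already-segmented sentences (illustrative data only).
toy = [
    ["i", "want", "to", "go", "north"],
    ["i", "want", "to", "buy", "tickets"],
    ["he", "wants", "to", "go", "south"],
]
chunks = frequent_chunks(toy, min_support=2, max_len=3)
for chunk, count in sorted(chunks.items(), key=lambda kv: (len(kv[0]), kv[0])):
    print(" ".join(chunk), count)
```

    The pruning step is what makes the search Apriori-like: longer candidates are considered only when their shorter parts are already frequent, which keeps the candidate set small on a large corpus.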

    Acknowledgements
    Table of Contents
    List of Tables
    List of Figures
    Chapter 1 Introduction
    1.1 Background
    1.2 Motivation and Objectives
    1.3 Overview of the Approach
    1.4 Chapter Overview
    Chapter 2 System Architecture
    Chapter 3 Conditional Random Fields with Variable Length Features
    3.1 Random Fields
    3.2 Conditional Random Fields
    3.3 Conditional Random Fields with Variable Length Features
    Chapter 4 Sub-feature Derivation Algorithm and Levels
    4.1 Sub-feature Derivation Algorithm
    4.2 Construction of Levels
    4.2.1 Features at Each Level
    Chapter 5 Parameter Estimation
    5.1 Maximum Likelihood Parameter Estimation
    5.2 Maximum Likelihood Estimation for Conditional Random Fields
    5.2.1 IIS Algorithm
    Chapter 6 Experiments and Discussion
    6.1 Experimental Setup
    6.2 Mandarin Conversational Dialogue Corpus (MCDC)
    6.3 Baseline Systems
    6.3.1 Maximum Entropy
    6.3.2 Combining a Language Model and an Alignment Model
    6.3.3 Disfluency Language Model (DF language model)
    6.3.4 Hidden Markov Model
    6.4 System Evaluation and Analysis
    Chapter 7 Conclusions and Future Directions
    7.1 Conclusions
    7.2 Future Research Directions
    References


    Full text availability: on campus, immediately; off campus, from 2006-07-17.