| Graduate student: | 吳維彥 Wu, Wei-Yen |
|---|---|
| Thesis title: | 應用不定長度特徵之條件隨機域於口語不流暢語流修正模型 Disfluency Correction of Spontaneous Speech using Conditional Random Fields with Variable Length Features |
| Advisor: | 吳宗憲 Wu, Chung-Hsien |
| Degree: | Master |
| Department: | College of Electrical Engineering and Computer Science, Department of Computer Science and Information Engineering |
| Publication year: | 2006 |
| Academic year of graduation: | 94 (ROC calendar, i.e. 2005-06) |
| Language: | Chinese |
| Pages: | 69 |
| Chinese keywords: | spontaneous speech (口語化語音), conditional random fields (條件隨機域) |
| English keywords: | disfluency, spontaneous, conditional random fields |
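In recent years, speech recognition technology has matured, but practical applications must account for disfluency in spontaneous speech. Most existing research on correcting disfluencies is either limited to a single disfluency type or cannot integrate multiple knowledge sources. How to integrate multiple knowledge sources while correcting all three main types of disfluency at once is therefore a topic worth studying.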
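This thesis proposes a conditional random field statistical model with variable-length features that corrects disfluencies using state transition feature functions, observation feature functions, and their corresponding parameters. The observation feature functions integrate multiple knowledge sources: context-related features, disfluency-related features, and pattern-matching features. The states are variable-length units of three kinds: words, chunks, and sentences. Chunks are found with the Apriori algorithm from word frequency and co-occurrence frequency, and a sentence is defined as a verb together with its required arguments. Finally, the parameters are estimated with the improved iterative scaling (IIS) algorithm.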
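The chunk units above are mined with the Apriori algorithm. As a minimal sketch, assuming that a chunk is a contiguous word sequence and that "frequent" simply means occurring at least `min_support` times (the thesis's actual scoring of term frequency and co-occurrence is not reproduced here), the level-wise Apriori idea looks like this. The function names, the thresholds, and the toy segmented corpus are all hypothetical.

```python
from collections import Counter
from typing import Dict, List, Optional, Set, Tuple

Ngram = Tuple[str, ...]


def count_ngrams(sentences: List[List[str]], n: int,
                 candidates: Optional[Set[Ngram]] = None) -> Counter:
    """Count contiguous n-grams, optionally only those listed in `candidates`."""
    counts: Counter = Counter()
    for words in sentences:
        for i in range(len(words) - n + 1):
            gram = tuple(words[i:i + n])
            if candidates is None or gram in candidates:
                counts[gram] += 1
    return counts


def apriori_chunks(sentences: List[List[str]], min_support: int = 2,
                   max_len: int = 4) -> Dict[Ngram, int]:
    """Level-wise (Apriori-style) mining of frequent contiguous word sequences.

    Hypothetical helper: returns multi-word chunks (length 2..max_len) with counts.
    """
    # Level 1: frequent single words.
    frequent = {g: c for g, c in count_ngrams(sentences, 1).items() if c >= min_support}
    chunks: Dict[Ngram, int] = {}
    k = 1
    while frequent and k < max_len:
        # Join step: extend frequent k-gram `a` with the last word of frequent
        # k-gram `b` when they overlap in k-1 words. Every candidate's prefix (a)
        # and suffix (b) are then frequent, which is the Apriori pruning property.
        candidates = {a + b[-1:] for a in frequent for b in frequent if a[1:] == b[:-1]}
        k += 1
        # Count only the surviving candidates and keep those above the threshold.
        counts = count_ngrams(sentences, k, candidates)
        frequent = {g: c for g, c in counts.items() if c >= min_support}
        chunks.update(frequent)
    return chunks


# Toy, already word-segmented corpus (invented for illustration).
corpus = [
    ["我", "想", "要", "訂", "機票"],
    ["我", "想", "要", "查", "天氣"],
    ["他", "想", "要", "訂", "機票"],
]
chunks = apriori_chunks(corpus, min_support=2, max_len=4)
```

On this toy corpus the sketch keeps sequences such as 想要 (3 occurrences) and 想要訂機票 (2), but discards 我想要訂, which occurs only once. Counting only candidates whose sub-sequences are already frequent is what keeps the search tractable as chunks grow longer.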
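To evaluate the proposed method, the Mandarin Conversational Dialogue Corpus provided by Academia Sinica is used as both training and test data. The editing word error rate is 17.3%, which is 11.7%, 8.7%, 8%, and 3.9% lower than DF-gram, HMM, maximum entropy, and a hybrid model combining an N-gram language model with alignment, respectively. When the interruption points are given, the editing word error rate is 6.1%. The experiments show that the proposed model outperforms the other methods and can effectively detect and correct disfluencies in spoken language.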
Recently, speech recognition technologies have approached maturity. However, edit disfluency in spontaneous speech must be considered an important issue for practical applications. Most research on edit disfluency either focuses on a specific disfluency type or does not integrate multiple knowledge sources jointly. The central issue, therefore, is how to detect and correct all three categories of edit disfluency using multiple knowledge sources.
In this thesis, we propose a conditional random field (CRF) model with variable-length features to detect and correct edit disfluency. The model is composed of state transition feature functions and observation feature functions. The observation feature functions consist of context-related, disfluency-related, and pattern-related features. Three variable-length units (word, chunk, and sentence) are employed as the states of the state transition feature functions. Chunks are extracted by the Apriori algorithm according to word co-occurrence and term frequency. A sentence is identified by a verb together with its necessary arguments. Finally, the improved iterative scaling (IIS) algorithm is adopted to estimate the weights.
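A CRF of this kind scores a label sequence as a weighted sum of state transition feature functions and observation feature functions, and correction amounts to finding the best-scoring labelling. The following is a minimal sketch that makes several simplifying assumptions. It is a plain word-level linear-chain CRF, not the thesis's variable-length word/chunk/sentence states. It uses two hypothetical labels, `KEEP` and `EDIT`, and hand-set weights instead of IIS-estimated ones. The feature functions `f_repeated_next` and `f_keep_bias`, the weight tables, and the example sentence are all illustrative. Viterbi decoding selects the labelling, and the `EDIT` words are then deleted.

```python
from typing import Callable, Dict, List, Sequence, Tuple

LABELS = ("KEEP", "EDIT")
ObsFeature = Callable[[str, Sequence[str], int], float]


def f_repeated_next(y: str, x: Sequence[str], t: int) -> float:
    """Hypothetical disfluency feature: word labelled EDIT is immediately repeated."""
    return 1.0 if y == "EDIT" and t + 1 < len(x) and x[t] == x[t + 1] else 0.0


def f_keep_bias(y: str, x: Sequence[str], t: int) -> float:
    """Hypothetical bias feature: fires for every KEEP label."""
    return 1.0 if y == "KEEP" else 0.0


# Hand-set weights for illustration only; in the thesis they are estimated with IIS.
OBS_WEIGHTS: List[Tuple[ObsFeature, float]] = [(f_repeated_next, 2.0), (f_keep_bias, 0.5)]
TRANS_WEIGHTS: Dict[Tuple[str, str], float] = {
    ("KEEP", "KEEP"): 0.2, ("KEEP", "EDIT"): -0.3,
    ("EDIT", "KEEP"): 0.0, ("EDIT", "EDIT"): -0.5,
}


def obs_score(y: str, x: Sequence[str], t: int) -> float:
    """Weighted sum of observation feature functions at position t."""
    return sum(w * f(y, x, t) for f, w in OBS_WEIGHTS)


def viterbi(x: Sequence[str]) -> List[str]:
    """Return the labelling maximizing the unnormalized CRF score."""
    if not x:
        return []
    delta = [{y: obs_score(y, x, 0) for y in LABELS}]  # best score ending in y at t=0
    back: List[Dict[str, str]] = []                    # back-pointers for t >= 1
    for t in range(1, len(x)):
        row, ptr = {}, {}
        for y in LABELS:
            prev = max(LABELS, key=lambda p: delta[-1][p] + TRANS_WEIGHTS[(p, y)])
            row[y] = delta[-1][prev] + TRANS_WEIGHTS[(prev, y)] + obs_score(y, x, t)
            ptr[y] = prev
        delta.append(row)
        back.append(ptr)
    # Trace the best path backwards from the highest-scoring final label.
    path = [max(LABELS, key=lambda y: delta[-1][y])]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return path[::-1]


def correct(x: Sequence[str]) -> List[str]:
    """Drop the words labelled EDIT to obtain the cleaned-up utterance."""
    return [w for w, y in zip(x, viterbi(x)) if y == "KEEP"]


utterance = ["我", "我", "想", "要", "訂", "訂", "機票"]
labels = viterbi(utterance)
cleaned = correct(utterance)
```

For the repeated words in the example, the sketch labels the first 我 and the first 訂 as `EDIT` and returns 我 想 要 訂 機票. The normalizer Z(x) is identical for every labelling of a given x, so decoding can maximize the unnormalized score directly. Z(x) is only needed for training (for example with IIS) or for computing probabilities.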
To evaluate the proposed method, the Mandarin Conversational Dialogue Corpus (MCDC) is used as the spontaneous speech corpus. The detection error rate for edit words is 17.3%. Compared with DF-gram, maximum entropy, and the approach combining a language model with an alignment model, the proposed approach achieves improvements of 11.7%, 8%, and 3.9%, respectively. The experimental results show that the proposed model outperforms the other methods and effectively detects and corrects edit disfluency in spontaneous speech.