Graduate student: | 申孟修 Sheng, Meng-Hsiu |
---|---|
Thesis title: | 強健性對話行為偵測於潛在錯誤感知與管理之研究 Robust Dialog Act Detection for Potential Error Awareness and Handling |
Advisor: | 吳宗憲 Wu, Chun-Hsien |
Degree: | Master |
Department: | College of Electrical Engineering and Computer Science - Department of Computer Science and Information Engineering |
Year of publication: | 2011 |
Graduation academic year: | 99 (ROC calendar, i.e., 2010–2011) |
Language: | English |
Pages: | 77 |
Chinese keywords: | dialog act, dialog management, spoken dialog system |
English keywords: | Dialog Act, Dialog Management, Spoken Dialog System |
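With the substantial increase in computing power and advances in speech recognition technology, spoken dialog systems that use natural speech as the medium of communication have come into wide use. However, incorrect system responses caused by imperfect speech recognition results are common. This thesis therefore addresses how a system can correctly analyze the user's dialog acts from incomplete recognition results, use those dialog acts to become aware of potential communication errors, and then carry out repair strategies to accomplish the user's goal.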
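In this thesis, we propose a corresponding solution for each of three dialog act modes: the normal mode, the misunderstanding mode, and the non-understanding mode. The proposed methods are built from both the transcriptions and the recognition results in the corpus, so that the discrepancies introduced by speech recognition are taken into account and the interference of noise is reduced. For the normal and misunderstanding modes, we use word N-grams and syntactic rules obtained from a parser as feature vectors to construct a matrix for dialog act detection. To improve the robustness of the matrix, we cluster the sentences of each dialog act class in the training corpus, which avoids confusion between dialog acts and yields the best-suited groups of syntactic rules. In addition, to compensate for words that are lost because of speech recognition errors or because they are rejected for low recognition scores, we apply a partial expansion tree (PET) to each cluster, partially expanding its sentences to generate multiple candidate sentences. Finally, the matrix is constructed to model the relation between these two types of dialog acts and the features. Non-understanding occurs when a speech recognition result is rejected because too many of its words have insufficient confidence scores, or when the confidence score of the detected dialog act is too low. In this case, a dialog act prediction matrix, built on the same clustered-sentence structure as the dialog act matrix, is proposed to predict the user's act; this matrix models the relation between phone units and dialog acts.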
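The decision between the word-based path (normal and misunderstanding modes) and the non-understanding path can be pictured as a confidence gate. The following is a minimal sketch under stated assumptions: the three thresholds are hypothetical values chosen for illustration (the abstract does not report them), and `AsrWord` and `route_utterance` are hypothetical names, not the thesis's implementation. It only shows the two rejection conditions described above leading to the phone-based prediction matrix.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical thresholds for illustration only; the thesis does not report them.
WORD_CONF_THRESHOLD = 0.5  # a word below this ASR confidence counts as rejected
MAX_REJECTED_RATIO = 0.5   # more rejected words than this -> utterance rejected
DA_CONF_THRESHOLD = 0.6    # a detected DA below this confidence is rejected


@dataclass
class AsrWord:
    text: str
    confidence: float  # per-word ASR confidence in [0, 1]


def route_utterance(words: list[AsrWord], da_confidence: float) -> str:
    """Return "dam" when the word-level result can be interpreted by the
    dialog act matrix, or "dapm" when non-understanding is detected and the
    phone-based dialog act prediction matrix should predict the user's act."""
    if not words:
        return "dapm"  # nothing survived recognition
    rejected = sum(1 for w in words if w.confidence < WORD_CONF_THRESHOLD)
    if rejected / len(words) > MAX_REJECTED_RATIO:
        return "dapm"  # ASR result rejected: too many low-confidence words
    if da_confidence < DA_CONF_THRESHOLD:
        return "dapm"  # detected DA rejected: its confidence is too low
    return "dam"


words = [AsrWord("nearby", 0.9), AsrWord("restaurant", 0.8), AsrWord("recommend", 0.3)]
print(route_utterance(words, da_confidence=0.75))  # dam
```

Here only one of three words is rejected and the DA confidence clears the assumed threshold, so the word-based matrix is used; lowering either confidence would switch the utterance to the non-understanding path.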
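For dialog management, we use the three types of dialog acts described above to combine normal response strategies and error repair strategies in a partially observable Markov decision process (POMDP), and a numerical machine learning approach enables the system to generate appropriate responses.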
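To evaluate system performance, we built a task-oriented spoken dialog system whose domain is information about food and restaurants near National Cheng Kung University. Experiments show that, with a speech recognition accuracy of 87.6%, the average detection accuracy is 82.7% for the dialog act matrix and 66% for the dialog act prediction matrix. At these detection accuracies, the average task success rate of the system in real dialogs is 92.4%, an improvement of 29.5% and 11.7% over a keyword slot-filling strategy and a POMDP system without repair strategies, respectively. These results show that the proposed dialog system, which combines error awareness and handling, achieves better performance.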
With the exponential growth in computing power and progress in speech recognition technology, spoken dialog systems (SDS), with which a user interacts through natural speech, have been widely used in human-machine interaction. However, error-prone automatic speech recognition (ASR) results usually lead to inappropriate semantic interpretation and thus easily cause miscommunication. This thesis therefore aims at (1) reliable semantic analysis for dialog act (DA) identification, (2) awareness of potential errors based on the detected DA, and (3) a general repair strategy for error handling.
In this thesis, two detection matrices are proposed to handle three types of dialog behavior: normal DAs, misunderstanding-correction DAs, and non-understanding-prediction DAs. All proposed techniques are built from both transcriptions and ASR results in order to mitigate the effect of degraded ASR performance. For the first and second types, this thesis proposes a novel understanding approach, the dialog act matrix (DAM), which uses word N-grams and syntactic rules as feature vectors for DA detection and potential error awareness. To enhance its robustness, a sentence clustering algorithm is applied within each DA to obtain the best sentence groups and avoid confusion between DAs. Furthermore, a partial expansion tree (PET) is employed to generate candidate sentences, compensating for word deletions caused by ASR errors or by rejection during verification. Finally, the relation between features and DAs is modeled by the DAM. Non-understanding, on the other hand, occurs when words or normal DAs are rejected because of low confidence scores. Following the framework of the DAM, a dialog act prediction matrix (DAPM) is proposed for non-understanding awareness. Whereas the DAM models the relation between DAs and word N-gram and syntactic-rule features, the DAPM models the relation between DAs and phone units.
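As a rough illustration of how a feature-by-DA matrix can be used for detection, the sketch below builds one row per (DA, sentence cluster) pair from word uni-/bigram and syntactic-rule counts and scores a new utterance by cosine similarity against every row. The abstract does not state how the DAM weights features or scores utterances, so count vectors, cosine similarity, and the names `DialogActMatrix`, `add`, and `detect` are assumptions; cluster IDs are assumed to come from a prior clustering step, PET-based candidate generation is omitted, and the DA labels, tokens, and rules in the usage lines are made up.

```python
import math
from collections import Counter, defaultdict


def word_ngrams(tokens, n_max=2):
    """Word uni- to n_max-grams, each joined into one string feature."""
    feats = []
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            feats.append(" ".join(tokens[i:i + n]))
    return feats


def extract_features(tokens, syntactic_rules):
    """Count word N-grams plus syntactic-rule features (e.g. 'VP->VV NP')."""
    return Counter(word_ngrams(tokens) + ["RULE:" + r for r in syntactic_rules])


def cosine(a, b):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(v * b.get(k, 0) for k, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class DialogActMatrix:
    """Hypothetical DAM-style detector: one row per (DA, sentence cluster),
    one column per feature; each row accumulates feature counts."""

    def __init__(self):
        self.rows = defaultdict(Counter)  # (da, cluster_id) -> feature counts

    def add(self, da, cluster_id, tokens, syntactic_rules):
        """Add one training sentence that clustering assigned to (da, cluster_id)."""
        self.rows[(da, cluster_id)].update(extract_features(tokens, syntactic_rules))

    def detect(self, tokens, syntactic_rules):
        """Return (best_da, confidence); confidence is the best cosine score."""
        query = extract_features(tokens, syntactic_rules)
        best_da, best_score = None, 0.0
        for (da, _cluster_id), row in self.rows.items():
            score = cosine(query, row)
            if score > best_score:
                best_da, best_score = da, score
        return best_da, best_score


# Usage with made-up DA labels, tokens, and rules:
dam = DialogActMatrix()
dam.add("request_restaurant", 0, ["recommend", "a", "restaurant"], ["VP->VV NP"])
dam.add("confirm_yes", 0, ["yes", "right"], ["IP->IJ"])
da, conf = dam.detect(["recommend", "restaurant"], ["VP->VV NP"])
print(da)  # request_restaurant
```

Keeping separate rows per cluster, rather than one row per DA, lets a DA with several distinct phrasings match whichever cluster is closest, which is one plausible reading of why clustering reduces DA confusion. The returned similarity can double as the DA confidence score that a non-understanding check compares against a threshold.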
For dialog control, a dialog manager based on a partially observable Markov decision process (POMDP) with an error handling strategy is proposed, so that the system's responses include not only normal task execution but also recovery or repair actions.
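The belief tracking at the core of a POMDP dialog manager can be sketched with the standard belief update b'(s') = η · O(o | s', a) · Σ_s T(s' | s, a) · b(s). The sketch below rests on assumptions that do not come from the thesis: a toy state space of two hypothetical user goals, a user goal that never changes, a detected-DA observation model with an assumed 80% accuracy, and a confidence-threshold rule standing in for the learned policy. It only illustrates how low belief can trigger a repair action instead of normal task execution.

```python
GOALS = ["find_restaurant", "ask_food_info"]  # hypothetical user goals (states)


def belief_update(belief, action, observation, transition, observation_model):
    """Standard POMDP belief update:
    b'(s') = eta * O(o | s', a) * sum_s T(s' | s, a) * b(s)."""
    new_belief = {}
    for s_next in belief:
        predicted = sum(transition(s, action, s_next) * p for s, p in belief.items())
        new_belief[s_next] = observation_model(observation, s_next, action) * predicted
    total = sum(new_belief.values())  # equals 1 / eta
    if total == 0:
        return dict(belief)  # observation impossible under the model: keep the prior
    return {s: p / total for s, p in new_belief.items()}


def stay_transition(s, action, s_next):
    """Assumption: the user's goal does not change during the dialog."""
    return 1.0 if s == s_next else 0.0


def noisy_da_observation(observation, s_next, action, accuracy=0.8):
    """Assumption: the detected DA names the true goal with prob. `accuracy`;
    the remaining mass is spread evenly over the other goals."""
    if observation == s_next:
        return accuracy
    return (1.0 - accuracy) / (len(GOALS) - 1)


def choose_action(belief, threshold=0.9):
    """Stand-in for a learned policy: execute when confident, otherwise
    take a repair action (explicit confirmation of the most likely goal)."""
    goal, p = max(belief.items(), key=lambda kv: kv[1])
    return f"execute:{goal}" if p >= threshold else f"confirm:{goal}"


b = {g: 1.0 / len(GOALS) for g in GOALS}
b = belief_update(b, "ask", "find_restaurant", stay_transition, noisy_da_observation)
print(choose_action(b))  # confirm:find_restaurant (belief 0.8)
b = belief_update(b, "confirm", "find_restaurant", stay_transition, noisy_da_observation)
print(choose_action(b))  # execute:find_restaurant (belief about 0.94)
```

One noisy observation leaves the belief at 0.8, below the assumed threshold, so the sketch asks for confirmation; a second consistent observation raises it to about 0.94 and the task is executed. In an actual POMDP manager the threshold rule would be replaced by a policy optimized over the belief space.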
To evaluate the proposed approaches, a goal-oriented SDS was built to collect evaluation data; its task is to provide information about food and restaurants around National Cheng Kung University (NCKU). In the experiments, the average ASR accuracy is 87.6%. At this ASR performance, the average detection accuracies of the DAM and the DAPM are 82.7% and 66%, respectively. With the proposed strategy, the task completion rate in real dialogs is 92.4%. Compared with a keyword slot-filling strategy and a POMDP without error handling functions, improvements of 29.5% and 11.7%, respectively, are obtained. These results indicate that an SDS incorporating the proposed techniques yields a more reliable dialog process.