| Graduate student: | 吳浚瑋 Wu, Jhing-Wei |
|---|---|
| Thesis title: | 基於改進式關鍵詞擷取技術以及可學習式跳躍對話決策樹之友善的老年人照護對話系統 Friendly Elderly Care Spoken Dialogue System Based on Improved Spontaneous Keyword Spotting and Learnable Jumping Dialogue Decision Tree |
| Advisor: | 王駿發 Wang, Jhing-Fa |
| Degree: | Master |
| Department: | College of Electrical Engineering and Computer Science - Department of Electrical Engineering |
| Year of publication: | 2012 |
| Academic year of graduation: | 100 (ROC calendar; 2011-2012) |
| Language: | English |
| Number of pages: | 44 |
| Keywords (Chinese): | 自發性語音辨識、關鍵詞擷取、跳躍對話決策樹 |
| Keywords (English): | spontaneous speech recognition, keyword-spotting, jumping dialogue decision tree |
Spoken dialogue systems are increasingly developed and used in daily life. For example, spoken dialogue systems for telephone customer service have been adopted by many companies, and smartphone assistants such as Siri are well known worldwide. However, unfriendly interfaces, such as dialogue management based on a traditional tree structure or low recognition accuracy for spontaneous speech, remain a major factor limiting user acceptance.
This thesis investigates a friendly elderly care spoken dialogue system based on improved spontaneous keyword spotting and a learnable jumping dialogue decision tree. The proposed system consists of two major parts. The first is improved spontaneous keyword spotting, composed of lengthening cancellation, syllable contraction prediction, and a sorted filler model. The second is a learnable jumping dialogue decision tree with a novel structure, a customized interface, and a learning mechanism.
Spontaneous keyword spotting serves as the automatic speech recognition (ASR) component of the proposed spoken dialogue system. In this module, the lengthening cancellation method recovers syllable structures and removes redundant lengthened segments to decrease computational complexity. The syllable contraction prediction method increases recognition accuracy when syllable contraction occurs. The sorted filler model is proposed to reduce recognition time. The jumping dialogue decision tree, in turn, manages the dialogue. In this module, a decision tree mapping matrix efficiently defines the relations among all nodes and links each node to a separate keyword list, reducing the number of keywords to be recognized. A customized interface lets users easily modify and create dialogue strategies. The learning mechanism adds jumping lines from the root to frequently used nodes, which reduces the time users need to spend.
Experiments indicate that the proposed system achieves a recognition accuracy of 90% on a 300-sentence database. Moreover, the proposed approach runs faster than the baseline. These results support the effectiveness of the proposed system.
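The jumping dialogue decision tree described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis implementation: the class name `JumpingDialogueTree`, the visit-count threshold, the node names, and the keywords are all hypothetical, and the abstract does not specify how "popular" nodes are detected. The sketch only shows the general idea of a mapping matrix that restricts the active keyword list per node, plus a learning step that adds a jump line from the root to a frequently visited node.

```python
from collections import Counter


class JumpingDialogueTree:
    """Toy dialogue tree: a mapping matrix plus learned root-to-node jumps.

    All names and the threshold rule are illustrative assumptions,
    not details taken from the thesis.
    """

    def __init__(self, nodes, edges, keywords, jump_threshold=3):
        self.nodes = list(nodes)
        self.root = self.nodes[0]
        self.index = {name: i for i, name in enumerate(self.nodes)}
        n = len(self.nodes)
        # mapping[i][j] == 1 means the dialogue may move from node i to node j.
        self.mapping = [[0] * n for _ in range(n)]
        for parent, child in edges:
            self.mapping[self.index[parent]][self.index[child]] = 1
        self.keywords = keywords  # node name -> keyword that selects that node
        self.visits = Counter()
        self.jump_threshold = jump_threshold  # assumed popularity criterion

    def active_keywords(self, node):
        """Return {keyword: target} for nodes reachable from `node` only."""
        row = self.mapping[self.index[node]]
        return {
            self.keywords[self.nodes[j]]: self.nodes[j]
            for j, linked in enumerate(row)
            if linked
        }

    def step(self, node, keyword):
        """Move to the node selected by `keyword`; stay put if it is inactive."""
        target = self.active_keywords(node).get(keyword, node)
        if target != node:
            self.visits[target] += 1
            self._learn(target)
        return target

    def _learn(self, target):
        """Add a jump line root -> target once target is visited often enough."""
        if target != self.root and self.visits[target] >= self.jump_threshold:
            self.mapping[self.index[self.root]][self.index[target]] = 1


def build_demo_tree():
    """Build a small hypothetical care-dialogue tree for demonstration."""
    nodes = ["root", "health", "entertainment", "blood_pressure", "music"]
    edges = [
        ("root", "health"),
        ("root", "entertainment"),
        ("health", "blood_pressure"),
        ("entertainment", "music"),
    ]
    keywords = {
        "root": "home",
        "health": "health",
        "entertainment": "fun",
        "blood_pressure": "pressure",
        "music": "music",
    }
    return JumpingDialogueTree(nodes, edges, keywords, jump_threshold=3)
```

In this sketch, a user who repeatedly goes from `root` to `health` to `blood_pressure` eventually gets a direct jump, so the keyword for `blood_pressure` becomes active at the root. Until then, only the keywords of directly linked nodes are candidates at each step, which mirrors the abstract's point about reducing the number of keywords to be recognized.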