| Graduate student: | 曾華璽 Tseng, Hua-Hsi |
|---|---|
| Thesis title: | 以變調與斷詞改善台語語音辨識並以諧音建置台語幽默對話特徵 Improvement of Taiwanese Speech Recognition with Automatic Tone Sandhi and Word Segmentation and Construction of Taiwanese Humorous Conversation Pattern Based on Homophonic Words |
| Advisor: | 楊中平 Young, Chung-Ping |
| Co-advisor: | 盧文祥 Lu, Wen-Hsiang |
| Degree: | Master |
| Department: | College of Electrical Engineering and Computer Science - Department of Computer Science and Information Engineering |
| Year of publication: | 2018 |
| Academic year of graduation: | 106 |
| Language: | English |
| Number of pages: | 32 |
| Keywords: | Kaldi, Speech Recognition, Natural Language Processing, Humor Recognition |
Taiwan still lacks a complete, reliable Taiwanese (Southern Min) speech recognition system. Taiwanese is difficult to write in Chinese characters, which complicates word segmentation and tone sandhi, and its tone sandhi rules are more complicated than those of Mandarin and vary by region. We therefore use the Kaldi Speech Recognition Toolkit to build a Taiwanese speech recognition system and improve it. First, we add Taiwanese tone sandhi rules and Taiwanese word segmentation at the text annotation stage, before building the speech recognition model. For romanization we adopt the Taiwanese Romanization System, the phonetic notation officially promoted by Taiwan's Ministry of Education and often referred to as Tâi-lô. Next, we combine the humor techniques of Taiwanese quyi (spoken and sung performance art) with natural language processing: working from quyi videos on YouTube and following the humor strategies of punning ambiguity and homophony, we semi-automatically build a humorous homophonic pattern database that maps Taiwanese to Mandarin through homophones.
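To make the tone sandhi step concrete, the following is a minimal sketch of how the commonly described general sandhi rules could be applied to Tâi-lô syllables written with numeric tones. It is not taken from the thesis: the function names, the `accent` parameter, and the simplification that every syllable except the last one in a segmented word changes tone are all assumptions made for illustration. Real tone groups do not always coincide with word boundaries.

```python
import re

# Commonly described general rules (citation tone -> sandhi tone).
# Tones 4, 5 and 8 are handled separately because they depend on
# the syllable coda or on the regional accent.
GENERAL_RULES = {"1": "7", "7": "3", "3": "2", "2": "1"}


def sandhi_syllable(syllable, accent="south"):
    """Return the sandhi form of one numeric-tone Tai-lo syllable (hypothetical helper)."""
    match = re.fullmatch(r"([a-z]+)([1-8])", syllable)
    if match is None:
        return syllable  # leave anything we cannot parse untouched
    base, tone = match.groups()
    if tone in GENERAL_RULES:
        new_tone = GENERAL_RULES[tone]
    elif tone == "5":
        # Regional difference: 5 -> 7 (southern) or 5 -> 3 (northern).
        new_tone = "7" if accent == "south" else "3"
    elif tone == "4":
        # Checked tone: -h syllables go to 2, -p/-t/-k syllables go to 8.
        new_tone = "2" if base.endswith("h") else "8"
    elif tone == "8":
        # Checked tone: -h syllables go to 3, -p/-t/-k syllables go to 4.
        new_tone = "3" if base.endswith("h") else "4"
    else:
        new_tone = tone
    return base + new_tone


def apply_sandhi(word, accent="south"):
    """Apply sandhi to every syllable of a hyphen-joined word except the last."""
    syllables = word.split("-")
    changed = [sandhi_syllable(s, accent) for s in syllables[:-1]]
    return "-".join(changed + syllables[-1:])


print(apply_sandhi("tai5-gi2"))                  # tai7-gi2
print(apply_sandhi("tai5-gi2", accent="north"))  # tai3-gi2
```

Because the sketch operates on one segmented word at a time, it only makes sense after word segmentation, which matches the order of the two preprocessing steps named in the abstract. The final `-h` is kept in the output here even though it is typically dropped in pronunciation.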