
Author: Li, Yuan-Han (李沅翰)
Thesis title: Using Grammatical and Semantic Correction Model to Improve Chinese-to-Taiwanese Machine Translation Fluency (以語法語意修正模型技術改善華語轉台語機器翻譯之語意流暢度)
Advisor: Lu, Wen-Hsiang (盧文祥)
Degree: Master
Department: College of Electrical Engineering and Computer Science, Department of Computer Science and Information Engineering
Year of publication: 2022
Graduation academic year: 110
Language: English
Number of pages: 28
Keywords (Chinese): 機器翻譯, 台語語法規則, 構詞轉換, 語序轉換, 華語轉台語
Keywords (English): Machine translation, Taiwanese grammatical rules, Lexical transformation, Syntactic transformation, Chinese-to-Taiwanese
    Chinese-to-Taiwanese machine translation currently faces three major problems: selecting the pronunciation of Taiwanese words that have multiple pronunciations, selecting the pronunciation of words unknown to the Taiwanese dictionary, and transforming grammar and semantics from Chinese to Taiwanese. Most related work addresses the first two problems, and very little addresses grammatical and semantic transformation. However, Taiwanese has syntactic rules of its own that also affect translation quality: even when word choice and pronunciation are correct, a translation that ignores the grammatical differences between the two languages yields Chinese-style Taiwanese, which reads unnaturally to native speakers and can distort the meaning of the original sentence. This thesis therefore collects and organizes common Taiwanese sentence patterns and grammar rules and builds from them a grammatical and semantic correction model, which detects and corrects grammatical and semantic errors in the output of Chinese-to-Taiwanese machine translation and thereby improves fluency.
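    The kind of rule-based correction described in the abstract can be sketched roughly as follows. This is only a minimal illustration under stated assumptions, not the thesis's ruleset: the function names, the small substitution table, and the single comparison rule are all hypothetical, and the input is assumed to be already segmented into words (the thesis's table of contents indicates that segmentation and POS tagging are done with CKIP).

```python
# Hypothetical sketch of a rule-based Chinese-to-Taiwanese correction pass.
# Every rule and name here is an illustrative assumption, not the thesis's data.

# Lexical (function-word) substitutions: Mandarin token -> Taiwanese token.
LEXICAL_RULES = {
    "他": "伊",  # third-person pronoun
    "把": "共",  # disposal marker
    "在": "佇",  # locative marker
    "和": "佮",  # "and / with"
    "很": "真",  # degree adverb
}


def apply_lexical_rules(tokens):
    """Replace each token that has an entry in the substitution table."""
    return [LEXICAL_RULES.get(token, token) for token in tokens]


def apply_comparison_rule(tokens):
    """Insert 較 before the predicate of an 'A 比 B <predicate>' comparison."""
    if "比" not in tokens:
        return list(tokens)
    i = tokens.index("比")
    # Require a standard of comparison and a predicate after 比,
    # and skip sentences that already contain 較 in that position.
    if i + 2 >= len(tokens) or tokens[i + 2] == "較":
        return list(tokens)
    return tokens[: i + 2] + ["較"] + tokens[i + 2:]


def correct(tokens):
    """Run the lexical pass first, then the comparison-sentence pass."""
    return apply_comparison_rule(apply_lexical_rules(tokens))


if __name__ == "__main__":
    print(correct(["他", "比", "我", "大"]))  # ['伊', '比', '我', '較', '大']
```

    Context-free substitutions like these over-apply in practice (for example, 把 is not always rendered as 共), which is why a real system would condition its rules on part-of-speech tags and phrase boundaries rather than on surface tokens alone.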

    Table of Contents
      Abstract
      摘要 (Chinese Abstract)
      致謝 (Acknowledgements)
      Table of Contents
      List of Tables
      List of Figures
      Chapter 1. Introduction
        1.1 Background and Current Development
        1.2 Common Issues in Chinese-to-Taiwanese Machine Translation
        1.3 Paper Contribution
      Chapter 2. Related Work
        2.1 Machine Translation Technology
          2.1.1 Rule-based Machine Translation
          2.1.2 Statistical Machine Translation
          2.1.3 Neural Network Model Machine Translation
        2.2 Chinese-to-Taiwanese Machine Translation
          2.2.1 Resolving Multi-pronunciation Issue in Taiwanese with Rule-based Machine Translation
          2.2.2 Deciding Unknown Word Pronunciation with Statistical Machine Translation
          2.2.3 Performing Whole-sentence Translation Using Neural Network Model Machine Translation
          2.2.4 Resolving Multi-pronunciation and Word Insertion/Deletion Issue with Mixed Algorithm
      Chapter 3. Using Grammatical and Semantic Correction Model to Improve Taiwanese Translation Fluency
        3.1 System Architecture
        3.2 Building Grammatical and Semantic Correction Model Using Rule-based Machine Translation
          3.2.1 Building Chinese-to-Taiwanese Correspondence Dictionary Corpus
          3.2.2 Organizing Taiwanese Grammatical Rules
          3.2.3 Sentence Preprocessing Module
            1. Arabic Numerals Regularization
            2. CKIP System Word Segmentation and POS Tagging
            3. Base Noun Phrase (BNP) Retrieval
          3.2.4 Chinese-to-Taiwanese Grammatical Error Detection and Correction Module
          3.2.5 Chinese-to-Taiwanese Pronunciation Selection Module
            1. Abbreviation Word Restoration
            2. Pronunciation Selection
      Chapter 4. Experiment Result
        4.1 Dataset Collection
        4.2 Chinese-to-Taiwanese Grammatical Error Correction Model Experiment Result
        4.3 Error Analysis
          4.3.1 Arabic Numerals Regularization Submodule
            1. Numeral Word Translation Error
            2. Date Transformation Error
          4.3.2 Word Segmentation and Part of Speech Tagging Submodule
          4.3.3 Base Noun Phrase (BNP) Retrieval Submodule
            1. Verbal Clauses being mistaken as noun phrases
            2. BNP Retrieval affected by Hanlp and CKIP Word Segmentation Differences
          4.3.4 Chinese-to-Taiwanese Grammatical and Semantic Error Detection and Correction Module
            1. "共" Sentences Translation Error
            2. "了" Particle Transformation Error
          4.3.5 Chinese-to-Taiwanese Pronunciation Selection Module
      Chapter 5. Conclusion and Future Work
      References
      Appendix 1. Chinese-to-Taiwanese Grammar Ruleset

    List of Tables
      Table 1: Chinese-to-Taiwanese Translation System Output Result Comparison
      Table 2: Categories of Arabic Numerals Regularization
      Table 3: Chinese-to-Taiwanese Grammatical Correction Examples
      Table 4: Number of Articles and Sentences for News Dataset
      Table 5: Arabic Numerals Regularization Submodule Experiment Result
      Table 6: Word Segmentation and Part-of-speech Tagging Submodule Experiment Result
      Table 7: Base Noun Phrase (BNP) Retrieval Submodule Experiment Result
      Table 8: Grammatical and Semantic Error Detection and Correction Module Experiment Result
      Table 9: Pronunciation Selection Module Experiment Result
      Table 10: Chinese-to-Taiwanese Function Word Transformation
      Table 11: Other Lexical Transformation Rules in Chinese-to-Taiwanese Translation
      Table 12: Comparison Sentences Word Order Revision in Chinese-to-Taiwanese Translation
      Table 13: Verb-Object Word Order Transformation in Chinese-to-Taiwanese Translation
      Table 14: Other Syntactic Transformation Rules in Chinese-to-Taiwanese Translation

    List of Figures
      Figure 1: Chinese-to-Taiwanese Machine Translation System Architecture
      Figure 2: Chinese-to-Taiwanese Dictionary Corpus
      Figure 3: Multi-Pronunciation Words
      Figure 4: Wenbai Pronunciations of Taiwanese Words
      Figure 5: Types of Chinese-Taiwanese Grammatical and Semantic Differences
      Figure 6: Chinese-to-Taiwanese Pronunciation Selection Module Workflow


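    Full-text availability: on campus, open from 2027-08-17; off campus, open from 2027-08-17.
    The electronic thesis has not yet been authorized for public release; for the print copy, consult the library catalog.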