| Graduate student: | 夏敏翔 Shia, Min-Shiang |
|---|---|
| Thesis title: | 使用詞組及流暢性來改善統計式機器翻譯 Using Phrase and Fluency to Improve Statistical Machine Translation |
| Advisor: | 盧文祥 Lu, Wen-Hsiang |
| Degree: | Master |
| Department: | College of Electrical Engineering and Computer Science, Department of Computer Science and Information Engineering |
| Year of publication: | 2006 |
| Academic year of graduation: | 94 |
| Language: | Chinese |
| Number of pages: | 66 |
| Chinese keywords: | 流暢化, 詞組結構, 統計式的機器翻譯 |
| English keywords: | Statistical Machine Translation, Phrase Structure, Fluency |
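Since Peter F. Brown et al. proposed statistical machine translation in 1993, the basic unit of translation has shifted from the single word to the phrase. Although phrase-based statistical machine translation already achieves good results, readers still find the translated phrases disfluent, and little machine translation research has addressed the fluency of the translated output. We observe that, in English-to-Chinese translation, translators often add words that do not appear in the English sentence to make the Chinese sentence more fluent; simple word-to-word translation cannot produce these additional words in the Chinese sentence. We therefore propose Fluent Phrase-based Machine Translation (FPMT), which uses a probabilistic model to check whether a Chinese string is fluent after words are added, and uses a corpus and Web search results to find the additional words. Fluency words can be added both within a phrase (intra-phrase) and between phrases (inter-phrase), making the Chinese translation smoother. Experimental results show that, compared with IBM Model 4, our translation model and fluency method do restore the missing words and achieve better results.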
After Peter F. Brown et al. (1993) proposed statistical machine translation, more and more researchers applied statistical techniques to machine translation. In recent years, phrase-based methods have gradually begun to replace word-based methods, and tree structures are also used instead of flat structures. Although more and more techniques are used to improve translation quality, statistical probability models are still widely adopted. According to our observations, translators often add words to improve the fluency of translated sentences, which is difficult to achieve with word-to-word translation. Therefore, we propose a fluent phrase-based machine translation (FPMT) model that determines whether a sentence is fluent after additional words are inserted. We also use a corpus and Web search results to find these additional words. Our approach can add words both within phrases (intra-phrase) and between phrases (inter-phrase), making Chinese sentences more fluent. Experiments show that FPMT adds suitable words and that its translation fluency is better than that of IBM Model 4.
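The idea of checking fluency after inserting a word can be illustrated with a minimal sketch. The abstract does not specify the probabilistic model, so the code below assumes an add-one-smoothed bigram language model with length-normalized scores; the function names, the toy corpus, and the candidate list are all hypothetical, and only insertion at phrase boundaries (the inter-phrase case) is shown.

```python
import math
from collections import Counter

def train_bigram(sentences):
    """Count unigrams and bigrams over tokenized sentences with boundary markers."""
    unigrams, bigrams = Counter(), Counter()
    for tokens in sentences:
        padded = ["<s>"] + tokens + ["</s>"]
        unigrams.update(padded)
        bigrams.update(zip(padded, padded[1:]))
    return unigrams, bigrams

def avg_logprob(tokens, unigrams, bigrams):
    """Average add-one-smoothed bigram log-probability per transition.

    Normalizing by length is an assumption made here so that a longer
    sentence is not penalized merely for containing one more word.
    """
    padded = ["<s>"] + tokens + ["</s>"]
    vocab_size = len(unigrams)
    total = 0.0
    for prev, cur in zip(padded, padded[1:]):
        total += math.log((bigrams[(prev, cur)] + 1) / (unigrams[prev] + vocab_size))
    return total / (len(padded) - 1)

def best_insertion(phrases, candidates, unigrams, bigrams):
    """Try each candidate word at each phrase boundary; keep it only if the score improves."""
    flat = [tok for phrase in phrases for tok in phrase]
    best_tokens = flat
    best_score = avg_logprob(flat, unigrams, bigrams)
    offset = 0
    for phrase in phrases[:-1]:
        offset += len(phrase)  # index of the boundary after this phrase
        for word in candidates:
            trial = flat[:offset] + [word] + flat[offset:]
            score = avg_logprob(trial, unigrams, bigrams)
            if score > best_score:
                best_tokens, best_score = trial, score
    return best_tokens, best_score

# Toy monolingual corpus (hypothetical), standing in for the corpus and Web search results.
corpus = [["我", "的", "書"], ["我", "的", "筆"], ["他", "的", "書"]]
unigrams, bigrams = train_bigram(corpus)

# Word-to-word output "我 書" lacks the particle "的"; candidates are hypothetical.
tokens, score = best_insertion([["我"], ["書"]], ["的", "了"], unigrams, bigrams)
print(" ".join(tokens))
```

In this sketch an insertion is accepted only when it raises the normalized score, so a sentence that is already fluent under the model is returned unchanged.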
Peter F. Brown, Stephen A. Della Pietra, Vincent J. Della Pietra, and Robert L. Mercer. 1993. The Mathematics of
Statistical Machine Translation: Parameter Estimation. Computational Linguistics, 19(2).
P.-J. Cheng, Y.-C. Pan, W.-H. Lu, and L.-F. Chien. 2004. Creating Multilingual Translation Lexicons with Regional
Variations Using Web Corpora. In Proceedings of ACL 2004, pages 535-542.
Jason S. Chang, David Yu, and Chun-Jun Lee. 2001. Statistical Translation Model for Phrases. Computational Linguistics
and Chinese Language Processing.
David Chiang. 2005. A Hierarchical Phrase-Based Model for Statistical Machine Translation. In Proceedings of
the 43rd Annual Meeting of the ACL.
Lee-Feng Chien, T. I. Huang, and M. C. Chen. 1997. PAT-Tree-Based Keyword Extraction for Chinese Information Retrieval.
Proceedings of the 1997 ACM SIGIR Conference, Philadelphia, USA, pages 50-58.
Yuan Ding and Martha Palmer. 2005. Machine Translation Using Probabilistic Synchronous Dependency Insertion
Grammars. In 43rd Annual Meeting of the Association for Computational Linguistics (ACL 2005).
George Foster. 2000. A Maximum Entropy Minimum Divergence Translation Model. Proceedings of the 38th Annual Meeting of
the Association for Computational Linguistics.
Heidi J. Fox. 2002. Phrasal Cohesion and Statistical Machine Translation. Proceedings of the Conference on
Empirical Methods in Natural Language Processing.
Young-Sook Hwang and Yutaka Sasaki. 2005. Context-dependent SMT Model using Bilingual Verb-Noun Collocation.
Proceedings of the 43rd Annual Meeting of the ACL.
Dan Klein and Christopher D. Manning. 2003. Accurate Unlexicalized Parsing. Proceedings of the 41st Meeting of the
Association for Computational Linguistics.
Dan Klein and Christopher D. Manning. 2002. Fast Exact Inference with a Factored Model for Natural Language
Parsing. In Advances in Neural Information Processing Systems 15 (NIPS 2002), December 2002.
Philipp Koehn, Franz Josef Och, and Daniel Marcu. 2003. Statistical phrase-based translation. In Proceedings of
HLT-NAACL 2003, pages 127–133.
James N.K. Liu and Lina Zhou. 1998. A hybrid model for Chinese-English machine translation. IEEE International
Conference.
Makoto Nagao. 1984. A Framework of a Mechanical Translation between Japanese and English by Analogy Principle. In A.
Elithorn and R. Banerji (eds.), Artificial and Human Intelligence. NATO Publications.
Daniel Marcu and William Wong. 2002. A phrase-based, joint probability model for statistical machine translation.
In Proceedings of EMNLP-2002, Philadelphia, PA, July.
Franz Josef Och and Hermann Ney. 2004. The alignment template approach to statistical machine translation.
Computational Linguistics.
Franz Josef Och. 2002. Statistical Machine Translation: From Single-Word Models to Alignment Templates. PhD
thesis, RWTH Aachen, Germany.
Franz Josef Och and Hermann Ney. 2002. Discriminative Training and Maximum Entropy Models for Statistical Machine
Translation. Proc. North American Association for Computational Linguistics.
Franz Josef Och, Christoph Tillmann, and Hermann Ney. 1999. Improved Alignment Models for Statistical Machine
Translation. Proc. of the Joint Conference of Empirical Methods in Natural Language Processing.
Kishore Papineni, Salim Roukos, Todd Ward, and Wei-Jing Zhu. 2002. BLEU: a Method for Automatic Evaluation of
Machine Translation. Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics.
K.-Y. Su, J.-S. Chang, and Y.-L. Una Hsu. 1995. A Corpus-based Two-Way Design for Parameterized MT Systems: Rationale,
Architecture and Training Issues. Proceedings of TMI-95, Vol. 2, pages 334-353, 6th Int. Conf. on Theoretical and
Methodological Issues in Machine Translation, University of Leuven, Leuven, Belgium, July 5-7.
Ashish Venugopal, Stephan Vogel, and Alex Waibel. 2003. Effective Phrase Translation Extraction from Alignment Models. In
Proceedings of the 41st Annual Meeting of the ACL.
Stephan Vogel, Ying Zhang, Fei Huang, Alicia Tribble, Ashish Venugopal, Bing Zhao, and Alex Waibel. 2003. The CMU
Statistical Machine Translation System. Proceedings of MT Summit.
Taro Watanabe, Eiichiro Sumita, and Hiroshi G. Okuno. 2003. Chunk-based Statistical Translation. In 41st Annual
Meeting of the Association for Computational Linguistics (ACL 2003), pages 303-310, Sapporo, Japan.
Taro Watanabe and Eiichiro Sumita. 2003. Example-based Decoding for Statistical Machine Translation. Proceedings
of Machine Translation Summit IX.
Kenji Yamada and Kevin Knight. 2002. A Decoder for Syntax-based Statistical MT. Proceedings of the 40th Annual
Meeting of the Association for Computational Linguistics (ACL).
Kenji Yamada and Kevin Knight. 2001. A syntax-based statistical translation model. In Proceedings of the 39th Annual
Meeting of the ACL, pages 523-530.
Richard Zens and Hermann Ney. 2004. Improvements in Phrase-Based Statistical Machine Translation. Proceedings of
HLT-NAACL 2004.