| Graduate student: | 關松堅 Kokoh, Philips |
|---|---|
| Thesis title: | Hierarchical Penalized Probabilistic FOIL and Multidimensional Language Model for ESL Error Classification and Correction (Chinese title: 應用階層懲罰機率之FOIL及多維語言模型於英語錯誤句之分類與訂正) |
| Advisor: | 吳宗憲 Wu, Chung-Hsien |
| Degree: | Master |
| Department: | College of Electrical Engineering and Computer Science, Department of Computer Science and Information Engineering |
| Year of publication: | 2009 |
| Graduation academic year: | 97 (ROC calendar, i.e. the 2008-2009 academic year) |
| Language: | English |
| Number of pages: | 84 |
| Keywords: | penalized probabilistic FOIL (First-Order Inductive Logic), multidimensional language model, ESL error classification and correction |
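Unlike previous studies, which detect and correct only one or two types of errors made by English learners, this thesis proposes a method for automatically classifying and correcting many of the major grammatical error types that English learners make. Penalized probabilistic FOIL (First-Order Inductive Logic) is proposed here for the first time and applied to classifying erroneous English sentences. Whereas similar earlier studies represent the information in a sentence only through word forms and parts of speech, this module is the first to integrate several types of background knowledge into the sentence representation. In particular, quantized background knowledge about word structure replaces the conventional n-gram representation of word order, which keeps the induced rules from being tied to specific word forms and orderings. Moreover, common classification algorithms can only report the error type of an erroneous sentence; this module inherits the advantages of first-order logic representation and can additionally infer the cause of the error. A decomposition-based testing method is also adopted to improve classification results and to infer the multiple error types that a single error cause may produce. A multidimensional language model is likewise applied to error correction for the first time: based on the error causes inferred by penalized probabilistic FOIL, the correction module changes the values of selected dimensions of the language model to generate correction suggestions. Based on the error causes and the scope of each error, the study further proposes a hierarchical classification and correction mechanism that reduces the new errors that detection or correction can otherwise introduce. The Chinese Learner English Corpus is used for evaluation. Experimental results show stable performance across all error types, with average recall and precision both exceeding those of the baseline systems; building on the automatic classification results, automatic correction also receives good ratings in both quantitative and qualitative evaluations.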
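The abstract builds on FOIL but does not spell out how the "penalized probabilistic" variant scores candidate literals. As background, the sketch below shows only the classic FOIL information gain from Quinlan (1990) and one greedy literal-selection step; the thesis's penalty and probabilistic extensions are not reproduced. The predicate names and coverage counts are hypothetical, chosen purely for illustration.

```python
import math
from typing import Dict, Optional, Tuple


def foil_gain(p0: int, n0: int, p1: int, n1: int, t: Optional[int] = None) -> float:
    """Classic FOIL information gain for adding one literal to a clause.

    p0, n0: positive / negative bindings covered before adding the literal.
    p1, n1: positive / negative bindings covered after adding it.
    t: positive bindings still covered after the extension; a common
       simplification approximates it by p1, which is the default here.
    """
    if p1 == 0:
        return 0.0  # the literal excludes every positive binding: no gain
    if t is None:
        t = p1
    before = math.log2(p0 / (p0 + n0))
    after = math.log2(p1 / (p1 + n1))
    return t * (after - before)


def best_literal(p0: int, n0: int, candidates: Dict[str, Tuple[int, int]]) -> str:
    """One greedy FOIL step: choose the candidate literal with the highest gain."""
    return max(candidates, key=lambda name: foil_gain(p0, n0, *candidates[name]))


# Hypothetical literals for a clause such as "article_error(S) :- ...",
# each mapped to the (positive, negative) bindings it would still cover.
candidates = {
    "noun_without_determiner(S)": (8, 2),
    "verb_is_past_tense(S)": (6, 6),
}
print(best_literal(10, 10, candidates))  # noun_without_determiner(S)
```

The second literal leaves the positive ratio unchanged (6 of 12, as before), so its gain is zero, while the first raises it from 0.5 to 0.8.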
Non-native English speakers far outnumber native speakers, and many of them find it difficult to reach native-level proficiency; this raises the need for automatic proofing tools for ESL (English as a Second Language) learners. Much previous work on automatic error detection and correction handles only one or two error types and represents a sentence with n-gram and part-of-speech information. This study presents a penalized probabilistic FOIL (First-Order Inductive Logic) approach for error classification and error cause inference. Each sentence is represented with expressive, quantized, multi-type background knowledge that captures various relations within the sentence. A decomposition-based testing mechanism is proposed to improve classification performance and to infer the many possible error causes used for correction. A multidimensional language model is then used to generate all candidate corrections. The classification and correction processes are combined in a hierarchical approach that captures all possible corrections and reduces faulty corrections caused by multiple interacting errors. Evaluation of error classification shows that the proposed method outperforms the baseline methods, and the correction method produces good, natural-sounding correction suggestions.
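The abstract describes correction as changing the values of selected dimensions of a multidimensional language model according to the inferred error cause. The sketch below illustrates that idea under loudly stated assumptions: each token is a tuple of dimensions (word, lemma, POS), the inferred cause is taken to be "wrong verb form at position i", the POS dimension is varied, a hypothetical toy lexicon maps (lemma, POS) back to a surface form, and candidates are ranked by an add-one smoothed word bigram model. None of these components is claimed to be the thesis's actual model.

```python
import math
from collections import Counter
from typing import List, NamedTuple


class Token(NamedTuple):
    """A factored token: each field is one dimension of the representation."""
    word: str
    lemma: str
    pos: str


# Hypothetical toy lexicon mapping (lemma, POS tag) to a surface form.
LEXICON = {
    ("go", "VB"): "go", ("go", "VBZ"): "goes",
    ("go", "VBD"): "went", ("go", "VBG"): "going",
}


def train_bigram(corpus: List[List[str]]):
    """Count bigrams and left-context (history) words for add-one smoothing."""
    histories, bigrams = Counter(), Counter()
    for words in corpus:
        histories.update(words[:-1])
        bigrams.update(zip(words, words[1:]))
    vocab_size = len({w for words in corpus for w in words})
    return histories, bigrams, vocab_size


def logprob(words, histories, bigrams, vocab_size) -> float:
    """Add-one smoothed bigram log-probability of a word sequence."""
    return sum(
        math.log((bigrams[(prev, cur)] + 1) / (histories[prev] + vocab_size))
        for prev, cur in zip(words, words[1:])
    )


def suggest_by_varying_pos(sentence: List[Token], position: int, model) -> List[List[str]]:
    """Change only the POS dimension of one token, regenerate its surface
    form from the lexicon, and rank the resulting sentences by the bigram LM."""
    target = sentence[position]
    scored = []
    for (lemma, pos), form in LEXICON.items():
        if lemma == target.lemma and pos != target.pos:
            words = [tok.word for tok in sentence]
            words[position] = form
            scored.append((logprob(words, *model), words))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [words for _, words in scored]


corpus = [["he", "goes", "to", "school"], ["she", "goes", "home"],
          ["yesterday", "he", "went", "home"]]
model = train_bigram(corpus)
sentence = [Token("he", "he", "PRP"), Token("go", "go", "VB"),
            Token("to", "to", "TO"), Token("school", "school", "NN")]
print(suggest_by_varying_pos(sentence, 1, model)[0])  # ['he', 'goes', 'to', 'school']
```

Keeping lemma and POS as separate dimensions lets the corrector hold the lemma fixed while changing only the dimension blamed by the error cause, instead of searching over arbitrary word substitutions.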