| Graduate student: | 蘇嘉穎 Sou, Ka-Weng |
|---|---|
| Thesis title: | 利用語法結構與語意相似度建立改寫句子抄襲偵測方法 Developing a Plagiarism Detecting Method of Paraphrasing Sentences by Syntactical Structure and Semantic Similarity |
| Advisor: | 王惠嘉 Wang, Hei-Chia |
| Degree: | Master |
| Department: | College of Management - Institute of Information Management |
| Year of publication: | 2012 |
| Academic year of graduation: | 100 |
| Language: | Chinese |
| Number of pages: | 67 |
| Keywords (Chinese): | 改寫, 抄襲偵測, 語法結構分析, 語意相似度 |
| Keywords (English): | paraphrase, plagiarism detection, syntactical structure, semantic similarity |
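As higher education in Taiwan globalizes, the development and contribution of academic research have become one of the important indicators for evaluating national economic development. To raise the competitiveness of Taiwan's universities in global higher education, the Ministry of Education has in recent years placed greater emphasis on the degree of internationalization of colleges and universities, and English has become their main medium for publishing academic research results.
However, for scholars who use English as a foreign language, writing up research entirely in English is no easy task. Their training in English writing is often insufficient for writing English-language research on their own. With the progress of the Internet and powerful search engines, information has also become ever easier to obtain, which directly or indirectly leads scholars to plagiarize the ideas of others, intentionally or unintentionally.
Many scholars have an incorrect understanding of plagiarism: they believe that slightly modifying content does not count as plagiarism, and they do not know that they have already committed it. Yet the plagiarism detection software currently on the market mostly compares submissions against thesis databases or web resources. It can detect only simple types of plagiarism and matches on single words alone, so it can neither detect paraphrased sentences nor teach scholars anti-plagiarism practices such as how to paraphrase a sentence appropriately. We therefore need to proactively help or teach scholars to check whether the documents they write contain plagiarism.
In view of this, this study proposes a detection method that can guide sentence paraphrasing so that writers do not commit plagiarism when they paraphrase. The method uses syntactic structure analysis to extract all the phrases in a sentence and takes the phrase as the unit of comparison, improving on the accuracy of matching on single words alone. It also takes into account synonym substitution and changes in word order, computing the semantic similarity and the order similarity of sentences. By comparing sentences in the source document with those in the user's revised document, it finds paraphrased documents that may involve plagiarism and analyzes in detail the paraphrasing techniques used, in the hope that the analysis results can be used to advise scholars on how to avoid committing plagiarism.
The experimental results show that the proposed method is feasible. The experimental data confirm that phrase matching improves on the accuracy of traditional single-word matching, and that using semantic similarity also has a positive effect on the results. In addition, combining PATH and WUP to compute semantic similarity performs better than using PATH or WUP alone.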
With the globalization of higher education in Taiwan, the development and contribution of academic research have become one of the indicators of national economic development. To enhance the competitiveness of Taiwan's universities, the Ministry of Education has paid increasing attention to their degree of internationalization in recent years. As a result, English has become the main medium in which universities publish academic research.
However, writing in English is not an easy task for Taiwanese researchers, who are learners of English as a foreign language. Their training in English writing is often insufficient for composing research papers in English on their own. Meanwhile, advances in the Internet and powerful search engines have made information easier to obtain, which leads researchers to copy the ideas of others, intentionally or unintentionally.
In fact, many researchers misunderstand what counts as plagiarism. They believe that a slightly modified text is no longer plagiarized, so they do not realize that they have committed plagiarism. Yet the main function of currently available plagiarism detection software is to check possibly plagiarized papers against an essay database or Internet resources. These systems detect only simple kinds of plagiarism because they match on single terms alone. As a result, they cannot detect paraphrasing, nor can they guide researchers in how to paraphrase properly. Hence, there is a need for a tool that helps researchers, and teaches them, to check whether their documents are plagiarized.
This study proposes a new plagiarism detection method that guides users in paraphrasing original sentences without committing plagiarism. The method uses syntactic structure analysis to extract all the phrases in a sentence and compares phrases rather than single terms, which improves accuracy. It also considers the substitution of synonyms and changes in term order, computing both the semantic similarity and the order similarity of sentences. By comparing sentences in the original document with those in the user-modified document, it identifies paraphrased documents that are suspected of plagiarism. Finally, we hope the results can be used to suggest to users how to avoid plagiarism.
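To make the notion of order similarity concrete, the following is a minimal Python sketch. It assumes a formulation in the style of Li, McLean, Bandar, O'Shea and Crockett (2006), where each sentence is mapped to a vector of word positions over the joint word set and the similarity is 1 - ||r1 - r2|| / ||r1 + r2||. The abstract does not state which formula the thesis actually uses, and the function names, whitespace tokenization, and exact-match lookup here are illustrative simplifications.

```python
import math


def order_vector(joint_words, tokens):
    """Return, for each word in joint_words, its 1-based position in tokens (0 if absent)."""
    first_position = {}
    for index, token in enumerate(tokens, start=1):
        first_position.setdefault(token, index)  # keep the first occurrence only
    return [first_position.get(word, 0) for word in joint_words]


def order_similarity(sentence_a, sentence_b):
    """Word order similarity: 1 - ||r1 - r2|| / ||r1 + r2|| (assumed formulation)."""
    tokens_a = sentence_a.lower().split()
    tokens_b = sentence_b.lower().split()
    # Joint word set, de-duplicated while preserving first-seen order.
    joint_words = list(dict.fromkeys(tokens_a + tokens_b))
    r1 = order_vector(joint_words, tokens_a)
    r2 = order_vector(joint_words, tokens_b)
    diff_norm = math.sqrt(sum((a - b) ** 2 for a, b in zip(r1, r2)))
    sum_norm = math.sqrt(sum((a + b) ** 2 for a, b in zip(r1, r2)))
    if sum_norm == 0:  # both sentences are empty
        return 1.0
    return 1.0 - diff_norm / sum_norm


if __name__ == "__main__":
    print(order_similarity("the dog chased the cat", "the dog chased the cat"))
    print(order_similarity("the dog chased the cat", "the cat chased the dog"))
```

Two sentences with the same words in the same order score 1.0, while swapping "dog" and "cat" lowers the score even though the vocabulary is unchanged. This is the kind of reordering that matching on terms alone would miss.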
The evaluation shows that the proposed method improves on the accuracy of traditional single-term matching. Using semantic similarity also has a positive effect on the results. Moreover, combining PATH and WUP to calculate semantic similarity performs better than using PATH or WUP alone.
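As an illustration of the two measures named above, the sketch below computes PATH (the reciprocal of the shortest path length, counted in nodes) and WUP (the Wu and Palmer measure, 2 * depth(LCS) / (depth(a) + depth(b))) over a small hand-made taxonomy. The taxonomy, the function names, and the weighted average used to combine the two scores are assumptions for illustration only: the abstract does not say how the thesis combines PATH and WUP, and the thesis works with WordNet rather than a toy hierarchy.

```python
# Hypothetical toy is-a taxonomy: each concept maps to its parent (the root maps to None).
PARENT = {
    "entity": None,
    "animal": "entity",
    "artifact": "entity",
    "mammal": "animal",
    "bird": "animal",
    "dog": "mammal",
    "cat": "mammal",
    "car": "artifact",
}


def ancestors(concept):
    """Return the chain from the concept up to the root, concept first."""
    chain = []
    while concept is not None:
        chain.append(concept)
        concept = PARENT[concept]
    return chain


def depth(concept):
    """Depth counted in nodes, so the root has depth 1."""
    return len(ancestors(concept))


def lowest_common_subsumer(a, b):
    """Deepest concept that is an ancestor of both a and b (or a/b itself)."""
    ancestors_of_b = set(ancestors(b))
    for candidate in ancestors(a):
        if candidate in ancestors_of_b:
            return candidate
    raise ValueError("concepts do not share a root")


def path_similarity(a, b):
    """PATH: 1 / (number of nodes on the shortest path between a and b)."""
    lcs = lowest_common_subsumer(a, b)
    edges = (depth(a) - depth(lcs)) + (depth(b) - depth(lcs))
    return 1.0 / (edges + 1)


def wup_similarity(a, b):
    """WUP: 2 * depth(LCS) / (depth(a) + depth(b))."""
    lcs = lowest_common_subsumer(a, b)
    return 2.0 * depth(lcs) / (depth(a) + depth(b))


def combined_similarity(a, b, weight=0.5):
    """Assumed combination: a weighted average of PATH and WUP."""
    return weight * path_similarity(a, b) + (1.0 - weight) * wup_similarity(a, b)


if __name__ == "__main__":
    for pair in [("dog", "cat"), ("dog", "car")]:
        print(pair, path_similarity(*pair), wup_similarity(*pair), combined_similarity(*pair))
```

In this toy hierarchy, "dog" and "cat" share the deep ancestor "mammal" and score higher on both measures than "dog" and "car", which meet only at the root. PATH reacts to distance alone, whereas WUP also rewards a deep common ancestor, which is one plausible reason a combination could help.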
On campus: open access from 2022-12-31