| Student: | Wei, Chi-An (魏奇安) |
|---|---|
| Thesis title: | Identifying Argument Components in Online Debates through Directed Graph and Argument-oriented Summarization |
| Advisor: | Kao, Hung-Yu (高宏宇) |
| Degree: | Master |
| Department: | College of Electrical Engineering and Computer Science, Department of Computer Science and Information Engineering |
| Year of publication: | 2018 |
| Academic year of graduation: | 106 (ROC calendar) |
| Language: | English |
| Pages: | 63 |
| Keywords (Chinese): | argument mining, argument identification, automatic summarization, short-sentence semantic expansion, directed graph |
| Keywords (English): | Argument Mining, Argument Components, Summarization, Short Text Expansion, Directed Graph |
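Identifying arguments plays an important role in argument mining. Arguments identified in an arbitrary article can not only serve as a feature source for stance classification, improving its accuracy, but also state the reasons why an article supports or opposes a target. Many methods have attempted this task; the most widely discussed are text classification and automatic summarization. However, casting argument identification as text classification depends heavily on how features are selected and used, and because each sentence is extracted from the article and treated independently, the relations across the article as a whole are lost. Automatic summarization, by contrast, processes all sentences of an article together and estimates which sentence best represents it, but existing approaches mostly rely on bag-of-words models and general-purpose summarization methods: their feature representations are overly sparse, and they do not take the characteristics of debate texts into account.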
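As a point of reference for the generic, bag-of-words summarization approach, the following is a minimal Python sketch of TextRank-style sentence ranking: each sentence becomes a sparse count vector, sentences are linked by cosine similarity in an undirected weighted graph, and PageRank power iteration scores them. It is not the thesis's method; the regex tokenizer, the function names, and the damping factor of 0.85 are illustrative assumptions.

```python
import math
import re
from collections import Counter


def bag_of_words(sentence):
    """Lower-cased token counts: a plain, sparse bag-of-words vector."""
    return Counter(re.findall(r"[a-z']+", sentence.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(count * b[token] for token, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def textrank(sentences, damping=0.85, iterations=50):
    """Score sentences with weighted PageRank on an undirected similarity graph."""
    vectors = [bag_of_words(s) for s in sentences]
    n = len(sentences)
    # Symmetric edge weights, so every link counts in both directions.
    weights = [
        [cosine(vectors[i], vectors[j]) if i != j else 0.0 for j in range(n)]
        for i in range(n)
    ]
    out_sum = [sum(row) for row in weights]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        # Each sentence receives score from its neighbours in proportion to edge weight.
        scores = [
            (1 - damping) / n
            + damping * sum(weights[j][i] / out_sum[j] * scores[j]
                            for j in range(n) if out_sum[j] > 0)
            for i in range(n)
        ]
    return scores


article = [
    "Cats are independent pets.",
    "Dogs are loyal pets.",
    "Pets make people happy.",
    "The weather is nice today.",
]
scores = textrank(article)
ranked = sorted(range(len(article)), key=lambda i: scores[i], reverse=True)
print(ranked)  # → [0, 1, 2, 3]
```

The off-topic weather sentence shares no words with the others, so it keeps only the teleport share of the score. The same mechanism also shows the sparsity problem: two sentences that express one idea with different words get a similarity of zero.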
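This study examines in depth, and adjusts, an automatic summarization method that preserves the article as a whole, so that it summarizes articles guided by argument strength. Moreover, because sentences in online debate posts are usually short and contain little background knowledge, this study introduces external documents to enrich and expand the semantics of these short sentences and constructs a directed graph to represent the articles. With these enhancements and adjustments, a summarization method that was not originally argument-oriented ranks argumentative sentences better, and, thanks to the embedded external documents, short sentences gain more semantics and therefore more information for ranking. Overall, the proposed method improves accuracy by 8% on the task of identifying argument sentences in debate articles.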
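The idea of expanding short sentences with external documents can be illustrated with a deliberately simple lexical variant. The Python sketch below assumes a plain pseudo-relevance-feedback scheme: it retrieves the `k` external documents whose bag-of-words vectors are most similar to a short sentence and adds their length-normalized term counts, scaled by a weight `alpha`, to the sentence vector. The abstract says external documents are embedded but does not specify how, so `expand_sentence`, `k`, and `alpha` are hypothetical.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lower-cased token counts for a sentence or document."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse vectors (0.0 if either is empty)."""
    dot = sum(value * b[token] for token, value in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def expand_sentence(sentence, external_docs, k=2, alpha=0.5):
    """Hypothetical lexical expansion of a short sentence with external documents."""
    base = bag_of_words(sentence)
    docs = [bag_of_words(d) for d in external_docs]
    # Rank external documents by similarity to the short sentence.
    scored = sorted(((cosine(base, d), d) for d in docs), key=lambda pair: pair[0], reverse=True)
    expanded = Counter({token: float(count) for token, count in base.items()})
    for similarity, doc in scored[:k]:
        if similarity == 0.0:
            break  # remaining documents share no vocabulary with the sentence
        length = sum(doc.values())
        for token, count in doc.items():
            # Length-normalised and scaled by alpha so the added context
            # does not swamp the sentence's own words.
            expanded[token] += alpha * count / length
    return expanded


external = [
    "Guns and firearms are weapons that can kill.",
    "Gardening is a relaxing hobby.",
]
short_a = "Guns kill."
short_b = "Firearms are dangerous."
print(cosine(bag_of_words(short_a), bag_of_words(short_b)))  # → 0.0
print(cosine(expand_sentence(short_a, external), expand_sentence(short_b, external)) > 0)  # → True
```

The two short sentences share no words, so their raw similarity is zero; after both are expanded with the same related external document, they overlap and can be compared. An embedding-based expansion would follow the same pattern with dense vectors in place of term counts.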
Identifying argument components has become an important research issue in argument mining. Once identified, argument components can not only be used for stance classification but also provide the reasons why an article supports or opposes a specific target.
Previous research has mainly used text classification and summarization techniques for this task. However, casting the task as a classification problem not only relies heavily on choosing and using bag-of-words features but also loses article-level information, because each sentence is extracted from its article and treated as an individual training instance. On the other hand, although summarization techniques process the entire article and try to determine which sentence best represents its core concept, their application to identifying argument components still relies heavily on bag-of-words feature representations and lacks argument-oriented features that capture the characteristics of argument components.
In this study, we examine the core of the summarization method and adjust it so that it summarizes articles and identifies argument components according to argument strength. Because sentences in online debates tend to be short and carry little meaning on their own, we also propose a directed-graph construction approach and embed external documents to enrich and expand the semantics of these short texts. Experiments show that the proposed method outperforms methods without argument-oriented adjustments by 8%.
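One way to make graph-based ranking argument-oriented can be sketched as a personalized PageRank on a directed sentence graph. The abstract does not say how argument strength enters the ranking or how edge directions are chosen, so both rules below are assumptions: an edge j → i is kept only when sentence i is at least as argumentative as sentence j, and the teleport distribution is proportional to a per-sentence argument-strength score supplied by the caller. `directed_argument_rank` and its inputs are hypothetical; the similarity matrix could come from bag-of-words or expanded sentence vectors.

```python
def directed_argument_rank(similarity, strength, damping=0.85, iterations=100):
    """Personalised PageRank on a directed sentence graph (illustrative assumptions).

    similarity: n x n symmetric sentence-similarity matrix.
    strength:   non-negative argument-strength score per sentence (hypothetical input).
    """
    n = len(strength)
    # Assumed direction rule: keep edge j -> i only if sentence i is at least as
    # argumentative as sentence j, so endorsement flows toward stronger arguments.
    edges = [
        [similarity[j][i] if i != j and strength[i] >= strength[j] else 0.0 for i in range(n)]
        for j in range(n)
    ]
    out_sum = [sum(row) for row in edges]
    total_strength = sum(strength)
    # Teleport distribution proportional to argument strength (uniform if all zero).
    teleport = [s / total_strength if total_strength else 1.0 / n for s in strength]
    scores = list(teleport)
    for _ in range(iterations):
        # Mass held by sentences without outgoing edges is redistributed via the
        # teleport vector, so the scores stay a probability distribution.
        dangling = sum(scores[j] for j in range(n) if out_sum[j] == 0)
        scores = [
            (1 - damping) * teleport[i]
            + damping * (
                sum(edges[j][i] / out_sum[j] * scores[j] for j in range(n) if out_sum[j] > 0)
                + dangling * teleport[i]
            )
            for i in range(n)
        ]
    return scores


similarity = [
    [0.0, 0.5, 0.5],
    [0.5, 0.0, 0.5],
    [0.5, 0.5, 0.0],
]
strength = [0.2, 1.0, 0.4]  # illustrative scores, not from the thesis
scores = directed_argument_rank(similarity, strength)
ranked = sorted(range(len(strength)), key=lambda i: scores[i], reverse=True)
print(ranked)  # → [1, 2, 0]
```

With equal strengths the graph becomes symmetric and the teleport vector uniform, so this reduces to an undirected ranking; only when strengths differ do direction and personalization push the most argumentative sentence to the top. Redistributing dangling mass through the teleport vector is a standard PageRank convention, not something the abstract specifies.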