Author: Chang, Wei-Cheng (張緯丞)
Thesis title: Event-driven News Clustering in Pretrained Language Model (預訓練語言模型之事件導向新聞分群)
Advisor: Lu, Wen-Hsiang (盧文祥)
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Computer Science and Information Engineering
Year of publication: 2021
Academic year of graduation: 109 (ROC calendar)
Language: English
Number of pages: 31
Keywords (Chinese): 新聞事件, 分群演算法, 預訓練模型
Keywords (English): News Event, Clustering Algorithm, Pretrained Language Model
  • News classification is a common task. Major news websites currently use their own predefined categories, so the task can be treated as a supervised one. When reading the news, however, readers often want to know what a particular person has done, or to find the news related to a particular event. No existing tool or model does this automatically; it requires substantial manual effort. In this thesis, with the help of open-source tools, we compile a set of rules for event extraction, extract events from a news corpus, cluster these events, and generate topics. Analysis of the clustering results shows that this approach can effectively help users read the news.
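    The clustering step mentioned in the abstract can be illustrated with a minimal sketch. This is not the thesis implementation, which this record does not describe in detail. The event strings, the 2-D vectors standing in for pretrained language model embeddings, and the `kmeans` helper are all hypothetical; k-means is used only because the table of contents lists an "Embedding K-Means" section.

```python
import math


def kmeans(vectors, k, iterations=20):
    """Plain Lloyd's k-means over lists of floats; returns one cluster label per vector."""
    # Deterministic initialisation (an assumption made for reproducibility):
    # start from the first vector, then repeatedly add the vector that is
    # farthest from every centroid chosen so far.
    centroids = [list(vectors[0])]
    while len(centroids) < k:
        farthest = max(vectors, key=lambda v: min(math.dist(v, c) for c in centroids))
        centroids.append(list(farthest))

    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: attach each vector to its nearest centroid.
        labels = [min(range(k), key=lambda j: math.dist(v, centroids[j])) for v in vectors]
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical extracted events; in practice these would come from an event extractor.
events = [
    "president signs trade bill",
    "president vetoes tax bill",
    "team wins championship game",
    "team loses final game",
]

# Toy 2-D vectors standing in for language model embeddings of the events above.
toy_embeddings = [
    [0.9, 0.1],
    [0.8, 0.2],
    [0.1, 0.9],
    [0.2, 0.8],
]

labels = kmeans(toy_embeddings, k=2)

# Group the event strings by their assigned cluster.
clusters = {}
for event, label in zip(events, labels):
    clusters.setdefault(label, []).append(event)

for label, members in sorted(clusters.items()):
    print(label, members)
```

    With real embeddings the number of clusters is not known in advance, so it would have to be chosen separately; the toy data here simply fixes k at 2.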

    1. INTRODUCTION
        1.1 Background
        1.2 Motivation
        1.3 Goal
        1.4 Method
        1.5 Contribution
    2. RELATED WORK
        2.1 Event Extraction
        2.2 Pretrained Language Model
        2.3 Clustering Methods
    3. METHOD
        3.1 Overview
        3.2 Event Extractor
        3.3 Language Model Embeddings
        3.4 Embedding K-Means
        3.5 Topic Generation
    4. EXPERIMENT
        4.1 Data
        4.2 Experiment on Event Extraction
            4.2.1 Description
            4.2.2 Evaluation
        4.3 Experiment on Event Clustering
            4.3.1 Description
            4.3.2 Case Study
            4.3.3 Summary for the Event Clustering Results
        4.4 K Number Decision
    5. CONCLUSION
    6. REFERENCE

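    On campus: available from 2026-08-27
    Off campus: available from 2026-08-27
    The electronic thesis has not yet been authorized for public release; for the print copy, please consult the library catalog.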