
Graduate student: 王鈺云 (Wang, Yu-Yun)
Thesis title: 通過蘊涵關係強化內容選擇之多文本摘要
Enhance Content Selection for Multi-Document Summarization with Entailment Relation
Advisor: 高宏宇 (Kao, Hung-Yu)
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Computer Science and Information Engineering
Year of publication: 2020
Academic year of graduation: 108 (ROC calendar)
Language: English
Number of pages: 45
Keywords: entailment relation, multi-document summarization, abstractive summarization
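  • With the arrival of the big-data era, textual data has grown rapidly and automatic text summarization is in high demand. Automatic text summarization is an important research topic in natural language processing. Its goal is to condense a long text into a short one while preserving the important information, capturing the key points of the original concisely and accurately. Writing summaries by hand is costly and time-consuming. A machine learning model, after extensive training, can read the source text, identify the important information, and organize it into a summary, which greatly reduces reading time. By the number of input documents, the task divides into single-document and multi-document summarization. By output type, it divides into extractive and abstractive summarization. The main difference is that every word in an extractive summary comes from the source text, whereas an abstractive summary may contain novel words that do not appear in the source.
    This thesis examines the problems and difficulties of multi-document summarization of news and proposes a solution. Multi-document summarization faces two main problems: information overlap and information difference among articles. Most existing models handle multi-document summarization from a single-document perspective, which ignores the relations among the news articles. Our proposed model consists of two components, a content selector and a summary generator. The content selector uses the entailment relation between sentences from different articles to pick sentences that are informative and representative. An algorithm then extracts the content related to the article topic, and the summary generator produces the final summary, so that the summary is free of redundant information and retains the key information. Experimental results show that the proposed model improves the evaluation results.
    The main contribution of our method is the use of the entailment relation to obtain the key information in multiple documents. Adding semantic understanding identifies important information more clearly and improves the accuracy of multi-document summarization.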

    Automatic text summarization is a common task in natural language processing. Its goal is to generate a shorter version of the original text that retains the relevant information. This thesis studies multi-document summarization (MDS) of news articles. MDS faces two significant issues: information overlap and information difference among multiple articles. Existing models mostly approach MDS from the perspective of single-document summarization (SDS) and do not consider the relations between sentences in different news articles. Our proposed method consists of two models. The sentence selector model selects representative sentences based on the entailment relation between sentences in different articles, and an algorithm then extracts the content related to the event the articles describe. The summary generator model generates the final summary, ensuring that it contains no redundancy and retains the vital information. Experimental results show that the proposed model effectively improves the evaluation results.
    The main contribution of our approach is the use of the entailment relation to obtain key content from multiple articles. Adding semantic comprehension helps identify salient information more clearly and improves the accuracy of MDS.
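
    The selection step described above, choosing sentences that are representative while avoiding redundancy, can be illustrated with plain Maximal Marginal Relevance (MMR). The sketch below is not the thesis's Entail-MMR: it assumes bag-of-words cosine similarity as a stand-in for the entailment-based scores, and the names `cosine`, `mmr_select`, `k`, and `lam` are hypothetical.

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def mmr_select(sentences, k=2, lam=0.5):
    """Greedy MMR: pick up to k sentences that are relevant but not redundant.

    Assumption: relevance is similarity to the bag of words of all input
    sentences, and redundancy is the highest similarity to any sentence
    already selected. The thesis scores sentences with an entailment
    classifier instead; word overlap is used here only for illustration.
    """
    vecs = [Counter(s.lower().split()) for s in sentences]
    topic = sum(vecs, Counter())  # pooled word counts of the whole input
    selected = []
    candidates = list(range(len(sentences)))

    def score(i):
        relevance = cosine(vecs[i], topic)
        redundancy = max((cosine(vecs[i], vecs[j]) for j in selected), default=0.0)
        return lam * relevance - (1 - lam) * redundancy

    while candidates and len(selected) < k:
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return [sentences[i] for i in selected]


if __name__ == "__main__":
    articles = [
        "the storm hit the coast on monday",
        "the storm hit the coast on monday",
        "thousands of homes lost power",
    ]
    for sentence in mmr_select(articles, k=2, lam=0.5):
        print(sentence)
```

    With `lam=0.5` the duplicated sentence is penalized for redundancy, so the second pick is the sentence about the power outage. A larger `lam` weights relevance more heavily and tolerates more overlap between selected sentences.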

    Chinese Abstract
    Abstract
    Acknowledgements
    Contents
    Table Listing
    Figure Listing
    1. Introduction
        1.1 Background
        1.2 Motivation
        1.3 Our approach
        1.4 Thesis structure
    2. Related work
        2.1 Sequence-to-sequence model
            2.1.1 Encoder-decoder
            2.1.2 Attention mechanism
        2.2 Bidirectional Encoder Representations from Transformers
            2.2.1 Transformer encoder
            2.2.2 Pre-training
            2.2.3 Fine-tuning
        2.3 SDS to MDS model
        2.4 Extractive MDS method
            2.4.1 Graph-based system
            2.4.2 Maximum Margin Relevance Algorithm
        2.5 Abstractive MDS method
    3. Method
        3.1 Sentence Selector Model
            3.1.1 Singletons-Pairs Instance
            3.1.2 Entailment Instance
            3.1.3 Bert Fine-Tune For Classifier
            3.1.4 Entail-MMR
        3.2 Summary Generation Model
    4. Experiments
        4.1 Dataset
        4.2 Evaluation Metric
        4.3 Model parameters
            4.3.1 Sentence selector model
            4.3.2 Summary generation model
        4.4 Performance Comparison
        4.5 Ablation Study
    5. Analysis
        5.1 Example Analysis
        5.2 Instance Selection Analysis
    6. Conclusion
        6.1 Future Work
    7. Reference


    Full text available on campus: public from 2021-09-01
    Full text available off campus: public from 2021-09-01