| Student: | 顏宏峻 (Yan, Hong-Jun) |
|---|---|
| Thesis title: | 基於顯著性提取器和一致網路之非監督式文本摘要模型 (SECNet: Unsupervised Text Summarization Using Salience Extractor with a Consistent Network) |
| Advisor: | 黃仁暐 (Huang, Jen-Wei) |
| Degree: | Master |
| Department: | Institute of Computer & Communication Engineering, College of Electrical Engineering and Computer Science |
| Year of publication: | 2019 |
| Graduation academic year: | 107 (2018-2019) |
| Thesis language: | English |
| Pages: | 46 |
| Keywords (Chinese): | 基於提取的文檔摘要, 序列到序列網絡, 深度平均網絡, 注意力機制 |
| Keywords (English): | Extraction-based Document Summarization, Sequence-to-Sequence Network, Deep Averaging Network, Attention Mechanism |
Automatic extractive document summarization is an important and difficult task in natural language processing, particularly in the unsupervised setting. Previous unsupervised summarization models typically evaluate the salience of sentences with a graph-based ranking algorithm driven by inter-sentence similarity; however, similarity captures only the surface relationships between sentences. Inspired by recent work on unsupervised image generation and attention-based image captioning, we propose an attention-based unsupervised summarization model, SECNet, to address this problem. The model comprises two Seq2Seq networks: a salience extractor and a consistent network. The salience extractor encodes the information in a document and uses its decoder to generate a latent summary; the attention scores produced while generating the latent summary are then used to compute the importance of each sentence in the document. The consistent network ensures that the latent summary retains the information of the original document. We evaluated the proposed model on the English summarization benchmarks DUC 2001 and DUC 2002, as well as the large-scale Chinese summarization dataset LCSTS. We also created a Chinese financial news dataset for which a group of bank-auditing experts labeled reference summaries.
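The abstract describes the salience extractor only at a high level. The following is a minimal sketch, in PyTorch-style Python, of how attention weights from a Seq2Seq decoder could be accumulated into per-sentence salience scores. The module names, dimensions, number of decoding steps, and the dot-product attention used here are illustrative assumptions, not details taken from the thesis; the consistent network's reconstruction objective, which would keep the latent summary faithful to the document, is omitted.

```python
# Sketch: score sentences by the attention they receive while a Seq2Seq
# decoder generates a latent summary. All hyperparameters are placeholders.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SalienceExtractor(nn.Module):
    """Encoder-decoder whose attention weights double as sentence salience."""

    def __init__(self, sent_dim=128, hidden_dim=256, summary_len=3):
        super().__init__()
        self.encoder = nn.GRU(sent_dim, hidden_dim, batch_first=True)
        self.decoder_cell = nn.GRUCell(hidden_dim, hidden_dim)
        self.summary_len = summary_len

    def forward(self, sent_embs):
        # sent_embs: (batch, num_sentences, sent_dim), one embedding per sentence.
        enc_out, h = self.encoder(sent_embs)              # enc_out: (B, N, H)
        state = h.squeeze(0)                              # initial decoder state (B, H)
        attn_total = torch.zeros(enc_out.size(0), enc_out.size(1))
        latent_summary = []
        for _ in range(self.summary_len):
            # Dot-product attention between the decoder state and encoder outputs.
            scores = torch.bmm(enc_out, state.unsqueeze(2)).squeeze(2)   # (B, N)
            attn = F.softmax(scores, dim=1)
            attn_total = attn_total + attn                # accumulate attention mass
            context = torch.bmm(attn.unsqueeze(1), enc_out).squeeze(1)   # (B, H)
            state = self.decoder_cell(context, state)     # one latent-summary step
            latent_summary.append(state)
        # A sentence's salience is the average attention it received across steps.
        salience = attn_total / self.summary_len
        return torch.stack(latent_summary, dim=1), salience


if __name__ == "__main__":
    model = SalienceExtractor()
    doc = torch.randn(1, 10, 128)                 # toy document: 10 sentence embeddings
    latent, salience = model(doc)
    print(salience[0].topk(3).indices.tolist())   # indices of the 3 most salient sentences
```

In such a setup, an extractive summary would simply be the top-ranked sentences by salience; under the thesis's framing, the consistent network would additionally be trained so that the latent summary can reconstruct the document's information.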