| Author: | 王夢昀 (Wang, Meng-Yun) |
|---|---|
| Thesis title: | 基於聯合訓練之主題感知生成式文本摘要 (Topic-Aware Abstractive Summarization with Joint Training Topic Model) |
| Advisor: | 高宏宇 (Kao, Hung-Yu) |
| Degree: | Master |
| Department: | Department of Computer Science and Information Engineering, College of Electrical Engineering and Computer Science (電機資訊學院資訊工程學系) |
| Year of publication: | 2019 |
| Academic year of graduation: | 107 |
| Language: | English |
| Number of pages: | 42 |
| Keywords (Chinese): | 序列到序列、文本摘要、主題模型、預訓練語言模型 |
| Keywords (English): | Seq2Seq, Abstractive Summarization, Topic Model, BERT |
Automatic text summarization is one of the classic tasks in natural language processing: given a longer text, generate a shorter version that preserves the original meaning, balancing text length against information content. Many deep-learning approaches to this task have emerged in the past two years. Automatic summarization helps users browse and extract useful information quickly, and therefore has broad application prospects in today's era of exploding online information. Its techniques also involve research on sentence representation, text compression, language modeling, and related topics, so progress in automatic summarization can benefit other areas as well.

This thesis proposes two methods to improve existing models. (1) A topic model is used to guide the attention mechanism. Topic models are a useful tool for shallow semantic modeling; following the principles of traditional language models, this thesis studies jointly training a neural topic model with the supervised abstractive summarization task. Source words that are good candidates for the summary tend to be closely related to the document's topics, so the attention distribution at each decoding step is computed from the document's topic embedding together with the decoder state, yielding summaries that are more relevant to the document's topics. (2) A pre-trained BERT model is used as the encoder to extract features from the source text. This two-stage transfer-learning approach has recently been widely adopted in NLP: because BERT learns general linguistic knowledge from massive amounts of unlabeled text, downstream tasks can benefit from the prior knowledge captured by the pre-trained language model. Experiments show that using BERT as the encoder effectively improves the quality of the text representation compared with an LSTM encoder.
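To make the topic-guided attention concrete, the following is a minimal PyTorch sketch, assuming an additive (Bahdanau-style) attention whose score additionally conditions on a document-level topic embedding from a jointly trained neural topic model. The class and parameter names (`TopicAwareAttention`, `w_topic`, the dimensions) are illustrative assumptions rather than the thesis's actual code, and the encoder states may come from either the BERT encoder or an LSTM encoder described above.

```python
# Minimal sketch of topic-guided additive attention (illustrative, not the thesis's code).
import torch
import torch.nn as nn
import torch.nn.functional as F


class TopicAwareAttention(nn.Module):
    """Additive attention whose scores also condition on a document-level
    topic embedding, so source words related to the document's topics
    receive higher attention weights during decoding."""

    def __init__(self, enc_dim, dec_dim, topic_dim, attn_dim):
        super().__init__()
        self.w_enc = nn.Linear(enc_dim, attn_dim, bias=False)      # encoder states
        self.w_dec = nn.Linear(dec_dim, attn_dim, bias=False)      # decoder state
        self.w_topic = nn.Linear(topic_dim, attn_dim, bias=False)  # topic embedding
        self.v = nn.Linear(attn_dim, 1, bias=False)

    def forward(self, enc_states, dec_state, topic_emb, src_mask):
        # enc_states: (B, T, enc_dim)  encoder outputs (e.g. BERT last hidden states or BiLSTM outputs)
        # dec_state:  (B, dec_dim)     decoder hidden state at the current step
        # topic_emb:  (B, topic_dim)   document topic vector from the neural topic model
        # src_mask:   (B, T)           1 for real tokens, 0 for padding
        scores = self.v(torch.tanh(
            self.w_enc(enc_states)
            + self.w_dec(dec_state).unsqueeze(1)
            + self.w_topic(topic_emb).unsqueeze(1)
        )).squeeze(-1)                                       # (B, T)
        scores = scores.masked_fill(src_mask == 0, float("-inf"))
        attn = F.softmax(scores, dim=-1)                     # attention distribution
        context = torch.bmm(attn.unsqueeze(1), enc_states).squeeze(1)  # (B, enc_dim)
        return context, attn


# Toy usage: random tensors stand in for real encoder, decoder, and topic-model outputs.
B, T, enc_dim, dec_dim, topic_dim = 2, 7, 768, 512, 64
attn_layer = TopicAwareAttention(enc_dim, dec_dim, topic_dim, attn_dim=256)
context, attn = attn_layer(
    torch.randn(B, T, enc_dim),          # e.g. BERT last_hidden_state
    torch.randn(B, dec_dim),             # LSTM decoder state at the current step
    torch.randn(B, topic_dim),           # per-document topic embedding
    torch.ones(B, T, dtype=torch.long),  # no padding in this toy batch
)
print(context.shape, attn.shape)  # torch.Size([2, 768]) torch.Size([2, 7])
```

In this formulation the topic embedding shifts the attention scores toward topic-relevant source positions; the resulting context vector is then consumed by the decoder in the usual Seq2Seq fashion.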