
Author: 王夢昀 (Wang, Meng-Yun)
Thesis Title: Topic-Aware Abstractive Summarization with Joint Training Topic Model (基於聯合訓練之主題感知生成式文本摘要)
Advisor: 高宏宇 (Kao, Hung-Yu)
Degree: Master
Department: Department of Computer Science and Information Engineering, College of Electrical Engineering and Computer Science
Publication Year: 2019
Graduation Academic Year: 107
Language: English
Number of Pages: 42
Chinese Keywords: Sequence-to-sequence, Text summarization, Topic model, Pre-trained language model
Foreign Keywords: Seq2Seq, Abstractive Summarization, Topic model, BERT
Chinese Abstract (translated):
    Automatic text summarization is one of the classic research directions in natural language processing. Its main task is to generate a shorter version of a longer text while preserving the original meaning, striking a balance between text length and information content. Many deep learning approaches have appeared in the past two years. Automatic summarization helps users quickly browse and extract useful information, and it has broad application prospects in today's era of exploding Internet information. The technology also involves research on sentence representation, text compression, language models, and other related techniques, so work on automatic summarization can in turn advance these areas.
    This thesis proposes two methods to improve existing models. (1) A method in which a topic model guides the attention mechanism. Topic models capture the shallow semantics of a text; following the principles of traditional language modeling, this thesis studies jointly training a neural topic model with the abstractive summarization task. Words in the source text that are good candidates for the summary tend to be closely related to the document's topic, so the topic embedding of each document is combined with the decoder state to compute the attention distribution at each step, producing summaries that are more relevant to the document's topic. (2) A method that uses a pre-trained BERT model as the encoder to extract features of the source text. This two-stage transfer-learning approach has been widely adopted in recent years: because BERT learns general linguistic knowledge from massive amounts of unlabeled text, downstream tasks benefit from the prior information in the pre-trained language model. Experiments show that using BERT as the encoder effectively improves the quality of the text representation compared with an LSTM encoder.

Abstract:
    Automatic text summarization is one of the classic tasks in natural language processing. The main goal is to generate a shorter version of the original text while maintaining a balance between text length and information content. Automatic summarization helps people browse and extract useful information quickly, and it has broad application prospects amid the explosion of information on the Internet. The technology also involves research on sentence representation, text compression, language models, and other related techniques.
    In this thesis, two methods are proposed to improve existing models. (1) A method in which a topic model guides the attention mechanism. Topic models are a useful tool for shallow semantic modeling. Building on previous work, this thesis studies the role of the topic model in the supervised automatic summarization task. Words in the source text that are likely to be selected for the summary tend to be closely related to the text's topic, so we compute the attention distribution from the topic embedding of each text together with the decoder state, in order to generate a summary that is more relevant to the topic of the text. (2) A method that extracts features with a pre-trained BERT model used as the encoder. This two-stage transfer-learning approach has recently been widely used in NLP tasks: because BERT has learned general linguistic knowledge from massive amounts of unlabeled text, downstream tasks can benefit from the prior information of the pre-trained language model. Experiments show that using BERT as the encoder effectively improves the quality of the text representation compared with a traditional LSTM encoder.
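    To make the first method concrete, the sketch below shows one way a document-level topic embedding could be injected into an additive attention score alongside the decoder state, so that the attention distribution is biased toward topic-relevant source words. This is a minimal PyTorch illustration of the idea described in the abstract, not the thesis's exact formulation; the module name TopicAttention, the dimensions, and the additive way the topic term enters the score are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopicAttention(nn.Module):
    """Additive attention whose scores depend on the encoder states, the
    decoder state, and a document-level topic embedding (illustrative sketch;
    names and dimensions are assumptions, not the thesis's exact design)."""

    def __init__(self, enc_dim, dec_dim, topic_dim, attn_dim):
        super().__init__()
        self.W_h = nn.Linear(enc_dim, attn_dim, bias=False)    # encoder states
        self.W_s = nn.Linear(dec_dim, attn_dim, bias=False)    # decoder state
        self.W_t = nn.Linear(topic_dim, attn_dim, bias=False)  # topic embedding
        self.v = nn.Linear(attn_dim, 1, bias=False)

    def forward(self, enc_states, dec_state, topic_emb, mask=None):
        # enc_states: (B, L, enc_dim), dec_state: (B, dec_dim), topic_emb: (B, topic_dim)
        scores = self.v(torch.tanh(
            self.W_h(enc_states)                    # (B, L, attn_dim)
            + self.W_s(dec_state).unsqueeze(1)      # (B, 1, attn_dim)
            + self.W_t(topic_emb).unsqueeze(1)      # (B, 1, attn_dim)
        )).squeeze(-1)                              # (B, L)
        if mask is not None:                        # mask: True for valid source tokens
            scores = scores.masked_fill(~mask, float("-inf"))
        attn = F.softmax(scores, dim=-1)            # topic-aware attention distribution
        context = torch.bmm(attn.unsqueeze(1), enc_states).squeeze(1)  # (B, enc_dim)
        return context, attn

if __name__ == "__main__":
    layer = TopicAttention(enc_dim=512, dec_dim=512, topic_dim=64, attn_dim=256)
    ctx, attn = layer(torch.randn(2, 7, 512), torch.randn(2, 512), torch.randn(2, 64))
    print(ctx.shape, attn.shape)  # torch.Size([2, 512]) torch.Size([2, 7])
```

    Under the joint-training setup described in the abstract, the topic_emb fed to this module would come from a neural topic model (e.g., a VAE-style topic model as in [12]) whose loss is added to the summarization loss so that both objectives are optimized together.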
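    For the second method, replacing the LSTM encoder with pre-trained BERT amounts to feeding the source text through a BERT checkpoint and letting the decoder attend over the resulting per-token hidden states. The snippet below is a minimal sketch assuming the Hugging Face transformers library and the bert-base-chinese checkpoint; both are assumptions for illustration, and the thesis's exact setup may differ.

```python
import torch
from transformers import BertModel, BertTokenizer  # assumes Hugging Face transformers is installed

# Checkpoint name is an assumption; a Chinese checkpoint fits the LCSTS data used in the thesis.
tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")
encoder = BertModel.from_pretrained("bert-base-chinese")

article = "自動文本摘要可以幫助用戶快速瀏覽和提取有用的資訊。"
inputs = tokenizer(article, return_tensors="pt", truncation=True, max_length=128)

with torch.no_grad():
    outputs = encoder(**inputs)

# Per-token contextual representations; these take the place of the hidden-state
# sequence an LSTM encoder would produce and are attended over by the decoder.
enc_states = outputs.last_hidden_state  # shape: (1, seq_len, 768)
print(enc_states.shape)
```

    The tensor enc_states plays the same role as the hidden-state sequence of an LSTM encoder, so a topic-aware attention module like the one sketched above can attend over it unchanged.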

    Table of Contents:
    Chinese Abstract
    Abstract
    Table Listing
    Figure Listing
    1 Introduction
        1.1 Background
        1.2 Motivation
        1.3 Our Approaches
        1.4 Paper Structure
    2 Related Work
        2.1 Sequence-to-Sequence Model
            2.1.1 Recurrent Neural Networks (RNN)
            2.1.2 RNN Encoder-Decoder
            2.1.3 Attention Mechanism
        2.2 Bidirectional Encoder Representations from Transformers (BERT) [8]
            2.2.1 Transformer Encoder
            2.2.2 Pre-training
            2.2.3 Fine-tuning
        2.3 Neural Topic Models
            2.3.1 Topic Model
            2.3.2 Variational Autoencoder Topic Models [12]
    3 Method
        3.1 Preprocessing
        3.2 BERT as Encoder
        3.3 Topic Attention Mechanism
        3.4 Joint Learning
        3.5 Training Strategies
            3.5.1 Teacher Forcing
            3.5.2 Beam Search
    4 Experiments
        4.1 Dataset
        4.2 Evaluation Metric
        4.3 Model Parameters
        4.4 Model Performance Analysis
            4.4.1 Number of Topics
            4.4.2 ROUGE Score Performance
            4.4.3 Examples of Results
            4.4.4 Experiments on a Small Dataset
    5 Conclusion
    6 Future Work
    7 References

    [1] Sutskever, I., Vinyals, O., & Le, Q. V. (2014). Sequence to sequence learning with neural networks. In Advances in neural information processing systems (pp. 3104-3112).
    [2] Hu, B., Chen, Q., & Zhu, F. (2015). LCSTS: A large scale Chinese short text summarization dataset. arXiv preprint arXiv:1506.05865.
    [3] Rush, A. M., Chopra, S., & Weston, J. (2015). A neural attention model for abstractive sentence summarization. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing.
    [4] Bahdanau, D., Cho, K., & Bengio, Y. (2014). Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473.
    [5] Hochreiter, S., & Schmidhuber, J. (1997). Long short-term memory. Neural computation, 9(8), 1735-1780.
    [6] Mikolov, T., Karafiát, M., Burget, L., Černocký, J., & Khudanpur, S. (2010). Recurrent neural network based language model. In Eleventh annual conference of the international speech communication association.
    [7] Blei, D. M., Ng, A. Y., & Jordan, M. I. (2003). Latent Dirichlet allocation. Journal of Machine Learning Research, 3(Jan), 993-1022.
    [8] Devlin, J., Chang, M. W., Lee, K., & Toutanova, K. (2018). BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805.
    [9] Cho, K., Van Merriënboer, B., Gulcehre, C., Bahdanau, D., Bougares, F., Schwenk, H., & Bengio, Y. (2014). Learning phrase representations using RNN encoder-decoder for statistical machine translation. arXiv preprint arXiv:1406.1078.
    [10] Zhu, Y., Kiros, R., Zemel, R., Salakhutdinov, R., Urtasun, R., Torralba, A., & Fidler, S. (2015). Aligning books and movies: Towards story-like visual explanations by watching movies and reading books. In Proceedings of the IEEE international conference on computer vision (pp. 19-27).
    [11] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., ... & Polosukhin, I. (2017). Attention is all you need. In Advances in neural information processing systems (pp. 5998-6008).
    [12] Srivastava, A., & Sutton, C. (2017). Autoencoding variational inference for topic models. arXiv preprint arXiv:1703.01488.
    [13] Kingma, D. P., & Welling, M. (2013). Auto-encoding variational Bayes. arXiv preprint arXiv:1312.6114.
    [14] Gu, J., Lu, Z., Li, H., & Li, V. O. (2016). Incorporating copying mechanism in sequence-to-sequence learning. arXiv preprint arXiv:1603.06393.
    [15] Ma, S., Sun, X., Xu, J., Wang, H., Li, W., & Su, Q. (2017). Improving semantic relevance for sequence-to-sequence learning of Chinese social media text summarization. arXiv preprint arXiv:1706.02459.
    [16] Li, P., Lam, W., Bing, L., & Wang, Z. (2017). Deep recurrent generative decoder for abstractive text summarization. arXiv preprint arXiv:1708.00625.
    [17] Lin, J., Sun, X., Ma, S., & Su, Q. (2018). Global encoding for abstractive summarization. arXiv preprint arXiv:1805.03989.
    [18] Ma, S., Sun, X., Lin, J., & Wang, H. (2018). Autoencoder as assistant supervisor: Improving text representation for Chinese social media text summarization. arXiv preprint arXiv:1805.04869.
    [19] Dumais, S. T. (2004). Latent semantic analysis. Annual review of information science and technology, 38(1), 188-230.
    [20] Hofmann, T. (1999, July). Probabilistic latent semantic analysis. In Proceedings of the Fifteenth conference on Uncertainty in artificial intelligence (pp. 289-296). Morgan Kaufmann Publishers Inc.
    [21] Nallapati, R., Zhou, B., Gulcehre, C., & Xiang, B. (2016). Abstractive text summarization using sequence-to-sequence rnns and beyond. arXiv preprint arXiv:1602.06023.
    [22] See, A., Liu, P. J., & Manning, C. D. (2017). Get to the point: Summarization with pointer-generator networks. arXiv preprint arXiv:1704.04368.

    Full-text availability: on campus from 2020-08-20; off campus from 2020-08-20.