
Author: Cai, Zi-Jian (蔡子健)
Thesis title: Conditional Sequence Generative Adversarial Learning with Attention-Based Rewards (基於注意力獎勵機制的條件序列生成對抗學習)
Advisor: Kao, Hung-Yu (高宏宇)
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Computer Science and Information Engineering
Year of publication: 2018
Academic year of graduation: 106
Language: English
Number of pages: 50
Keywords: Natural Language Processing, Generative Adversarial Nets, Attention-Based Reward
  • With the rapid development of deep learning, a growing number of neural network architectures have been proposed to solve complex sequence generation problems. Generative adversarial learning is one of the most novel of these strategies. Models that apply this idea are generally called “Generative Adversarial Nets (GANs)” and consist of two parts: a generator and a discriminator. These models use the discriminator to guide the training of the generator, so that each improves the other. GANs have already made major contributions to image processing. However, their results on text generation have been unstable. Three major limitations make it hard for GANs to achieve a breakthrough in Natural Language Processing (NLP). First, when dialogue generation is treated as a sequence of decision-making steps, the discrete outputs produced by the sampling operation make it difficult to pass gradients from the discriminator back to the generator. Second, because a recurrent neural network (RNN) follows different strategies during training and testing, prediction errors accumulate as the sequence is generated at test time; this phenomenon is known as “exposure bias”. Finally, the discriminator is designed to evaluate only a complete sequence, so it is difficult to extract a reward for each individual word at every time step. In summary, how to deal with these problems is critical to applying GANs in the NLP field.
    In this thesis, we propose a conditional sequence generative adversarial network that addresses these problems with an attention-based reward strategy. We jointly train an attention mechanism and the GAN. The model dynamically distributes the feedback from the discriminator back to the generator, weighted by the latent associations between words and sentences, which makes the training process much more stable and computationally efficient. Experimental results on synthetic data demonstrate that our model generates better sequences. Moreover, we report a significant improvement of our model over previous baselines on several real-world tasks.
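    The reward-distribution idea described in the abstract can be illustrated with a small sketch. It is not the thesis's actual formulation: the softmax normalization, the function names, and all numeric values below are assumptions made only for illustration. It shows one way a single sequence-level discriminator score could be split into per-token rewards using attention weights and then fed into a REINFORCE-style policy gradient loss.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def per_token_rewards(sequence_reward, attention_scores):
    """Split one sequence-level reward into per-token rewards.

    Assumption: attention scores are normalized with a softmax, so the
    per-token rewards sum back to the sequence-level reward.
    """
    weights = softmax(attention_scores)
    return [sequence_reward * w for w in weights]


def policy_gradient_loss(token_log_probs, token_rewards):
    """REINFORCE-style surrogate loss: -sum_t r_t * log p(y_t | y_<t, x)."""
    return -sum(r * lp for r, lp in zip(token_rewards, token_log_probs))


# Hypothetical values for a four-token generated response.
attention_scores = [0.1, 2.0, 0.3, 1.2]     # relevance of each token to the input
sequence_reward = 0.8                       # discriminator score for the whole sequence
token_log_probs = [-1.2, -0.4, -2.0, -0.9]  # generator log-probabilities of each token

rewards = per_token_rewards(sequence_reward, attention_scores)
loss = policy_gradient_loss(token_log_probs, rewards)
print(rewards, loss)
```

    In this sketch, tokens with higher attention scores receive a larger share of the discriminator's feedback, so the generator gets a reward at every time step without the discriminator having to score partial sequences.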

    Chinese Abstract
    ABSTRACT
    TABLE LISTING
    FIGURE LISTING
    1 INTRODUCTION
        1.1 Background
        1.2 Motivation
        1.3 Our Approaches
        1.4 Paper structure
    2 RELATED WORK
        2.1 Gumbel-softmax distribution
        2.2 Sequence generation and SeqGAN
        2.4 Sequence autoencoder model
        2.5 Attention Mechanism
    3 ATTENTION-BASED REWARDED CONDITIONAL SEQUENCE GENERATIVE ADVERSARIAL NETS
        3.1 Preliminary
        3.2 Generative Adversarial Nets
        3.3 Policy Gradient Adversarial Training
        3.4 Training Strategies
    4 EXPERIMENT AND RESULTS
        4.1 Dataset
        4.2 Evaluation Metrics
        4.3 Method Parameters
        4.4 Synthetic Data Experiments
        4.5 Real-world Data Experiments
    5 CONCLUSION
    6 REFERENCES


    Full-text availability: on campus, open immediately; off campus, open immediately