
Author: Yang, Kai-Chou (楊凱州)
Title: General Sentence Encoder based on Self-Inference (基於自推理的通用型句編碼器)
Advisor: Kao, Hung-Yu (高宏宇)
Degree: Master
Department: Department of Computer Science and Information Engineering, College of Electrical Engineering and Computer Science
Year of Publication: 2019
Graduation Academic Year: 107
Language: English
Pages: 62
Keywords: Natural Language Understanding, Sentence Encoder, Deep Learning
  • Although deep learning has achieved major breakthroughs in many areas of natural language processing, current architectures typically model context with a single fixed encoder and cannot dynamically choose the encoding method best suited to the application at hand, which greatly limits their flexibility and generality. To address this problem, this thesis proposes the Self-Inference Neural Network (SINN), a hybrid system composed of convolutional and recurrent neural networks. SINN reads a text through several encoding methods in parallel and then uses a heuristic matching mechanism to learn the complementary relationships among the encoders, allowing the model to dynamically construct the best encoding mixture for the given context and application scenario.
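
    To make the heuristic matching idea concrete, here is a minimal PyTorch sketch of one common way to compare two encoder views, namely concatenating them with their element-wise difference and product. The function name and dimensions are hypothetical; this is an illustration, not the thesis's exact formulation.

import torch

def heuristic_match(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    """Compare two contextual views of the same sentence.

    a, b: (batch, seq_len, hidden) outputs of two different context
    encoders (e.g., a recurrent view and a convolutional view).
    Returns both views together with their element-wise difference
    and product, exposing agreements and disagreements between the
    views to the next layer.
    """
    return torch.cat([a, b, a - b, a * b], dim=-1)  # (batch, seq_len, 4 * hidden)

# Toy usage: two 128-dimensional encodings of an 8-token batch of two sentences.
rnn_view = torch.randn(2, 8, 128)
cnn_view = torch.randn(2, 8, 128)
print(heuristic_match(rnn_view, cnn_view).shape)  # torch.Size([2, 8, 512])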

    We select six representative and diverse benchmarks from three major tasks, namely text classification, sentiment analysis, and natural language inference, to evaluate the effectiveness and generality of SINN. Experimental results show that, by effectively mixing information from different perspectives, even a single-layer SINN is enough to outperform many deep and complex architectures. Our model surpasses the best existing sentence encoding systems on four of the datasets and achieves strong performance on the remaining two. Moreover, both the encoding and the inference process of SINN are highly interpretable. By visualizing the semantic fusion process, we open the black box of the neural network and examine, case by case, the applicability and suitable usage scenarios of each encoding method.

    While deep learning has achieved great success in natural language understanding, mainstream models are usually built on a single encoding method, which limits their generality because each encoding method has its own suitable usage scenarios. To address this issue, we propose the Self-Inference Neural Network (SINN), a simple yet efficient sentence encoder that can dynamically leverage knowledge from both recurrent and convolutional neural networks. SINN gathers semantic evidence from various perspectives in an interaction space and subsequently fuses it, through a shared vector gate, into the most relevant mixture of contextual information.
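
    To illustrate the shared-gate idea, the following minimal PyTorch sketch shows one plausible gated fusion of a recurrent view and a convolutional view. The class name, the single linear gate, and the dimensions are assumptions made for illustration, not the parameterization used in the thesis.

import torch
import torch.nn as nn

class GatedFusion(nn.Module):
    """Mix two contextual views with a shared, per-dimension sigmoid gate.

    Hypothetical sketch: both encoders are assumed to output vectors of
    size `hidden`; the gate decides, token by token and dimension by
    dimension, how much of each view to keep.
    """

    def __init__(self, hidden: int):
        super().__init__()
        self.gate = nn.Linear(2 * hidden, hidden)

    def forward(self, rnn_out: torch.Tensor, cnn_out: torch.Tensor) -> torch.Tensor:
        # g lies in (0, 1) and interpolates between the two views.
        g = torch.sigmoid(self.gate(torch.cat([rnn_out, cnn_out], dim=-1)))
        return g * rnn_out + (1.0 - g) * cnn_out

# Toy usage with random 128-dimensional encodings of an 8-token batch.
fusion = GatedFusion(hidden=128)
print(fusion(torch.randn(2, 8, 128), torch.randn(2, 8, 128)).shape)  # torch.Size([2, 8, 128])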

    We evaluate the proposed method on six competitive benchmarks across three natural language processing tasks. Experimental results demonstrate that, among all sentence encoding methods, our model sets a new state of the art on four of the six benchmarks and achieves competitive performance on the remaining two. Furthermore, the encoding and inference processes in our model are highly interpretable. Through visualizations of the fusion component, we open the black box of our network and explore the applicability and usage scenarios of each basic encoding method case by case.

    Chinese Abstract i
    Abstract ii
    Acknowledgements iii
    Contents iv
    List of Tables vii
    List of Figures viii
    1 Introduction 1
      1.1 Background 1
      1.2 Motivation 3
      1.3 Our work 6
    2 Background 10
      2.1 Sentence Encoder 10
        2.1.1 Word Encoder 11
        2.1.2 Context Encoder 12
        2.1.3 Pooling Layer 12
      2.2 Context Encoding 13
        2.2.1 Recurrent Encoder 13
        2.2.2 Convolutional Encoder 14
        2.2.3 Self-Attentive Encoder 15
      2.3 Hybrid System 16
        2.3.1 Why to Mix Encoding Methods 17
        2.3.2 How to Mix Encoding Methods 18
        2.3.3 Issues of Hybrid Systems 19
    3 The Proposed Method 21
      3.1 Word Encoding Layer 22
      3.2 Context Encoding Layer 24
      3.3 Self-Inference 26
        3.3.1 Attend 27
        3.3.2 Interact 28
        3.3.3 Extract 30
      3.4 Representation Fusion 31
      3.5 Pooling Layer 33
    4 Experiments 34
      4.1 Dataset 34
      4.2 Experimental Settings 37
      4.3 Performance Comparison 38
        4.3.1 Natural Language Inference 38
        4.3.2 Text Classification 40
        4.3.3 Sentiment Analysis 42
    5 Analysis 43
      5.1 Fusion Method 43
      5.2 Ablation Study 46
      5.3 Error Analysis 47
      5.4 How to Infer 49
    6 Conclusion 52
      6.1 Future Work 53
    References 55

