
Graduate Student: Wang, Jen-Wei (王仁暐)
Thesis Title: HAEE: Question Classification Using Hierarchical Intra-Attention Enhancement Encoder (階層式自注意力增強中文問題分類編碼器)
Advisor: Huang, Jen-Wei (黃仁暐)
Degree: Master's
Department: Institute of Computer & Communication Engineering, College of Electrical Engineering and Computer Science
Year of Publication: 2019
Graduation Academic Year: 108 (2019-2020)
Language: Chinese
Number of Pages: 44
Keywords (Chinese): 問題分類; 雙向閘門控制循環單元; 注意力機制
Keywords (English): Question Classification, Bidirectional Gated Recurrent Unit, Attention Mechanism
Chinese Abstract: With the growth of e-commerce, automated question answering plays an important role in customer-service systems by reducing the need for human labor. Question classification, one of the core tasks in question-answering systems, assigns a label to each question according to its answer type. Earlier approaches typically relied on manually defined features such as named entity recognition, which require predefined dictionaries or external tools; in recent years, machine learning methods applied to this task have achieved high accuracy. In this thesis, we propose HAEE, a hierarchical intra-attention enhancement encoder composed of bidirectional gated recurrent units and intra-attention mechanisms. We adopt character-level input to address the out-of-vocabulary problem, and we create multiple intra-attention mechanisms to model the relationships among characters (in Chinese) or words (in English), enhancing each token's influence on the sentence. We evaluate HAEE in a real-world corporate environment and on several datasets. Experimental results show that HAEE outperforms existing state-of-the-art models on classification tasks, particularly on Chinese datasets.
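The character-level input mentioned above can be illustrated with a short sketch. This toy example is our own, not code from the thesis: because Chinese has a comparatively small, closed character inventory, a word never seen during training still decomposes into characters that are in the vocabulary, so no token falls back to an unknown symbol.

```python
# Minimal sketch (not the thesis code): character-level tokenization
# sidesteps OOV because unseen Chinese words still split into known characters.

def build_vocab(corpus):
    """Map every distinct character to an integer id; 0 is reserved for <unk>."""
    chars = sorted({ch for sentence in corpus for ch in sentence})
    return {ch: i + 1 for i, ch in enumerate(chars)}

def encode(sentence, vocab):
    """Tokenize at the character level; unknown characters fall back to id 0."""
    return [vocab.get(ch, 0) for ch in sentence]

corpus = ["如何退貨", "訂單查詢"]  # toy training sentences
vocab = build_vocab(corpus)

# "退貨查詢" never appears as a word in the corpus, yet every character
# is known, so no position maps to <unk>.
print(encode("退貨查詢", vocab))
```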

English Abstract: Automated question-answering systems play an important role in e-commerce customer service. Question classification involves assigning labels to questions according to the type of answer required. Most previous approaches relied on manually defined features such as named entity recognition, which require a predefined dictionary or external tools; more recent machine learning methods have achieved high accuracy on this task. In this paper, we propose HAEE, a hierarchical intra-attention enhancement encoder featuring bidirectional gated recurrent units and character-level input to address the out-of-vocabulary problem. We also created multiple intra-attentions to model relationships among characters (in Chinese) or words (in English) and thereby enhance the influence of tokens within a sentence. In experiments conducted in a real-world corporate setting with several datasets, the proposed HAEE system outperformed existing state-of-the-art models on question classification tasks, particularly when applied to a Chinese corpus.
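To make the architecture sketched in the abstract concrete (character embeddings, multiple intra-attentions, and a bidirectional GRU stacked into levels), here is a minimal PyTorch sketch of one plausible reading of that description. The class names (IntraAttention, HAEELevel, HAEEClassifier), the additive scoring function, and the default counts of attentions and levels are illustrative assumptions, not the thesis's implementation; the actual formulation is given in Chapter 4 of the thesis and is not reproduced here.

```python
# Illustrative sketch only; not the authors' code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class IntraAttention(nn.Module):
    """One self-attention head that re-weights token states within a sentence."""
    def __init__(self, dim):
        super().__init__()
        self.score = nn.Linear(dim, 1)

    def forward(self, h):                    # h: (batch, seq, dim)
        a = F.softmax(self.score(h), dim=1)  # attention weights over positions
        return a * h                         # scale each token by its weight

class HAEELevel(nn.Module):
    """One level: several intra-attentions feeding a bidirectional GRU."""
    def __init__(self, in_dim, hid_dim, num_attentions=2):
        super().__init__()
        self.attentions = nn.ModuleList(
            [IntraAttention(in_dim) for _ in range(num_attentions)])
        self.gru = nn.GRU(in_dim * num_attentions, hid_dim,
                          batch_first=True, bidirectional=True)

    def forward(self, x):                    # x: (batch, seq, in_dim)
        enhanced = torch.cat([att(x) for att in self.attentions], dim=-1)
        out, _ = self.gru(enhanced)          # (batch, seq, 2 * hid_dim)
        return out

class HAEEClassifier(nn.Module):
    """Character embeddings -> stacked levels -> mean pool -> class logits."""
    def __init__(self, vocab_size, emb_dim, hid_dim, num_classes, levels=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim, padding_idx=0)
        dims = [emb_dim] + [2 * hid_dim] * levels
        self.levels = nn.ModuleList(
            [HAEELevel(dims[i], hid_dim) for i in range(levels)])
        self.out = nn.Linear(2 * hid_dim, num_classes)

    def forward(self, ids):                  # ids: (batch, seq) character ids
        h = self.embed(ids)
        for level in self.levels:
            h = level(h)
        return self.out(h.mean(dim=1))       # pool over the sequence

# Toy forward pass: a batch of 4 questions, 20 characters each.
model = HAEEClassifier(vocab_size=5000, emb_dim=64, hid_dim=64, num_classes=10)
logits = model(torch.randint(1, 5000, (4, 20)))
print(logits.shape)                          # torch.Size([4, 10])
```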

Chinese Abstract i
Abstract ii
Acknowledgment iii
Table of Contents iv
List of Tables vi
List of Figures vii
1 Introduction 1
2 Related Work 4
3 Preliminaries 6
  3.1 Sentence Preprocessing 7
    3.1.1 Tokenization 7
    3.1.2 Stemming and Lemmatization 8
  3.2 Information Retrieval (IR) 9
  3.3 Classification 10
4 Methodology 12
  4.1 Preprocessing 12
  4.2 Character Level 14
    4.2.1 Character Level Intra-Attention 14
    4.2.2 Character Level Encoder 15
  4.3 Semantic Level 19
    4.3.1 Semantic Level Intra-Attention 19
    4.3.2 Semantic Level Encoder 19
  4.4 Deep Semantic Level 20
  4.5 Prediction 21
5 Experiments 24
  5.1 Dataset Description 24
  5.2 Model Setup 25
  5.3 Comparison Methods 26
  5.4 Evaluation Metric 27
  5.5 Experiment Results and Analysis 27
    5.5.1 Number of Intra-Attentions 27
    5.5.2 Number of Levels 29
    5.5.3 Results for All Datasets 30
    5.5.4 Speed of Convergence 31
    5.5.5 Ablation Analysis 32
    5.5.6 The Need for Word Segmentation 33
  5.6 Conclusions 36
6 Future Work 37
7 Case Study 38
  7.1 CSQC Dataset Analysis 38
  7.2 Case Study for the CSQC Dataset 39
References 42


Full text available on campus: 2024-10-30
Full text available off campus: 2024-10-30