| Author: | 許元大 Hsu, Yuan-ta |
|---|---|
| Thesis title: | 增強式注意力深度學習架構與混合嵌入模型之中文機器閱讀理解系統 (Stronger Attention and Hybrid Embedding Model for a Chinese Machine Reading Comprehension System) |
| Advisor: | 王駿發 Wang, Jhing-Fa |
| Degree: | Master |
| Department: | College of Electrical Engineering and Computer Science, Department of Electrical Engineering |
| Year of publication: | 2019 |
| Academic year of graduation: | 107 (ROC calendar) |
| Language: | English |
| Number of pages: | 49 |
| Chinese keywords: | 字嵌入模型、詞嵌入模型、指針網路模型、長短期記憶模型、增強式注意力架構、中文機器閱讀理解系統 |
| English keywords: | word embedding model, character embedding model, pointer network model, long short-term memory model, stronger attention architecture, Chinese machine reading comprehension system |
In recent years, deep learning has been widely applied to machine reading comprehension (MRC), with increasingly strong results. However, these models are usually built for English passages, and applying them directly to Chinese does not work well. This study proposes a Stronger Attention and Hybrid Embedding model for a Chinese machine reading comprehension system that effectively addresses MRC in Chinese. The system consists of five layers: (1) a hybrid embedding layer, (2) an encoding layer, (3) a stronger attention layer, (4) an output layer, and (5) a generation layer. The hybrid embedding layer uses a character-embedding model and a word-embedding model to convert the passage and the question into character vectors and word vectors. The encoding layer uses a long short-term memory (LSTM) network to encode the embedded passage and question into context-aware vectors, which represent the meaning of the passage and of the question. The stronger attention layer weights the words in the passage that are most relevant to the question; intuitively, it marks the key parts of the passage according to the question. The output layer decodes the outputs of the stronger attention layer and the encoding layer into a start position and an end position using an LSTM and a pointer network. The generation layer then extracts the predicted answer from the passage using the decoded start and end positions. We conducted extensive experiments on the Delta Reading Comprehension Dataset (DRCD), a traditional Chinese dataset for user-query reading comprehension. Our model reaches 70% accuracy on this dataset, whereas previous models reach roughly 55%, so our accuracy is about 15 percentage points higher. These results show that our model not only handles MRC in traditional Chinese appropriately but also achieves the best accuracy on this dataset.
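The abstract above walks through a five-layer span-extraction pipeline (hybrid embedding, LSTM encoding, attention-based matching, pointer-style output, and answer generation). Below is a minimal PyTorch sketch of that flow, not the thesis's actual implementation: the class and function names (`HybridEmbedding`, `ChineseMRC`, `generate_answer`), all dimensions, and the dot-product attention standing in for the thesis's "stronger attention" architecture are illustrative assumptions, and the pointer step is simplified to independent per-token start/end scoring.

```python
# Minimal sketch of the five-layer pipeline described in the abstract.
# All hyper-parameters and the attention formulation are assumptions for
# illustration; they do not reproduce the thesis's actual design.
import torch
import torch.nn as nn
import torch.nn.functional as F


class HybridEmbedding(nn.Module):
    """Layer 1: concatenate character-level and word-level embeddings."""

    def __init__(self, char_vocab, word_vocab, char_dim=64, word_dim=128):
        super().__init__()
        self.char_emb = nn.Embedding(char_vocab, char_dim)
        self.word_emb = nn.Embedding(word_vocab, word_dim)

    def forward(self, char_ids, word_ids):
        # char_ids, word_ids: (batch, seq_len), aligned per token for simplicity
        return torch.cat([self.char_emb(char_ids), self.word_emb(word_ids)], dim=-1)


class ChineseMRC(nn.Module):
    def __init__(self, char_vocab=5000, word_vocab=20000, hidden=128):
        super().__init__()
        self.embed = HybridEmbedding(char_vocab, word_vocab)
        emb_dim = 64 + 128
        # Layer 2: contextual encoding of passage and question (BiLSTM).
        self.encoder = nn.LSTM(emb_dim, hidden, batch_first=True, bidirectional=True)
        # Layer 3: passage-to-question attention (generic stand-in for the
        # thesis's stronger attention architecture).
        self.att_proj = nn.Linear(2 * hidden, 2 * hidden)
        # Layer 4: model the matched representation and score start/end positions.
        self.modeler = nn.LSTM(4 * hidden, hidden, batch_first=True, bidirectional=True)
        self.start_ptr = nn.Linear(2 * hidden, 1)
        self.end_ptr = nn.Linear(2 * hidden, 1)

    def forward(self, p_chars, p_words, q_chars, q_words):
        p = self.embed(p_chars, p_words)                  # (B, Lp, emb)
        q = self.embed(q_chars, q_words)                  # (B, Lq, emb)
        p_enc, _ = self.encoder(p)                        # (B, Lp, 2H)
        q_enc, _ = self.encoder(q)                        # (B, Lq, 2H)
        # Weight question states by similarity to each passage token.
        scores = torch.bmm(self.att_proj(p_enc), q_enc.transpose(1, 2))  # (B, Lp, Lq)
        q_aware = torch.bmm(F.softmax(scores, dim=-1), q_enc)            # (B, Lp, 2H)
        matched = torch.cat([p_enc, q_aware], dim=-1)     # (B, Lp, 4H)
        modeled, _ = self.modeler(matched)                # (B, Lp, 2H)
        start_logits = self.start_ptr(modeled).squeeze(-1)  # (B, Lp)
        end_logits = self.end_ptr(modeled).squeeze(-1)       # (B, Lp)
        return start_logits, end_logits


def generate_answer(tokens, start_logits, end_logits):
    """Layer 5: extract the answer span from the passage tokens.

    Toy version: takes the argmax positions and does not enforce end >= start.
    """
    answers = []
    for toks, s, e in zip(tokens, start_logits.argmax(dim=-1), end_logits.argmax(dim=-1)):
        answers.append("".join(toks[int(s):int(e) + 1]))
    return answers


if __name__ == "__main__":
    model = ChineseMRC()
    B, Lp, Lq = 2, 30, 8
    p_c = torch.randint(0, 5000, (B, Lp)); p_w = torch.randint(0, 20000, (B, Lp))
    q_c = torch.randint(0, 5000, (B, Lq)); q_w = torch.randint(0, 20000, (B, Lq))
    s, e = model(p_c, p_w, q_c, q_w)
    print(s.shape, e.shape)  # torch.Size([2, 30]) torch.Size([2, 30])
    toy_tokens = [["台", "達"] * 15, ["問", "答"] * 15]  # 30 placeholder tokens each
    print(generate_answer(toy_tokens, s, e))  # meaningless spans: weights are untrained
```

Running the script prints the start/end logit shapes and an (untrained, meaningless) answer span for a toy batch; in the actual system these logits would be trained against DRCD's annotated start and end positions.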
On-campus access: available from 2022-08-01.