
Author: 甘家豪 (Kan, Chia-Hao)
Thesis title: 結合T5與SBERT之非同步遠距教學答案-問句生成系統
(Asynchronous Distance Teaching Answer-Question Generation System Combining T5 and SBERT)
Advisor: 王惠嘉 (Wang, Hei-Chia)
Degree: Master
Department: College of Management - Institute of Information Management
Year of publication: 2022
Academic year of graduation: 110
Language: Chinese
Number of pages: 55
Chinese keywords: 遠距教學、問句生成、T5、SBERT
English keywords: Distance learning, Question generation, T5, SBERT
    With the rapid development of technology, ideas can now be spread through a much wider range of channels, and traditional face-to-face education has undergone major changes. The sudden outbreak of Covid-19 in December 2019 forced many students to switch to distance learning. Asynchronous courses in distance teaching let students manage their own learning pace, but they make it difficult for teachers to know whether students are paying attention in class. This study therefore aims to automatically generate, for teachers of asynchronous distance courses, a series of questions based on the course content, as a tool to assist teaching. Generating course questions involves two steps: sentence selection and question construction.
    Previous question-generation research falls mainly into three lines: traditional rule-based automatic question generation (AQG), neural question generation (NQG), and the recently emerging pre-trained question generation. Because deep neural networks usually have a large number of parameters, they tend to overfit and generalize poorly without enough training data, whereas pre-trained question generation builds on a pre-trained language model, needs only a small number of samples, and requires less computation time. However, most current question-generation methods are machine-oriented rather than designed for education, so they skip the sentence-selection step. When generating course questions, sentence selection plays an important role: only by selecting the sentences in the teaching material that are helpful for learning and worth asking about do the generated questions become meaningful for assessment and study. This study therefore proposes the ADT-QG (Asynchronous Distance Teaching-Question Generation) model, which adds Sentence-BERT (SBERT) to the architecture to retrieve sentences with higher similarity for question generation, performs answer-question generation with the Text-to-Text Transfer Transformer (T5), and generates multiple-choice options with the help of a Wiki corpus, aiming to produce fluent questions that match the teaching content.
    Finally, the experiments show that, compared with other similarity-calculation models, SBERT selects more informative and question-worthy sentences. Compared with other question-generation models, the ADT-QG model, which incorporates T5, achieves better performance, scoring above 5 in every human evaluation and outperforming previous models.
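    As a rough illustration of the sentence-selection step described above, the sketch below embeds transcript sentences and slide text with SBERT and keeps the transcript sentences most similar to the slides. It is a minimal sketch assuming the sentence-transformers library; the model name and similarity threshold are illustrative choices, not the thesis's actual settings.

```python
# Minimal sketch of SBERT-based sentence selection: embed candidate
# sentences from the lecture transcript, score them against the slide
# text by cosine similarity, and keep the most similar ones as
# question-worthy input. Model and threshold are illustrative assumptions.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")

def select_sentences(transcript_sentences, slide_text, threshold=0.6):
    slide_emb = model.encode(slide_text, convert_to_tensor=True)
    sent_embs = model.encode(transcript_sentences, convert_to_tensor=True)
    scores = util.cos_sim(sent_embs, slide_emb).squeeze(-1)  # shape: (n,)
    return [s for s, score in zip(transcript_sentences, scores)
            if score.item() >= threshold]
```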

    With the rapid development of science and technology, ideas can now be communicated through a much wider range of channels, and traditional face-to-face education has undergone major innovations. In December 2019, the Covid-19 outbreak forced 1.57 billion students and teachers around the world to adopt distance learning. This research therefore aims to automatically generate a series of questions for asynchronous distance teaching as a tool to assist instruction.
    In recent years, pre-trained language models have made it possible to generate questions from only a small number of samples, without retraining the model from scratch. In the field of education, however, sentence selection should play an important role in generating curriculum questions: it is necessary to select the sentences in the teaching material that are helpful to learning and worth asking about. This study therefore proposes the ADT-QG (Asynchronous Distance Teaching-Question Generation) model, which adds SBERT to the architecture to obtain sentences with higher similarity and performs answer-question generation with the T5 model.
    Finally, the experiments show that, compared with other similarity-calculation models, SBERT selects more informative and question-worthy sentences. Compared with other question-generation models, the ADT-QG model built on T5 achieves better performance, showing that the choice of pre-trained model genuinely helps the model grasp the direction of text generation.
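    For the answer-question generation step, the sketch below marks the chosen answer span inside the context and lets a T5 model generate the question. It assumes a publicly available answer-aware community checkpoint ("valhalla/t5-base-qg-hl") and its <hl> highlight convention; the thesis fine-tunes its own T5 model, so this stands in only for the general technique.

```python
# Illustrative answer-aware question generation with T5: the selected
# answer span is wrapped in <hl> tokens and prefixed with a task prompt,
# then decoded with beam search. Checkpoint and input format follow a
# common community setup, not the thesis's own fine-tuned model.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tok = AutoTokenizer.from_pretrained("valhalla/t5-base-qg-hl")
qg = AutoModelForSeq2SeqLM.from_pretrained("valhalla/t5-base-qg-hl")

context = "Asynchronous courses let students manage their own learning pace."
answer = "manage their own learning pace"
# Mark the answer span with <hl> tokens, per this checkpoint's convention.
text = "generate question: " + context.replace(answer, f"<hl> {answer} <hl>")

ids = tok(text, return_tensors="pt").input_ids
out = qg.generate(ids, max_length=64, num_beams=4)
print(tok.decode(out[0], skip_special_tokens=True))
```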

    Chapter 1 Introduction
      1.1 Research Background and Motivation
      1.2 Research Objectives
      1.3 Research Scope and Limitations
      1.4 Research Process
      1.5 Thesis Outline
    Chapter 2 Literature Review
      2.1 Question Generation
        2.1.1 Automatic Question Generation (AQG)
        2.1.2 Neural Question Generation (NQG)
        2.1.3 Pre-trained Question Generation
      2.2 Word Embedding
        2.2.1 Word2Vec
        2.2.2 GloVe
        2.2.3 Bidirectional Encoder Representations from Transformers (BERT)
        2.2.4 Sentence-BERT (SBERT)
      2.3 Deep Learning
        2.3.1 Sequence-to-Sequence
        2.3.2 Attention Mechanism
        2.3.3 Transformer Model
        2.3.4 Text-to-Text Transfer Transformer (T5)
      2.4 Speech-to-Text
      2.5 Summary
    Chapter 3 Research Architecture
      3.1 Research Architecture
      3.2 Data Preprocessing Module
        3.2.1 Speech Recognition Module
        3.2.2 Slide Preprocessing
        3.2.3 Speech Filtering Module
      3.3 Answer Generation Module
        3.3.1 Answer Module Input Format
        3.3.2 Answer Selection Pre-training Module
        3.3.3 Answer Generation Module
        3.3.4 Multiple-Choice Option Generation Module
      3.4 Question Generation Module
        3.4.1 Question Generation Input Sequence Composition
        3.4.2 Question Pre-training Module
        3.4.3 Question Generation Module
      3.5 Summary
    Chapter 4 System Implementation and Evaluation
      4.1 System Environment
      4.2 Experimental Method
        4.2.1 Data Sources
        4.2.2 Experimental Design
        4.2.3 Evaluation Metrics
      4.3 Parameter Settings
        4.3.1 Parameter 1: Network Training Parameters of the Question Generation Module
      4.4 Experimental Results and Analysis
        4.4.1 Experiment 1: Comparison with Other Speech Filtering Methods
        4.4.2 Experiment 2: Selection of Similarity Parameter Settings for Speech Filtering
        4.4.3 Experiment 3: Human Evaluation of Generated Multiple-Choice Options
        4.4.4 Experiment 4: Effect of Adding Feature Tags on Question Generation Results
        4.4.5 Experiment 5: Comparison with Other Answer-Question Generation Models
        4.4.6 Experiment 6: Human Evaluation of Questions Generated by the ADT-QG Model
        4.4.7 Experiment 7: Human Evaluation of the Relevance of ADT-QG Output to the Course
    Chapter 5 Conclusions and Future Directions
      5.1 Research Results
      5.2 Future Research Directions
    References
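    The outline above lists a multiple-choice option generation module (3.3.4), which the abstract says draws on a Wiki corpus. This page does not describe the exact procedure, so the sketch below is only one plausible, hypothetical reading: distractors are chosen as the Wiki-corpus terms whose SBERT embeddings are closest to the correct answer without matching it.

```python
# Hypothetical sketch of Wiki-corpus distractor selection for the
# multiple-choice option module. The candidate list, model, and ranking
# are illustrative assumptions; the thesis's actual procedure may differ.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")

def pick_distractors(answer, wiki_terms, k=3):
    """Return the k Wiki-corpus terms most similar to the answer,
    excluding the answer itself."""
    candidates = [t for t in wiki_terms if t.lower() != answer.lower()]
    ans_emb = model.encode(answer, convert_to_tensor=True)
    cand_embs = model.encode(candidates, convert_to_tensor=True)
    scores = util.cos_sim(cand_embs, ans_emb).squeeze(-1)
    ranked = sorted(zip(candidates, scores.tolist()),
                    key=lambda p: p[1], reverse=True)
    return [term for term, _ in ranked[:k]]
```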


    Full text not available for download.
    On-campus access: from 2027-08-23
    Off-campus access: from 2027-08-23
    The electronic thesis has not yet been authorized for public release; for the print copy, please consult the library catalog.