| Graduate Student | 王孟萱 Wang, Meng-Hsuan |
|---|---|
| Thesis Title | 基於知識圖譜與RAG方法建構LLM解題系統:以論語為例 (Constructing an LLM Problem-Solving System Based on Knowledge Graphs and Retrieval-Augmented Generation Technology: Taking the Analects of Confucius as an Example) |
| Advisor | 陳牧言 Chen, Mu-Yen |
| Degree | Master |
| Department | Department of Engineering Science, College of Engineering |
| Year of Publication | 2024 |
| Graduation Academic Year | 112 (AY 2023–2024) |
| Language | Chinese |
| Pages | 82 |
| Keywords (Chinese) | 大型語言模型、檢索增強生成、知識圖譜、自動化詳解生成 |
| Keywords (English) | Large language model, Retrieval-augmented generation, Knowledge graph, Automated detailed explanation generation |
Analyzing incorrectly answered questions is an important part of learning: by studying the correct answer and its detailed explanation, students can effectively understand their mistakes and deepen their grasp of the material. Traditionally, however, producing such explanations has relied on professionals and their background knowledge and experience, which not only consumes a great deal of time and effort but also makes it hard to respond to students' needs in real time. Fortunately, the rapid development of large language models has opened new possibilities for intelligent problem-solving and explanation-generation systems.
This study develops a digital problem-solving system based on the Analects of Confucius. A knowledge graph is constructed from the text of the Analects to provide an accurate reference source for automatically generated explanations, and a retrieval-augmented generation (RAG) method is applied on top of it to improve the accuracy and effectiveness of the large language model during generation, addressing the accuracy problems that conventional language models can exhibit in text generation.
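To make this pipeline concrete, here is a minimal sketch of the knowledge-graph-backed RAG flow, assuming a toy triple store, a naive character-overlap retriever, and a prompt-assembly step. The triples, function names, and retrieval heuristic are illustrative assumptions, not the thesis's actual implementation (which the abstract does not specify).

```python
# Minimal sketch of a knowledge-graph-backed RAG flow (illustrative only).

# Toy stand-in for the Analects knowledge graph:
# (subject, relation, object) triples. Entries are hypothetical examples.
TRIPLES = [
    ("學而時習之,不亦說乎", "appears_in", "學而第一"),
    ("學而時習之,不亦說乎", "theme", "the joy of learning and practice"),
    ("三人行,必有我師焉", "appears_in", "述而第七"),
    ("三人行,必有我師焉", "theme", "learning from others"),
]

def retrieve(question: str, triples: list[tuple[str, str, str]], k: int = 2):
    """Rank triples by naive character overlap with the question; top-k wins.
    A real system would use graph queries or embedding similarity instead."""
    overlap = lambda t: len(set(question) & set("".join(t)))
    return sorted(triples, key=overlap, reverse=True)[:k]

def build_prompt(question: str, facts: list[tuple[str, str, str]]) -> str:
    """Prepend the retrieved facts to the question as grounding context."""
    lines = "\n".join(f"- {s} | {r} | {o}" for s, r, o in facts)
    return (
        "Using only the following facts from the Analects knowledge graph, "
        "answer the question and give a detailed explanation:\n"
        f"{lines}\n\nQuestion: {question}"
    )

question = "「三人行,必有我師焉」出自《論語》哪一篇?"
prompt = build_prompt(question, retrieve(question, TRIPLES))
print(prompt)  # In the real system, this augmented prompt is sent to the LLM.
```

In a full system the retrieved context would come from a graph query or embedding-based similarity search over the whole knowledge graph, and the assembled prompt would be passed to the language model rather than printed.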
For the experimental design, this study compares how different language models perform when generating detailed explanations, and analyzes the generated results in depth using a variety of evaluation methods. The results show that RAG improves the models' answer quality to a certain extent, but the degree of improvement depends on several factors, including the scale of the model, the quality of the supplied data, and the design of the input text. In particular, when the input text is poorly designed, RAG can even mislead the language model and harm the correctness of the generated output.
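As one example of the kind of automatic evaluation such a comparison can use, the sketch below computes ROUGE-1 (unigram overlap) between a generated explanation and a reference explanation. ROUGE is a common choice for scoring generated text against references, but this sketch is only an assumption about the setup: the whitespace tokenization suits English-like text, and Chinese would require character- or word-level segmentation.

```python
# Hand-rolled ROUGE-1 recall/precision/F1 between a candidate explanation
# and a human-written reference. Illustrative only; the study's exact
# metrics and tokenization are not reproduced here.
from collections import Counter

def rouge_1(candidate: str, reference: str) -> dict[str, float]:
    """ROUGE-1 over whitespace tokens (for Chinese, segment into
    characters or words before calling)."""
    cand, ref = Counter(candidate.split()), Counter(reference.split())
    overlap = sum((cand & ref).values())  # clipped unigram matches
    recall = overlap / max(sum(ref.values()), 1)
    precision = overlap / max(sum(cand.values()), 1)
    f1 = 0.0 if overlap == 0 else 2 * recall * precision / (recall + precision)
    return {"recall": recall, "precision": precision, "f1": f1}

print(rouge_1("the student learns by reviewing mistakes",
              "students learn by reviewing their mistakes"))
```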
The study concludes that although RAG helps improve the generative ability of language models, there is still considerable room for improvement before optimal results can be reached. Future work could expand the knowledge graph to deepen the knowledge the RAG method can supply, or further design and optimize the text that RAG provides to the model, so as to improve generation accuracy.