| Author: | 吳承翰 Wu, Cheng-Han |
|---|---|
| Thesis Title: | 結合多種群基因演算法改善大型語言模型問答系統生成品質之研究 (A Study on Improving the Generation Quality of Large Language Model in Question-Answering Systems Using Multi-population Genetic Algorithm) |
| Advisor: | 劉任修 Liu, Ren-Shiou |
| Degree: | Master |
| Department: | College of Management, Institute of Information Management |
| Year of Publication: | 2025 |
| Academic Year of Graduation: | 113 |
| Language: | Chinese |
| Number of Pages: | 73 |
| Keywords (Chinese): | 多種群基因演算法、提示工程、幻覺問題、檢索增強生成 |
| Keywords (English): | Multi-Population Genetic Algorithm, Prompt Engineering, Hallucination, Retrieval-Augmented Generation |
When generative large language models (LLMs) are applied to question-answering systems, they may produce answers that contradict the facts, the so-called "hallucination" problem, which poses a potential threat to the reliability and practicality of such systems. To address this problem, existing research has proposed a variety of advanced Retrieval-Augmented Generation (RAG) techniques that incorporate external knowledge sources to improve answer correctness. However, RAG is still constrained by several factors: semantic vector retrieval may lose key information from the original text; questions that are too broad or ambiguous lead to the retrieval of irrelevant information; and for complex tasks, a single-round retrieval strategy cannot emulate the iterative, trial-and-error nature of human reasoning, which limits the accuracy and depth of the answers.
This study therefore proposes a question-answering reasoning method that incorporates a Multi-Population Genetic Algorithm (MPGA) to improve the quality of the answers generated by a question-answering system. The method consists of four parts. First, a multi-head self-attention mechanism captures the latent, multi-faceted semantics of the text. Second, using cosine similarity together with the $K$-nearest-neighbor ($K$-NN) algorithm, text passages highly relevant to the question are retrieved from a vector database. Third, the MPGA optimizes the retrieval results, strengthening the quality of the context on which the model's generation is based. Fourth, a prompt combines the question with the optimized context and feeds it to the language model to produce the final answer.
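As a concrete illustration of parts two and three, the following is a minimal Python sketch of cosine-similarity $K$-NN retrieval followed by MPGA refinement of the retrieved context. The identifiers, the toy fitness function, and the population settings are illustrative assumptions, not the author's actual implementation.

```python
import random
import numpy as np

def cosine_sim(a, b):
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def knn_retrieve(query_vec, passage_vecs, k=20):
    """Part two: indices of the k passages most similar to the question."""
    scores = [cosine_sim(query_vec, p) for p in passage_vecs]
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]

def fitness(context_ids, query_vec, passage_vecs):
    """Toy fitness: mean similarity of the selected passages to the question."""
    return float(np.mean([cosine_sim(query_vec, passage_vecs[i]) for i in context_ids]))

def mpga_refine(candidate_ids, query_vec, passage_vecs,
                n_pops=3, pop_size=10, ctx_len=5, generations=30, migrate_every=10):
    """Part three: multi-population GA over subsets of the retrieved passages."""
    score = lambda ind: fitness(ind, query_vec, passage_vecs)
    pops = [[random.sample(candidate_ids, ctx_len) for _ in range(pop_size)]
            for _ in range(n_pops)]
    for g in range(generations):
        for p in range(n_pops):
            ranked = sorted(pops[p], key=score, reverse=True)
            parents = ranked[:pop_size // 2]               # elitist selection
            children = []
            while len(parents) + len(children) < pop_size:
                a, b = random.sample(parents, 2)
                cut = random.randint(1, ctx_len - 1)       # one-point crossover
                child = list(dict.fromkeys(a[:cut] + b[cut:]))
                while len(child) < ctx_len:                # mutation / repair with fresh passages
                    extra = random.choice(candidate_ids)
                    if extra not in child:
                        child.append(extra)
                children.append(child[:ctx_len])
            pops[p] = parents + children
        if (g + 1) % migrate_every == 0:                   # migrate each population's best individual
            for p in range(n_pops):
                best = max(pops[p], key=score)
                pops[(p + 1) % n_pops][-1] = list(best)
    return max((ind for pop in pops for ind in pop), key=score)

# Usage sketch: `embed` is a hypothetical text-embedding function, `passages` a list of strings.
# query_vec = embed(question)
# passage_vecs = [embed(p) for p in passages]
# candidates = knn_retrieve(query_vec, passage_vecs, k=20)
# best_context = [passages[i] for i in mpga_refine(candidates, query_vec, passage_vecs)]
```

In the thesis the fitness presumably also reflects answer quality rather than only query similarity; it is simplified here purely to keep the sketch self-contained.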
Experimental results show that the proposed method performs strongly on the MS MARCO dataset. By effectively exploiting the outputs of the individual self-attention heads in the multi-head self-attention mechanism to capture multiple semantic features of the text, the method improves both the accuracy and the coverage of the generated answers. It also outperforms the base model on the ROUGE metrics and in human evaluation, demonstrating good potential for practical application.
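For reference, ROUGE scores such as those reported above can be computed with the `rouge-score` package; the snippet below is a minimal example with hypothetical sentences, and the choice of package is an assumption, as the thesis does not state which implementation was used.

```python
# pip install rouge-score
from rouge_score import rouge_scorer

# Hypothetical reference answer and model output, for illustration only.
reference = "The Eiffel Tower is located in Paris, France."
candidate = "The Eiffel Tower stands in Paris."

scorer = rouge_scorer.RougeScorer(["rouge1", "rouge2", "rougeL"], use_stemmer=True)
scores = scorer.score(reference, candidate)
for name, s in scores.items():
    print(f"{name}: precision={s.precision:.3f} recall={s.recall:.3f} f1={s.fmeasure:.3f}")
```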
Existing Retrieval-Augmented Generation (RAG) methods remain insufficient for mitigating hallucination due to several underlying limitations. First, semantic vector-based retrieval may lose critical information from the original text. Second, ambiguous queries can result in the retrieval of unhelpful content. Third, a single-round retrieval strategy cannot emulate the iterative nature of human reasoning. We therefore propose a novel answer reasoning framework that integrates a Multi-Population Genetic Algorithm (MPGA) to enhance the quality of responses. The framework first leverages a multi-head self-attention mechanism to capture diverse semantic aspects of the input text. Relevant passages retrieved from these multiple semantic perspectives are then iteratively refined through the genetic optimization process, making them more suitable as context for the model. Experimental results demonstrate that our approach effectively exploits the semantic diversity across attention heads, leading to responses with improved diversity, completeness, and factual accuracy.
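One plausible way to obtain the head-specific sentence vectors described above, using the Hugging Face `transformers` library, is sketched below; pooling token embeddings with each head's [CLS] attention row is an illustrative assumption, not necessarily the exact procedure used in the thesis.

```python
import torch
from transformers import AutoTokenizer, AutoModel

# Any BERT-style encoder works here; the model name is an illustrative choice.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased", output_attentions=True)
model.eval()

text = "What year was the Eiffel Tower completed?"
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

last_hidden = outputs.last_hidden_state[0]   # (seq_len, hidden_dim)
last_attn = outputs.attentions[-1][0]        # (num_heads, seq_len, seq_len)

# Pool token embeddings with each head's attention from the [CLS] position,
# giving one query vector per attention head.
cls_weights = last_attn[:, 0, :]             # (num_heads, seq_len)
head_vectors = cls_weights @ last_hidden     # (num_heads, hidden_dim)

print(head_vectors.shape)                    # e.g. torch.Size([12, 768]) for bert-base
```

Each head-specific vector could then serve as an independent query in the retrieval step, yielding passages drawn from multiple semantic perspectives.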
Allahyari, M., Pouriyeh, S., Assefi, M., Safaei, S., Trippe, E. D., Gutierrez, J. B., and Kochut, K. (2017). Text summarization techniques: A brief survey. International Journal of Advanced Computer Science and Applications, 8(10).
Asai, A., Wu, Z., Wang, Y., Sil, A., and Hajishirzi, H. (2024). Self-RAG: Learning to retrieve, generate, and critique through self-reflection. In The Twelfth International Conference on Learning Representations (ICLR 2024).
Devlin, J., Chang, M.-W., Lee, K., and Toutanova, K. (2019). BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 4171–4186.
Dhuliawala, S., Komeili, M., Xu, J., Raileanu, R., Li, X., Celikyilmaz, A., and Weston, J. (2024). Chain-of-verification reduces hallucination in large language models. In Findings of the Association for Computational Linguistics: ACL 2024, pages 3563–3578. Association for Computational Linguistics.
Diao, S., Wang, P., Lin, Y., Pan, R., Liu, X., and Zhang, T. (2024). Active prompting with chain-of-thought for large language models. In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1330–1350. Association for Computational Linguistics.
Edge, D., Trinh, H., Cheng, N., Bradley, J., Chao, A., Mody, A., Truitt, S., and Larson, J. (2024). From local to global: A Graph RAG approach to query-focused summarization. ArXiv, abs/2404.16130.
Gao, L., Ma, X., Lin, J., and Callan, J. (2023). Precise zero-shot dense retrieval without relevance labels. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1762–1777. Association for Computational Linguistics.
Ge, Y., Hua, W., Mei, K., Ji, J., Tan, J., Xu, S., Li, Z., and Zhang, Y. (2023). OpenAGI: When LLM meets domain experts. In Advances in Neural Information Processing Systems (NeurIPS).
Glass, M., Rossiello, G., Chowdhury, M. F. M., Naik, A., Cai, P., and Gliozzo, A. (2022). Re2G: Retrieve, rerank, generate. In Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 2701–2715. Association for Computational Linguistics.
Guo, Q., Wang, R., Guo, J., Li, B., Song, K., Tan, X., Liu, G., Bian, J., and Yang, Y. (2024a). Connecting large language models with evolutionary algorithms yields powerful prompt optimizers. In The Twelfth International Conference on Learning Representations (ICLR 2024).
Guo, Y., Liang, Y., Wu, C., Wu, W., Zhao, D., and Duan, N. (2024b). Learning to plan by updating natural language. In Findings of the Association for Computational Linguistics: EMNLP 2024, pages 10062–10098. Association for Computational Linguistics.
Gupta, S., Ranjan, R., and Singh, S. N. (2024). A comprehensive survey of retrieval-augmented generation (RAG): Evolution, current landscape and future directions. ArXiv, abs/2410.12837.
Hinton, G. E., Vinyals, O., and Dean, J. (2015). Distilling the knowledge in a neural network. ArXiv, abs/1503.02531.
Jeong, M., Sohn, J., Sung, M., and Kang, J. (2024). Improving medical reasoning through retrieval and self-reflection with retrieval-augmented large language models. Bioinformatics, 40(Supplement 1):i119–i129.
Jiang, A. Q., Sablayrolles, A., Roux, A., Mensch, A., Savary, B., Bamford, C., Chaplot, D. S., de Las Casas, D., Hanna, E. B., Bressand, F., Lengyel, G., Bour, G., Lample, G., Lavaud, L. R., Saulnier, L., Lachaux, M.-A., Stock, P., Subramanian, S., Yang, S., Antoniak, S., Scao, T. L., Gervet, T., Lavril, T., Wang, T., Lacroix, T., and Sayed, W. E. (2024). Mixtral of experts. ArXiv, abs/2401.04088.
Jiang, Z., Xu, F. F., Araki, J., and Neubig, G. (2020). How can we know what language models know? Transactions of the Association for Computational Linguistics, 8:423–438.
Karpukhin, V., Oguz, B., Min, S., Lewis, P., Wu, L., Edunov, S., Chen, D., and Yih, W.-t. (2020). Dense passage retrieval for open-domain question answering. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 6769–6781. Association for Computational Linguistics.
Kojima, T., Gu, S. S., Reid, M., Matsuo, Y., and Iwasawa, Y. (2022). Large language models are zero-shot reasoners. ArXiv, abs/2205.11916.
Kulkarni, H., Goharian, N., Frieder, O., and MacAvaney, S. (2024). Genetic approach to mitigate hallucination in generative IR. In The Second Workshop on Generative Information Retrieval.
Kulkarni, H., Young, Z., Goharian, N., Frieder, O., and MacAvaney, S. (2023). Genetic generative information retrieval. In Proceedings of the ACM Symposium on Document Engineering 2023. Association for Computing Machinery.
Levine, Y., Dalmedigos, I., Ram, O., Zeldes, Y., Jannai, D., Muhlgay, D., Osin, Y., Lieber, O., Lenz, B., Shalev-Shwartz, S., Shashua, A., Leyton-Brown, K., and Shoham, Y. (2022). Standing on the shoulders of giant frozen language models. ArXiv, abs/2204.10019.
Lewis, P., Perez, E., Piktus, A., Petroni, F., Karpukhin, V., Goyal, N., Küttler, H., Lewis, M., Yih, W.-t., Rocktäschel, T., Riedel, S., and Kiela, D. (2020). Retrieval-augmented generation for knowledge-intensive NLP tasks. In Proceedings of the 34th International Conference on Neural Information Processing Systems.
Lin, C.-Y. (2004). ROUGE: A package for automatic evaluation of summaries. In Text Summarization Branches Out, pages 74–81. Association for Computational Linguistics.
Lipowski, A. and Lipowska, D. (2012). Roulette-wheel selection via stochastic acceptance. Physica A: Statistical Mechanics and its Applications, 391(6):2193–2196.
Manning, C. D., Raghavan, P., and Schütze, H. (2008). Introduction to Information Retrieval. Cambridge University Press.
Nguyen, T., Rosenberg, M., Song, X., Gao, J., Tiwary, S., Majumder, R., and Deng, L. (2016). MS MARCO: A human generated machine reading comprehension dataset. Computing Research Repository (CoRR), abs/1611.09268.
Pandya, K. and Holia, M. S. (2023). Automating customer service using LangChain: Building custom open-source GPT chatbot for organizations. ArXiv, abs/2310.05421.
Pradeep, R., Liu, Y., Zhang, X., Li, Y., Yates, A., and Lin, J. (2022). Squeezing water from a stone: A bag of tricks for further improving cross-encoder effectiveness for reranking. In Advances in Information Retrieval: 44th European Conference on IR Research, ECIR 2022, Stavanger, Norway, April 10–14, 2022, Proceedings, Part I, pages 655–670. Springer-Verlag.
Prasad, A., Hase, P., Zhou, X., and Bansal, M. (2023). GrIPS: Gradient-free, edit-based instruction search for prompting large language models. In Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics, pages 3845–3864. Association for Computational Linguistics.
Pryzant, R., Iter, D., Li, J., Lee, Y., Zhu, C., and Zeng, M. (2023). Automatic prompt optimization with “gradient descent” and beam search. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, pages 7957–7968. Association for Computational Linguistics.
Rackauckas, Z. (2024). Rag-fusion: A new take on retrieval augmented generation. International Journal on Natural Language Computing (IJNLC), 13(11):37–47.
Radford, A., Narasimhan, K., Salimans, T., and Sutskever, I. (2018). Improving language understanding by generative pre-training.
Radford, A., Wu, J., Child, R., Luan, D., Amodei, D., Sutskever, I., et al. (2019). Language models are unsupervised multitask learners. OpenAI blog, 1(8):9.
Raffel, C., Shazeer, N., Roberts, A., Lee, K., Narang, S., Matena, M., Zhou, Y., Li, W., and Liu, P. J. (2020). Exploring the limits of transfer learning with a unified text-to-text transformer. Journal of Machine Learning Research, 21(140):1–67.
Robertson, S. and Zaragoza, H. (2009). The probabilistic relevance framework: BM25 and beyond. Found. Trends Inf. Retr., 3(4):333–389.
Sarthi, P., Abdullah, S., Tuli, A., Khanna, S., Goldie, A., and Manning, C. D. (2024). RAPTOR: Recursive abstractive processing for tree-organized retrieval. In The Twelfth International Conference on Learning Representations (ICLR 2024).
Shao, Z., Gong, Y., Shen, Y., Huang, M., Duan, N., and Chen, W. (2023). Enhancing retrieval-augmented large language models with iterative retrieval-generation synergy. In Findings of the Association for Computational Linguistics: EMNLP 2023, pages 9248–9274. Association for Computational Linguistics.
Trivedi, H., Balasubramanian, N., Khot, T., and Sabharwal, A. (2023). Interleaving retrieval with chain-of-thought reasoning for knowledge-intensive multi-step questions. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 10014–10037. Association for Computational Linguistics.
Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, Ł., and Polosukhin, I. (2017). Attention is all you need. Advances in Neural Information Processing Systems, 30.
Wang, W., Bao, H., Huang, S., Dong, L., and Wei, F. (2021). MiniLMv2: Multi-head self-attention relation distillation for compressing pretrained transformers. In Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021, pages 2140–2151. Association for Computational Linguistics.
Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E. H., and Zhou, D. (2022). Self-consistency improves chain of thought reasoning in language models. ArXiv, abs/2203.11171.
Wei, J., Wang, X., Schuurmans, D., Bosma, M., Chi, E. H., Xia, F., Le, Q., and Zhou, D. (2022). Chain of thought prompting elicits reasoning in large language models. ArXiv, abs/2201.11903.
Wiratunga, N., Abeyratne, R., Jayawardena, L., Martin, K., Massie, S., Nkisi-Orji, I., Weerasinghe, R., Liret, A., and Fleisch, B. (2024). CBR-RAG: Case-based reasoning for retrieval augmented generation in LLMs for legal question answering. In International Conference on Case-Based Reasoning, pages 445–460. Springer.
Wu, J., Chang, C.-C., Yu, T., He, Z., Wang, J., Hou, Y., and McAuley, J. (2024). CoRAL: Collaborative retrieval-augmented large language models improve long-tail recommendation. In Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pages 3391–3401. Association for Computing Machinery.
Xu, Z., Cruz, M. J., Guevara, M., Wang, T., Deshpande, M., Wang, X., and Li, Z. (2024). Retrieval-augmented generation with knowledge graphs for customer service question answering. In Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2024, pages 2905–2909. ACM.
Yan, S.-Q., Gu, J.-C., Zhu, Y., and Ling, Z.-H. (2024). Corrective retrieval augmented generation. ArXiv, abs/2401.15884.
Yang, Z., Qi, P., Zhang, S., Bengio, Y., Cohen, W., Salakhutdinov, R., and Manning, C. D. (2018). HotpotQA: A dataset for diverse, explainable multi-hop question answering. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 2369–2380.
Yao, S., Yu, D., Zhao, J., Shafran, I., Griffiths, T. L., Cao, Y., and Narasimhan, K. (2023). Tree of thoughts: deliberate problem solving with large language models. In Proceedings of the 37th International Conference on Neural Information Processing Systems. Curran Associates Inc.
Yu, W., Iter, D., Wang, S., Xu, Y., Ju, M., Sanyal, S., Zhu, C., Zeng, M., and Jiang, M. (2023). Generate rather than retrieve: Large language models are strong context generators. In The Eleventh International Conference on Learning Representations (ICLR 2023).
Zhang, W., Deng, Y., Liu, B., Pan, S., and Bing, L. (2024). Sentiment analysis in the era of large language models: A reality check. In Findings of the Association for Computational Linguistics: NAACL 2024, pages 3881–3906. Association for Computational Linguistics.
Zhang, Y., Li, Y., Cui, L., Cai, D., Liu, L., Fu, T., Huang, X., Zhao, E., Zhang, Y., Chen, Y., Wang, L., Luu, A. T., Bi, W., Shi, F., and Shi, S. (2023a). Siren’s song in the AI ocean: A survey on hallucination in large language models. ArXiv, abs/2309.01219.
Zhang, Z., Zhang, A., Li, M., and Smola, A. (2023b). Automatic chain of thought prompting in large language models. In The Eleventh International Conference on Learning Representations (ICLR 2023).
Zhou, Y., Muresanu, A. I., Han, Z., Paster, K., Pitis, S., Chan, H., and Ba, J. (2022). Large language models are human-level prompt engineers. ArXiv, abs/2211.01910.