| Author: | 王博慶 Wang, Bo-Ching |
|---|---|
| Thesis title: | 融合案例推理與人類回饋強化學習之檢索增強生成架構研究 (A Retrieval-Augmented Generation Framework Integrating Case-Based Reasoning and Reinforcement Learning from Human Feedback) |
| Advisor: | 李昇暾 Li, Sheng-Tun |
| Degree: | Master |
| Department: | College of Management, Institute of Information Management |
| Year of publication: | 2025 |
| Academic year of graduation: | 113 |
| Language: | Chinese |
| Pages: | 63 |
| Keywords (Chinese): | 檢索增強生成、根據人類回饋的強化學習、案例式推理 |
| Keywords (English): | Retrieval-Augmented Generation, Reinforcement Learning from Human Feedback, Case-Based Reasoning |
With the widespread adoption of large language models (LLMs) in natural language processing, these models have demonstrated outstanding capabilities in tasks such as question answering, knowledge-grounded response, and semantic reasoning. In high-precision domains such as medicine and law, however, LLM hallucinations can yield responses that are factually wrong or unsupported, degrading the accuracy of users' decisions. To reduce this risk, the Retrieval-Augmented Generation (RAG) architecture was proposed: by incorporating external knowledge retrieval into the generation process, it effectively improves the factuality and consistency of responses. Yet existing RAG systems still generally lack both the ability to reuse historical question-answer data and a mechanism for self-adjusting generation quality.
This study proposes a modular RAG architecture, termed RLHF-CBR-RAG, that integrates Case-Based Reasoning (CBR) with Reinforcement Learning from Human Feedback (RLHF). Through semantic indexing and case expansion, the system retrieves contextually aligned historical QA examples to strengthen the grounding of its responses. It also incorporates a simulated-feedback training mechanism that converts each response into multi-dimensional quality scores; these serve as reward signals that guide policy updates, giving the system self-optimization and an expandable case memory.
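To make the retrieve-reuse-retain cycle concrete, the sketch below shows a minimal case base in Python: it embeds historical QA pairs with a sentence encoder, retrieves the most semantically similar cases for a new query, assembles them into a grounding prompt, and retains a new case only when its quality score clears a threshold. The encoder name (`all-MiniLM-L6-v2`), the retention threshold, and all identifiers are illustrative assumptions, not the thesis's implementation.

```python
# Minimal sketch of the CBR retrieval and retention steps, assuming a
# sentence-transformers encoder. Names and thresholds are illustrative.
from dataclasses import dataclass, field

from sentence_transformers import SentenceTransformer, util


@dataclass
class QACase:
    question: str
    answer: str
    quality: float  # multi-dimensional quality collapsed to one scalar here


@dataclass
class CaseBase:
    encoder: SentenceTransformer
    cases: list[QACase] = field(default_factory=list)

    def retrieve(self, query: str, k: int = 3) -> list[QACase]:
        """Return the k stored cases whose questions are most similar to the query."""
        if not self.cases:
            return []
        q_emb = self.encoder.encode(query, convert_to_tensor=True)
        c_embs = self.encoder.encode(
            [c.question for c in self.cases], convert_to_tensor=True
        )  # re-encoded on each call for simplicity; a real system would cache these
        sims = util.cos_sim(q_emb, c_embs)[0]
        top = sims.topk(min(k, len(self.cases)))
        return [self.cases[i] for i in top.indices.tolist()]

    def retain(self, case: QACase, threshold: float = 0.7) -> None:
        """CBR 'retain' step: store a new case only if its reward clears the bar."""
        if case.quality >= threshold:
            self.cases.append(case)


def build_prompt(query: str, cases: list[QACase]) -> str:
    """Prepend retrieved QA exemplars so the generator answers on grounded context."""
    examples = "\n\n".join(f"Q: {c.question}\nA: {c.answer}" for c in cases)
    return f"Reference cases:\n{examples}\n\nQuestion: {query}\nAnswer:"


encoder = SentenceTransformer("all-MiniLM-L6-v2")  # assumed embedding model
base = CaseBase(encoder)
base.retain(QACase("Why does the TPMS warning light stay on?",
                   "Re-inflate all tires to spec, then run the TPMS reset procedure.",
                   quality=0.9))
query = "TPMS light is still on after inflating the tires"
print(build_prompt(query, base.retrieve(query)))
```

Collapsing the multi-dimensional quality scores into the single `quality` scalar stands in for the thesis's reward aggregation, whose exact weighting the abstract does not specify.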
To validate the framework, experiments were conducted on three domain-specific datasets (automotive repair, medical diagnosis, and Taiwan's Labor Standards Act), using ten-fold cross-validation and module ablation, and evaluated with multiple metrics including RAGAS faithfulness and BERTScore. The results show that RLHF-CBR-RAG outperforms conventional RAG variants in both factuality and contextual consistency, and exhibits clear case-reuse benefits in data-sparse scenarios. A further Spearman analysis reveals a significant gap between semantic similarity and actual quality, supporting the necessity and validity of using LLM-as-judge and faithfulness metrics for quality assessment.
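The Spearman check described above can be reproduced in miniature: rank-correlate per-response semantic-similarity scores (e.g., BERTScore F1) with judge-assigned quality scores. The numbers below are placeholders, not results from the thesis; a weak or insignificant correlation is what would expose the similarity-quality gap the abstract reports.

```python
# Placeholder demonstration of the Spearman rank-correlation check; the
# scores below are made up and stand in for per-response metric values.
from scipy.stats import spearmanr

semantic_similarity = [0.91, 0.88, 0.84, 0.80, 0.76]  # e.g., BERTScore F1
judge_quality = [0.60, 0.85, 0.40, 0.90, 0.55]        # e.g., LLM-as-judge rating

rho, p_value = spearmanr(semantic_similarity, judge_quality)
print(f"Spearman rho = {rho:.2f}, p = {p_value:.3f}")
# A weak or insignificant rho would indicate that high semantic similarity
# does not guarantee high factual quality, motivating faithfulness metrics.
```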
The proposed RLHF-CBR-RAG architecture supports dynamic learning, alignment with human feedback, and case-memory reuse. It not only improves the credibility and professionalism of generated content but also lays a foundation for deploying highly reliable question-answering systems in specialized domains.
This study proposes a modular RAG framework, RLHF-CBR-RAG, that integrates Case-Based Reasoning (CBR) and Reinforcement Learning from Human Feedback (RLHF) to address knowledge misalignment and hallucination in domain-specific QA. It enhances contextual grounding by retrieving semantically relevant QA cases and optimizes generation via feedback-based reward signals, without hand-crafted reward functions. The system supports dynamic learning and knowledge reuse for better query adaptation. Experiments on automotive repair, medical QA, and labor law datasets show improved factual consistency (RAGAS faithfulness) despite a slightly lower BERTScore. The results highlight that semantic similarity does not always indicate factual accuracy. RLHF-CBR-RAG thus offers a scalable, trustworthy QA framework for specialized domains.
Aamodt, A., & Plaza, E. (1994). Case-based reasoning: Foundational issues, methodological variations, and system approaches. AI Communications, 7(1), 39-59.
Askell, A., Bai, Y., Chen, A., Drain, D., Ganguli, D., Henighan, T., Jones, A., Joseph, N., Mann, B., & DasSarma, N. (2021). A general language assistant as a laboratory for alignment. arXiv preprint arXiv:2112.00861.
Bai, Y., Kadavath, S., Kundu, S., Askell, A., Kernion, J., Jones, A., Chen, A., Goldie, A., Mirhoseini, A., & McKinnon, C. (2022). Constitutional AI: Harmlessness from AI feedback. arXiv preprint arXiv:2212.08073.
Cai, T., Tan, Z., Song, X., Sun, T., Jiang, J., Xu, Y., Zhang, Y., & Gu, J. (2024). FoRAG: Factuality-optimized retrieval augmented generation for web-enhanced long-form question answering. Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining.
Casper, S., Davies, X., Shi, C., Krendl Gilbert, T., Scheurer, J., Rando Ramirez, J., Freedman, R., Korbak, T., Lindner, D., & Freire, P. (2023). Open Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback. Transactions on Machine Learning Research.
Ding, H., Pang, L., Wei, Z., Shen, H., & Cheng, X. (2024). Retrieve only when it needs: Adaptive retrieval augmentation for hallucination mitigation in large language models. arXiv preprint arXiv:2402.10612.
Es, S., James, J., Anke, L. E., & Schockaert, S. (2024). RAGAs: Automated evaluation of retrieval augmented generation. Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics: System Demonstrations.
Gao, Y., Xiong, Y., Gao, X., Jia, K., Pan, J., Bi, Y., Dai, Y., Sun, J., Wang, M., & Wang, H. (2023). Retrieval-augmented generation for large language models: A survey. arXiv preprint arXiv:2312.10997.
Guan, L., Huang, Y., & Liu, J. (2025). Biomedical Question Answering via Multi-Level Summarization on a Local Knowledge Graph. arXiv preprint arXiv:2504.01309.
Huang, L., Yu, W., Ma, W., Zhong, W., Feng, Z., Wang, H., Chen, Q., Peng, W., Feng, X., & Qin, B. (2025). A survey on hallucination in large language models: Principles, taxonomy, challenges, and open questions. ACM Transactions on Information Systems, 43(2), 1-55.
Ji, Y., Li, Z., Meng, R., Sivarajkumar, S., Wang, Y., Yu, Z., Ji, H., Han, Y., Zeng, H., & He, D. (2024). RAG-RLRC-LaySum at BioLaySumm: Integrating retrieval-augmented generation and readability control for layman summarization of biomedical texts. Proceedings of the 23rd Workshop on Biomedical Natural Language Processing.
Ji, Z., Lee, N., Frieske, R., Yu, T., Su, D., Xu, Y., Ishii, E., Bang, Y. J., Madotto, A., & Fung, P. (2023). Survey of hallucination in natural language generation. ACM Computing Surveys, 55(12), 1-38.
Jin, X., & Wang, Y. (2023). Understand legal documents with contextualized large language models. arXiv preprint arXiv:2303.12135.
Jordan, M. I., & Mitchell, T. M. (2015). Machine learning: Trends, perspectives, and prospects. Science, 349(6245), 255-260.
Kim, J., & Min, M. (2024). From RAG to QA-RAG: Integrating generative AI for pharmaceutical regulatory compliance process. arXiv preprint arXiv:2402.01717.
Kolodner, J. L. (1992). An introduction to case-based reasoning. Artificial Intelligence Review, 6(1), 3-34.
Lewis, P., Perez, E., Piktus, A., Petroni, F., Karpukhin, V., Goyal, N., Küttler, H., Lewis, M., Yih, W.-t., & Rocktäschel, T. (2020). Retrieval-augmented generation for knowledge-intensive NLP tasks. Advances in Neural Information Processing Systems, 33, 9459-9474.
Liu, Y., Peng, X., Zhang, X., Liu, W., Yin, J., Cao, J., & Du, T. (2024). RA-ISF: Learning to answer and understand from retrieval augmentation via iterative self-feedback. Findings of the Association for Computational Linguistics: ACL 2024.
Murphy, K. P. (2012). Machine learning: a probabilistic perspective. MIT press.
Ouyang, L., Wu, J., Jiang, X., Almeida, D., Wainwright, C., Mishkin, P., Zhang, C., Agarwal, S., Slama, K., & Ray, A. (2022). Training language models to follow instructions with human feedback. Advances in Neural Information Processing Systems, 35, 27730-27744.
Rafailov, R., Sharma, A., Mitchell, E., Manning, C. D., Ermon, S., & Finn, C. (2024). Direct preference optimization: Your language model is secretly a reward model. Advances in Neural Information Processing Systems, 36.
Reddy, S. (2022). Automating human evaluation of dialogue systems. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Student Research Workshop.
Reimers, N., & Gurevych, I. (2019). Sentence-BERT: Sentence embeddings using Siamese BERT-networks. Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP).
Sharma, S., Yoon, D. S., Dernoncourt, F., Sultania, D., Bagga, K., Zhang, M., Bui, T., & Kotte, V. (2024). Retrieval augmented generation for domain-specific question answering. arXiv preprint arXiv:2404.14760.
Sutton, R. S., & Barto, A. G. (2018). Reinforcement learning: An introduction (2nd ed.). MIT Press.
Wiratunga, N., Abeyratne, R., Jayawardena, L., Martin, K., Massie, S., Nkisi-Orji, I., Weerasinghe, R., Liret, A., & Fleisch, B. (2024). CBR-RAG: Case-based reasoning for retrieval augmented generation in LLMs for legal question answering. International Conference on Case-Based Reasoning.
Yang, R. (2024). CaseGPT: A case reasoning framework based on language models and retrieval-augmented generation. arXiv preprint arXiv:2407.07913.
Zhang, T., Kishore, V., Wu, F., Weinberger, K. Q., & Artzi, Y. (2020). BERTScore: Evaluating text generation with BERT. International Conference on Learning Representations.
Zheng, L., Chiang, W.-L., Sheng, Y., Zhuang, S., Wu, Z., Zhuang, Y., Lin, Z., Li, Z., Li, D., & Xing, E. (2023). Judging LLM-as-a-judge with MT-Bench and Chatbot Arena. Advances in Neural Information Processing Systems, 36, 46595-46623.
Campus access: full text available on campus from 2030-06-16.