
Author: Wang, Bo-Ching
Title: A Retrieval-Augmented Generation Framework Integrating Case-Based Reasoning and Reinforcement Learning from Human Feedback
Advisor: Li, Sheng-Tun
Degree: Master
Department: College of Management, Institute of Information Management
Year of Publication: 2025
Graduation Academic Year: 113 (2024-2025)
Language: Chinese
Pages: 63
Keywords: Retrieval-Augmented Generation, Reinforcement Learning from Human Feedback, Case-Based Reasoning
With the widespread adoption of large language models (LLMs) in natural language processing, they have demonstrated remarkable capabilities in question answering, knowledge-grounded response, and semantic reasoning. In high-precision domains such as medicine and law, however, LLM hallucinations can yield responses that are factually incorrect or unsupported, undermining the accuracy of user decisions. To mitigate this risk, the Retrieval-Augmented Generation (RAG) architecture was proposed: by integrating external knowledge retrieval to support content generation, it effectively improves the factuality and consistency of responses. Yet existing RAG systems still generally lack the ability to reuse historical question-answer data and to self-adjust generation quality.
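The retrieve-then-generate loop described above can be sketched as follows. The helper names, the word-overlap retriever, and the prompt-composing "generator" are illustrative stand-ins, not the thesis's implementation; a real system would use a dense retriever and an LLM.

```python
# Minimal retrieve-then-generate sketch (illustrative only).

def retrieve(query, corpus, k=2):
    """Rank documents by word overlap with the query; return the top k."""
    q = set(query.lower().split())
    ranked = sorted(corpus, key=lambda d: len(q & set(d.lower().split())),
                    reverse=True)
    return ranked[:k]

def generate(query, contexts):
    """Stand-in for an LLM call: compose a prompt grounded in the contexts."""
    lines = "\n".join(f"- {c}" for c in contexts)
    return f"Answer to '{query}', grounded in:\n{lines}"

corpus = [
    "Brake pads should be replaced when thinner than 3 mm.",
    "Engine oil should be changed every 10,000 km.",
    "Labor law caps regular working hours at 40 per week.",
]
query = "When should brake pads be replaced?"
answer = generate(query, retrieve(query, corpus))
```

The point of the pattern is that the generator never answers from parametric memory alone: every response is conditioned on retrieved evidence, which is what improves factuality.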
This study proposes a modular RAG framework that integrates Case-Based Reasoning (CBR) and Reinforcement Learning from Human Feedback (RLHF), termed RLHF-CBR-RAG. Through semantic indexing and case-base expansion, the system can draw on contextually aligned historical QA examples to strengthen the grounding of its responses. In parallel, a simulated-feedback training mechanism converts response content into multi-dimensional quality scores that serve as reward signals to guide policy updates, enabling self-optimization and the growth of its case memory.
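The CBR retrieval step can be illustrated with a minimal sketch. The case structure, the similarity threshold, and the toy vectors are assumptions for illustration; hand-rolled cosine similarity stands in for a sentence-encoder index such as Sentence-BERT.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def retrieve_case(query_vec, case_base, threshold=0.8):
    """Return the most similar stored QA case, or None when nothing in the
    case base is close enough (the new QA pair would then be retained as a
    fresh case, growing the case memory)."""
    best = max(case_base, key=lambda c: cosine(query_vec, c["vec"]))
    return best if cosine(query_vec, best["vec"]) >= threshold else None

# Toy 3-dimensional "embeddings"; a real system would use a sentence encoder.
case_base = [
    {"q": "How do I replace brake pads?", "a": "...", "vec": [0.9, 0.1, 0.0]},
    {"q": "What is the overtime pay rate?", "a": "...", "vec": [0.0, 0.2, 0.9]},
]
hit = retrieve_case([0.85, 0.15, 0.05], case_base)
```

The threshold implements the retain-or-reuse decision: close matches are reused as grounding examples, while misses become new cases, which is how the case memory expands over time.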
To validate the framework, experiments were conducted on three domain-specific datasets (automotive repair, medical diagnosis, and Taiwan's Labor Standards Act) using ten-fold cross-validation and module ablation studies, evaluated with multiple metrics including RAGAS faithfulness and BERTScore. The results show that RLHF-CBR-RAG outperforms conventional RAG variants in both factuality and contextual consistency, and exhibits clear case-reuse benefits in data-sparse settings. A further Spearman analysis reveals a significant gap between semantic similarity and actual answer quality, confirming the necessity and soundness of using LLM-as-judge and faithfulness metrics for quality assessment.
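The Spearman analysis mentioned above compares the ranking induced by semantic similarity with the ranking induced by quality scores. A minimal sketch with hypothetical per-answer scores (not the thesis's data):

```python
# Spearman rank correlation for tie-free data, computed from rank differences:
# rho = 1 - 6 * sum(d_i^2) / (n * (n^2 - 1)).

def spearman(x, y):
    """Spearman's rho, assuming no tied values in x or y."""
    def ranks(values):
        order = sorted(range(len(values)), key=values.__getitem__)
        r = [0] * len(values)
        for rank, idx in enumerate(order, start=1):
            r[idx] = rank
        return r
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n * n - 1))

# Hypothetical per-answer scores: high embedding similarity can coincide
# with low faithfulness, which is exactly the gap the analysis measures.
semantic_similarity = [0.91, 0.88, 0.95, 0.84, 0.90]
faithfulness = [0.60, 0.75, 0.55, 0.80, 0.70]
rho = spearman(semantic_similarity, faithfulness)
```

A rho near 1 would mean semantic similarity is a reliable proxy for quality; a low or negative rho motivates scoring answers directly with faithfulness and an LLM-as-judge instead.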
The proposed RLHF-CBR-RAG framework combines dynamic learning, alignment with human feedback, and case-memory reuse. It not only improves the credibility and professionalism of generated content, but also lays a foundation for deploying highly reliable QA systems in specialized, domain-specific practice.

This study proposes a modular RAG framework, RLHF-CBR-RAG, integrating Case-Based Reasoning (CBR) and Reinforcement Learning from Human Feedback (RLHF) to address knowledge misalignment and hallucination in domain-specific QA. It enhances contextual grounding by retrieving semantically relevant QA cases and optimizes generation via feedback-based reward signals without hand-crafted reward functions. The system supports dynamic learning and knowledge reuse for better query adaptation. Experiments on automotive repair, medical QA, and labor law datasets show improved factual consistency (RAGAS), despite slightly lower BERTScore. The results highlight that semantic similarity does not always indicate factual accuracy. RLHF-CBR-RAG offers scalable, trustworthy QA for specialized domains.

Abstract; Summary; Acknowledgements; Contents; List of Figures; List of Tables
Chapter 1 Introduction
    1.1 Research Background
    1.2 Research Motivation
    1.3 Thesis Organization
Chapter 2 Literature Review
    2.1 Domain-Specific Question Answering
        2.1.1 Application Background and Challenges of Domain-Specific QA
        2.1.2 Machine Learning Applications in Domain-Specific QA
    2.2 Retrieval-Augmented Generation
        2.2.1 Basic RAG Architecture
        2.2.2 Advanced RAG and Modular RAG
    2.3 Case-Based Reasoning for Retrieval-Augmented Generation
    2.4 Reinforcement Learning from Human Feedback
        2.4.1 Machine Learning
        2.4.2 Reinforcement Learning
        2.4.3 Reinforcement Learning from Human Feedback
        2.4.4 Proximal Policy Optimization
    2.5 Summary
Chapter 3 Methodology
    3.1 Problem Definition
    3.2 Research Framework
        3.2.1 Data Indexing
        3.2.2 RAG Pipeline
        3.2.3 Feedback Training
        3.2.4 The RLHF-CBR-RAG Algorithm
    3.3 RLHF Framework
        3.3.1 Policy Generation and Returns
        3.3.2 Computing the Reward Signal
        3.3.3 Policy Optimization Objective
        3.3.4 Policy Update
    3.4 Notation Table
Chapter 4 Experimental Results and Analysis
    4.1 Datasets and Experimental Procedure
        4.1.1 Dataset Description
        4.1.2 Experimental Procedure
    4.2 Experimental Settings
        4.2.1 Hyperparameter Settings
        4.2.2 Baselines
    4.3 Evaluation Metrics
        4.3.1 BERTScore
        4.3.2 RAGAS Faithfulness
        4.3.3 Spearman Rank Correlation Coefficient
    4.4 Experimental Results and Analysis
        4.4.1 Model Performance and Metric Correlation Analysis
        4.4.2 Ablation Study
Chapter 5 Conclusion
    5.1 Conclusions and Contributions
    5.2 Limitations and Future Work
References

Aamodt, A., & Plaza, E. (1994). Case-based reasoning: Foundational issues, methodological variations, and system approaches. AI Communications, 7(1), 39-59.
Askell, A., Bai, Y., Chen, A., Drain, D., Ganguli, D., Henighan, T., Jones, A., Joseph, N., Mann, B., & DasSarma, N. (2021). A general language assistant as a laboratory for alignment. arXiv preprint arXiv:2112.00861.
Bai, Y., Kadavath, S., Kundu, S., Askell, A., Kernion, J., Jones, A., Chen, A., Goldie, A., Mirhoseini, A., & McKinnon, C. (2022). Constitutional AI: Harmlessness from AI feedback. arXiv preprint arXiv:2212.08073.
Cai, T., Tan, Z., Song, X., Sun, T., Jiang, J., Xu, Y., Zhang, Y., & Gu, J. (2024). FoRAG: Factuality-optimized retrieval augmented generation for web-enhanced long-form question answering. Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining.
Casper, S., Davies, X., Shi, C., Krendl Gilbert, T., Scheurer, J., Rando Ramirez, J., Freedman, R., Korbak, T., Lindner, D., & Freire, P. (2023). Open problems and fundamental limitations of reinforcement learning from human feedback. Transactions on Machine Learning Research.
Ding, H., Pang, L., Wei, Z., Shen, H., & Cheng, X. (2024). Retrieve only when it needs: Adaptive retrieval augmentation for hallucination mitigation in large language models. arXiv preprint arXiv:2402.10612.
Es, S., James, J., Anke, L. E., & Schockaert, S. (2024). RAGAS: Automated evaluation of retrieval augmented generation. Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics: System Demonstrations.
Gao, Y., Xiong, Y., Gao, X., Jia, K., Pan, J., Bi, Y., Dai, Y., Sun, J., Wang, M., & Wang, H. (2023). Retrieval-augmented generation for large language models: A survey. arXiv preprint arXiv:2312.10997.
Guan, L., Huang, Y., & Liu, J. (2025). Biomedical question answering via multi-level summarization on a local knowledge graph. arXiv preprint arXiv:2504.01309.
Huang, L., Yu, W., Ma, W., Zhong, W., Feng, Z., Wang, H., Chen, Q., Peng, W., Feng, X., & Qin, B. (2025). A survey on hallucination in large language models: Principles, taxonomy, challenges, and open questions. ACM Transactions on Information Systems, 43(2), 1-55.
Ji, Y., Li, Z., Meng, R., Sivarajkumar, S., Wang, Y., Yu, Z., Ji, H., Han, Y., Zeng, H., & He, D. (2024). RAG-RLRC-LaySum at BioLaySumm: Integrating retrieval-augmented generation and readability control for layman summarization of biomedical texts. Proceedings of the 23rd Workshop on Biomedical Natural Language Processing.
Ji, Z., Lee, N., Frieske, R., Yu, T., Su, D., Xu, Y., Ishii, E., Bang, Y. J., Madotto, A., & Fung, P. (2023). Survey of hallucination in natural language generation. ACM Computing Surveys, 55(12), 1-38.
Jin, X., & Wang, Y. (2023). Understand legal documents with contextualized large language models. arXiv preprint arXiv:2303.12135.
Jordan, M. I., & Mitchell, T. M. (2015). Machine learning: Trends, perspectives, and prospects. Science, 349(6245), 255-260.
Kim, J., & Min, M. (2024). From RAG to QA-RAG: Integrating generative AI for pharmaceutical regulatory compliance process. arXiv preprint arXiv:2402.01717.
Kolodner, J. L. (1992). An introduction to case-based reasoning. Artificial Intelligence Review, 6(1), 3-34.
Lewis, P., Perez, E., Piktus, A., Petroni, F., Karpukhin, V., Goyal, N., Küttler, H., Lewis, M., Yih, W.-t., & Rocktäschel, T. (2020). Retrieval-augmented generation for knowledge-intensive NLP tasks. Advances in Neural Information Processing Systems, 33, 9459-9474.
Liu, Y., Peng, X., Zhang, X., Liu, W., Yin, J., Cao, J., & Du, T. (2024). RA-ISF: Learning to answer and understand from retrieval augmentation via iterative self-feedback. Findings of the Association for Computational Linguistics: ACL 2024.
Murphy, K. P. (2012). Machine learning: A probabilistic perspective. MIT Press.
Ouyang, L., Wu, J., Jiang, X., Almeida, D., Wainwright, C., Mishkin, P., Zhang, C., Agarwal, S., Slama, K., & Ray, A. (2022). Training language models to follow instructions with human feedback. Advances in Neural Information Processing Systems, 35, 27730-27744.
Rafailov, R., Sharma, A., Mitchell, E., Manning, C. D., Ermon, S., & Finn, C. (2024). Direct preference optimization: Your language model is secretly a reward model. Advances in Neural Information Processing Systems, 36.
Reddy, S. (2022). Automating human evaluation of dialogue systems. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Student Research Workshop.
Reimers, N., & Gurevych, I. (2019). Sentence-BERT: Sentence embeddings using Siamese BERT-networks. Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP).
Sharma, S., Yoon, D. S., Dernoncourt, F., Sultania, D., Bagga, K., Zhang, M., Bui, T., & Kotte, V. (2024). Retrieval augmented generation for domain-specific question answering. arXiv preprint arXiv:2404.14760.
Sutton, R. S., & Barto, A. G. (2018). Reinforcement learning: An introduction (2nd ed.). MIT Press.
Wiratunga, N., Abeyratne, R., Jayawardena, L., Martin, K., Massie, S., Nkisi-Orji, I., Weerasinghe, R., Liret, A., & Fleisch, B. (2024). CBR-RAG: Case-based reasoning for retrieval augmented generation in LLMs for legal question answering. International Conference on Case-Based Reasoning.
Yang, R. (2024). CaseGPT: A case reasoning framework based on language models and retrieval-augmented generation. arXiv preprint arXiv:2407.07913.
Zhang, T., Kishore, V., Wu, F., Weinberger, K. Q., & Artzi, Y. (2020). BERTScore: Evaluating text generation with BERT. International Conference on Learning Representations.
Zheng, L., Chiang, W.-L., Sheng, Y., Zhuang, S., Wu, Z., Zhuang, Y., Lin, Z., Li, Z., Li, D., & Xing, E. (2023). Judging LLM-as-a-judge with MT-Bench and Chatbot Arena. Advances in Neural Information Processing Systems, 36, 46595-46623.

On-campus access: available from 2030-06-16
Off-campus access: available from 2030-06-16
The electronic thesis has not yet been authorized for public release; please consult the library catalog for the print copy.