
Graduate Student: Lo, Tzu-Han (羅子涵)
Thesis Title: Learn How to Reject: Improving Negative Rejection in RAG with Relation-Enhanced NLI (學會拒絕:透過關係增強的自然語言推理提升檢索增強生成的負面拒絕能力)
Advisor: Kao, Hung-Yu (高宏宇)
Degree: Master
Department: Graduate Program of Artificial Intelligence, College of Electrical Engineering and Computer Science
Publication Year: 2024
Graduation Academic Year: 112 (2023–2024)
Language: English
Number of Pages: 51
Keywords (Chinese): 自然語言處理、檢索增強生成、負面拒絕、自然語言推理、邏輯推理
Keywords (English): Natural Language Processing, Retrieval-Augmented Generation, Negative Rejection, Natural Language Inference, Logical Reasoning
Abstract:

    Retrieval-Augmented Generation (RAG) aims to leverage external knowledge to mitigate issues such as hallucinations and outdated information in Large Language Models (LLMs), thereby improving accuracy and precision in task execution. However, the external knowledge sometimes contains no useful information. When the provided external knowledge lacks relevant information, LLMs should be capable of refusing to use it for task execution; this capability is called "Negative Rejection". We apply this concept to logical reasoning: when the information necessary for reasoning is absent, LLMs should stop reasoning and identify the missing information. While current LLMs perform well in many respects, they often struggle to execute negative rejection effectively. To address this issue, we have designed a two-stage process to assist LLMs in better executing negative rejection. Our contributions are as follows. First, we propose "Relation-Enhanced Natural Language Inference", which improves upon traditional Natural Language Inference (NLI) by more effectively identifying inconsistencies between two pieces of text. Second, we introduce "Insufficient Information Guidance", which uses simple and effective prompt engineering to guide large language models in performing negative rejection.
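    The two-stage idea described above can be sketched in a toy form. In this sketch, a rule-based relation extractor stands in for the Relation-Enhanced NLI stage, and a prompt template illustrates Insufficient Information Guidance. The sentence pattern, function names, and prompt wording are illustrative assumptions for exposition only, not the thesis's actual implementation:

```python
import re

def extract_relations(text):
    """Toy relation extractor: parses "X is the Y of Z" sentences into
    (subject, relation, object) triples. A real system would use a trained
    relation-extraction model; this pattern is illustrative only."""
    triples = []
    for sent in re.split(r"[.;]\s*", text):
        m = re.match(r"(\w+) is the (\w+) of (\w+)", sent.strip())
        if m:
            triples.append((m.group(1), m.group(2), m.group(3)))
    return triples

def check_sufficiency(context, question_entity):
    """Relation-based sufficiency check: the question is treated as
    answerable only if some extracted triple mentions the queried entity;
    otherwise the missing information is named."""
    triples = extract_relations(context)
    relevant = [t for t in triples if question_entity in (t[0], t[2])]
    if not relevant:
        return False, f"no relation involving '{question_entity}' was found"
    return True, relevant

def build_prompt(context, question, sufficient, detail):
    """Insufficient-information guidance: when the check fails, the prompt
    explicitly instructs the model to reject instead of guessing."""
    guidance = "" if sufficient else (
        "\nNote: the context lacks the required information "
        f"({detail}). Reply with 'insufficient information'."
    )
    return f"Context: {context}\nQuestion: {question}{guidance}"
```

    For example, given the context "Alice is the mother of Bob.", a question about "Dave" fails the sufficiency check, so the constructed prompt steers the model toward a rejection rather than a fabricated answer.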

    Table of Contents:
    Abstract (Chinese) i
    Abstract ii
    Acknowledgements iii
    Table of Contents iv
    List of Tables vi
    List of Figures vii
    Chapter 1. Introduction 1
    Chapter 2. Related Work 6
      2.1. Negative Rejection of Retrieval-augmented Generation 6
      2.2. Logical Reasoning 6
      2.3. Natural Language Inference 7
        2.3.1. Natural Language Inference in Retrieval Question Answering 8
      2.4. Relation Extraction 8
    Chapter 3. Methodology 10
      3.1. Relation Extraction 11
      3.2. Relation-enhanced NLI 11
      3.3. Insufficient Information Guidance 14
    Chapter 4. Experiment 16
      4.1. Datasets 16
        4.1.1. bAbI Task 15 and Task 16 16
        4.1.2. RuleTaker 18
        4.1.3. CLUTRR 18
        4.1.4. RGB 19
        4.1.5. SQuAD 2.0 19
      4.2. Baselines 19
        4.2.1. Prompting 20
        4.2.2. NLI 25
        4.2.3. NLI + Insufficient Information Guidance 25
      4.3. Evaluation Metric 25
      4.4. Main Results 25
    Chapter 5. Analysis 27
      5.1. Analysis of Logical Reasoning 27
      5.2. Analysis of Insufficient Information Guidance 27
        5.2.1. Ablation Study 27
        5.2.2. Comparison between Different Insufficient Information Signs 29
      5.3. Context Length Robustness of Relation-enhanced NLI 29
    Chapter 6. Conclusion 39
      6.1. Conclusion 39
      6.2. Limitation 39
    References 40


    Full text available on campus: 2025-09-01
    Full text available off campus: 2025-09-01