
Graduate Student: 胡劍杰 (Runn, Prasoprat)
Thesis Title: Enhancing Automatic Chain-of-Thought Prompting with Qualified Rationale Generation and Keyword Guidance
Advisor: 高宏宇 (Kao, Hung-Yu)
Degree: Master
Department: College of Electrical Engineering and Computer Science - Graduate Program of Artificial Intelligence
Year of Publication: 2024
Academic Year of Graduation: 112
Language: English
Number of Pages: 50
Keywords: Natural Language Processing, In-context Learning, Chain-of-Thought
This thesis explores methods for optimizing chain-of-thought prompting in large language models (LLMs), addressing the inconvenience of constructing chain-of-thought prompts by hand. Building on the core idea of step-by-step reasoning, we introduce a novel framework that enhances the generation of chain-of-thought prompts with high-quality rationales, an appropriate sample-ordering method, and additional keywords as guidance. Our approach improves on existing sample selection methods and on the rationales conventionally generated through Zero-shot-CoT by pruning low-quality rationales, ensuring high-quality inputs. Additionally, we incorporate essential keywords to enrich the prompts, providing context that improves performance on complex reasoning tasks.
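    A minimal sketch of the rationale-pruning step just described, under stated assumptions: `llm` stands in for any callable that returns a model completion, and the answer extraction is a naive last-number heuristic; neither is the thesis's exact implementation.

    ```python
    import re

    # Standard Zero-shot-CoT trigger phrase (Kojima et al., 2022).
    ZERO_SHOT_COT_TRIGGER = "Let's think step by step."

    def generate_rationale(llm, question: str) -> str:
        """Elicit a step-by-step rationale via Zero-shot-CoT prompting."""
        return llm(f"Q: {question}\nA: {ZERO_SHOT_COT_TRIGGER}")

    def extract_answer(rationale: str) -> str:
        """Naive heuristic: take the last number in the rationale as the answer."""
        numbers = re.findall(r"-?\d+\.?\d*", rationale)
        return numbers[-1] if numbers else ""

    def prune_low_quality(llm, samples):
        """Keep only demonstrations whose rationale reaches the gold answer."""
        qualified = []
        for question, gold_answer in samples:
            rationale = generate_rationale(llm, question)
            # Rationale-quality pruning: discard rationales that fail to
            # arrive at the correct answer, so only high-quality
            # demonstrations enter the assembled prompt.
            if extract_answer(rationale) == str(gold_answer):
                qualified.append((question, rationale, gold_answer))
        return qualified
    ```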
    Extensive experiments across six datasets (GSM8K, MultiArith, SingleEQ, SVAMP, CommonsenseQA, and StrategyQA) demonstrate the significant impact of sample ordering, rationale-quality pruning, and keyword use on the effectiveness of chain-of-thought prompting. Our findings reveal that ascending sample order generally outperforms descending order, and that pruning rationales that fail to reach the correct answer is beneficial. Moreover, providing keywords in the prompt further improves the model's performance, with five keywords striking the optimal balance between supplying sufficient context and avoiding information overload.
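    The following sketch shows how these findings could be combined at prompt-assembly time, reusing the (question, rationale, answer) triples from the pruning sketch above. The ascending-order key (rationale length as a proxy for complexity) and the `extract_keywords` helper are illustrative assumptions, not the thesis's exact method.

    ```python
    def assemble_prompt(demonstrations, test_question, extract_keywords):
        """Build a few-shot CoT prompt from pruned (question, rationale, answer) triples."""
        # Ascending order: simplest demonstration first, reflecting the
        # finding that ascending ordering generally outperforms descending.
        ordered = sorted(demonstrations, key=lambda d: len(d[1].split()))
        blocks = []
        for question, rationale, answer in ordered:
            # Five keywords per question: the reported sweet spot between
            # sufficient context and information overload.
            keywords = extract_keywords(question)[:5]
            blocks.append(
                f"Q: {question}\nKeywords: {', '.join(keywords)}\n"
                f"A: {rationale} The answer is {answer}."
            )
        # End with the unsolved test question, keywords included as guidance.
        test_keywords = extract_keywords(test_question)[:5]
        blocks.append(f"Q: {test_question}\nKeywords: {', '.join(test_keywords)}\nA:")
        return "\n\n".join(blocks)
    ```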
    This research highlights the importance of balancing information in prompt design and offers a robust method to enhance the performance of LLMs in complex problem-solving tasks. The insights gained from this study provide a foundation for future research aimed at further optimizing prompt generation strategies and exploring their applications across diverse and challenging domains.

    Table of Contents:
    Abstract
    Acknowledgements
    Table of Contents
    List of Tables
    List of Figures
    Chapter 1 Introduction
      1.1 Large Language Models
      1.2 Methods to Improve Hallucination
      1.3 Motivation
      1.4 Our Works
    Chapter 2 Related Work
      2.1 In-context Learning
      2.2 Chain-of-Thought Prompt
      2.3 Zero-shot-CoT
      2.4 Automated Generation
      2.5 Sample Selection
      2.6 Post-hoc Explanation Method
    Chapter 3 Methodology
      3.1 Framework
      3.2 Dataset Preparation
      3.3 Sample Selection
      3.4 Rationale Generation
      3.5 Rationale Quality Pruning
      3.6 Keywords Extraction
      3.7 Prompt Assembling
    Chapter 4 Experiment
      4.1 Dataset
        4.1.1 GSM8K
        4.1.2 MultiArith
        4.1.3 SingleEQ
        4.1.4 SVAMP
        4.1.5 CommonsenseQA
        4.1.6 StrategyQA
      4.2 Baseline
        4.2.1 Zero-shot-CoT
        4.2.2 Post-hoc Explanation
        4.2.3 Complexity-based
        4.2.4 Information-Entropy-based
        4.2.5 Misclassification Confidence Score
      4.3 Experimental Settings
      4.4 Overall Performance
      4.5 Performance with Different Inference Models
    Chapter 5 Analysis
      5.1 Impact of Sample Ordering
      5.2 Impact of Rationale Quality Pruning
      5.3 Impact of Number of Keywords
      5.4 Impact of Adding Keywords
    Chapter 6 Conclusion
    References


    Full text available: on campus from 2025-09-01; off campus from 2025-09-01.