| 研究生: |
潘班 Phan, Ben |
|---|---|
| 論文名稱: |
透過主張驗證器與檢索增強生成實現高保真生醫資料的可驗證增強方法 Verifiable Augmentation for High-Fidelity Biomedical Datasets Through Claim-Checker and Retrieval-Augmented Generation |
| 指導教授: |
蔣榮先
Chiang, Jung-Hsien |
| 學位類別: |
碩士 Master |
| 系所名稱: |
電機資訊學院 - 資訊工程學系 Department of Computer Science and Information Engineering |
| 論文出版年: | 2025 |
| 畢業學年度: | 113 |
| 語文別: | 英文 |
| 論文頁數: | 110 |
| 中文關鍵詞: | 生成式人工智慧 、大型語言模型 、資料增強 、檢索增強生成 、主張驗證 、可驗證增強 、生物醫學自然語言處理 |
| 外文關鍵詞: | Generative Artificial Intelligence, Large Language Models, Data Augmentation, Retrieval-augmented generation, Claim Verification, Verifiable Augmentation, Biomedical NLP |
| ORCID: | https://orcid.org/0009-0006-3048-356X |
| ResearchGate: | https://scholar.google.com/citations?user=r3tfrQcAAAAJ&hl |
| 相關次數: | 點閱:16 下載:0 |
| 分享至: |
| 查詢本校圖書館目錄 查詢臺灣博碩士論文知識加值系統 勘誤回報 |
生成式人工智慧,特別是大型語言模型,近年來發展迅速,在各個領域展現出生成連貫且語境豐富內容的卓越能力。然而,目前基於大型語言模型的資料增強方法在生物醫學等專業領域仍面臨重大限制。這些挑戰包括事實不一致性、準確性不足,以及由於生物醫學資料集的複雜性和領域特定性質而導致的多樣性有限。因此,傳統的生成方法往往無法捕捉重要的生物醫學實體關係和術語精確性,導致下游生物醫學自然語言處理任務的次優表現。
本論文提出一種新穎的可驗證增強方法,整合檢索增強生成與嚴格的主張驗證器驗證機制來解決這些限制。我們的方法利用檢索增強生成從權威來源如醫學文獻資料庫和維基百科檢索相關的生物醫學知識,以準確且語境精確的資訊豐富增強樣本。同時,主張驗證器模組系統性地驗證增強樣本中主張的事實正確性和完整性,顯著減少錯誤資訊並確保生成的生物醫學資料具有高保真度和多樣性。我們提出的方法透過結合外部知識檢索與細粒度主張驗證,有效提升增強生物醫學資料集的可靠性、準確性和全面性。
我們在三個生物醫學資料集上進行了廣泛實驗:生物創意第八屆生物關係抽取競賽子任務一、語義評估二○一三年任務九(藥物相互作用抽取),以及語義評估二○二五年任務九(食品危害檢測挑戰)。我們的方法在精確度、召回率和綜合評分方面始終優於缺乏檢索整合的基準方法。具體而言,我們的方法在生物創意第八屆生物關係抽取競賽子任務一中獲得百分之四十四點一七的「全部」分數,位列前三名團隊。在語義評估二○一三年任務九中,我們的方法達到令人印象深刻的百分之九十點四綜合評分,超越先前最先進的增強技術。此外,在高度競爭的語義評估二○二五年任務九挑戰中,該挑戰吸引全球超過二百六十個參與團隊,我們提出的方法在兩個子任務中均獲得第二名。這些結果證明了結合檢索增強生成與主張驗證對於增強生物醫學自然語言處理任務的實用效能和穩健性。我們的發現進一步突顯了這種整合方法在專業領域內推進可驗證增強方法論的巨大潛力。
Generative Artificial Intelligence (GenAI), particularly large language models (LLMs), has advanced rapidly in recent years, demonstrating remarkable capabilities for generating coherent and contextually rich content across various domains. However, current LLM-based data augmentation methods still face critical limitations in specialized fields such as biomedicine. These challenges include factual inconsistencies, insufficient accuracy, and limited diversity due to the complexity and domain-specific nature of biomedical datasets. Consequently, traditional generative approaches often fail to capture essential biomedical entity relationships and terminological precision, leading to suboptimal performance in downstream biomedical NLP tasks.
This thesis introduces a novel Verifiable Augmentation method that integrates Retrieval-augmented generation (RAG) with a rigorous Claim-Checker validation mechanism to address these limitations. Our approach leverages RAG to retrieve relevant biomedical knowledge from authoritative sources such as PubMed and Wikipedia, enriching augmented samples with accurate and contextually precise information. Simultaneously, the Claim-Checker module systematically verifies claims' factual correctness and completeness within augmented samples, significantly reducing erroneous information and ensuring high fidelity and diversity in the generated biomedical data. Our proposed method effectively enhances augmented biomedical datasets' reliability, accuracy, and comprehensiveness by combining external knowledge retrieval with fine-grained claim validation.
We conducted extensive experiments on three biomedical datasets: BioCreative VIII BioRED Track Subtask 1, SemEval 2013 Task 9 (drug-drug interaction extraction), and SemEval 2025 Task 9 (Food Hazard Detection Challenge). Our approach consistently outperformed baseline methods that lacked retrieval integration regarding precision, recall, and F1 scores. Specifically, our method achieved an "All" score of 44.17% on BioCreative VIII BioRED Track Subtask 1, placing among the top three teams. On SemEval 2013 Task 9, our method attained an impressive F1 score of 90.4%, surpassing previous state-of-the-art augmentation techniques. Furthermore, in the highly competitive SemEval 2025 Task 9 challenge, which attracted more than 260 participating teams globally, our proposed method secured a 2nd ranking in both subtasks. These results demonstrate the practical effectiveness and robustness of combining Retrieval-augmented generation with claim-verified validation for enhancing biomedical NLP tasks. Our findings further highlight the significant potential of this integrated approach to advance Verifiable Augmentation methodologies within specialized domains.
[1] Dongyu Ru, Lin Qiu, Xiangkun Hu, Tianhang Zhang, Peng Shi, Shuaichen Chang, Jiayang Cheng, Cunxiang Wang, Shichao Sun, Huanyu Li, Zizhao Zhang, Binjie Wang, Jiarong Jiang, Tong He, Zhiguo Wang, Pengfei Liu, Yue Zhang, and Zheng Zhang. Ragchecker: A fine-grained framework for diagnosing retrieval-augmented generation, 2024.
[2] Yunfan Gao, Yun Xiong, Xinyu Gao, Kangxiang Jia, Jinliu Pan, Yuxi Bi, Yixin Dai, Jiawei Sun, Haofen Wang, and Haofen Wang. Retrieval-augmented generation for large language models: A survey. CoRR, 2023.
[3] Steven Y Feng, Varun Gangal, Jason Wei, Sarath Chandar, Soroush Vosoughi, Teruko Mitamura, and Eduard Hovy. A survey of data augmentation approaches for nlp. In Findings of the Association for Computational Linguistics: ACL- IJCNLP 2021, pages 968–988, 2021.
[4] Jiayi Yuan, Ruixiang Tang, Xiaoqian Jiang, and Xia Hu. Large language models for healthcare data augmentation: An example on patient-trial matching. In AMIA Annual Symposium Proceedings, volume 2023, page 1324, 2024.
[5] Zaifu Zhan, Jun Wang, Shuang Zhou, Jiawen Deng, and Rui Zhang. Mm- rag: Multi-mode retrieval-augmented generation with large language models for biomedical in-context learning, 2025.
[6] Varun Kumar, Ashutosh Choudhary, and Eunah Cho. Data augmentation using pre-trained transformer models. In Proceedings of the 2nd Workshop on Life-long Learning for Spoken Language Systems, pages 18–26, 2020.
[7] Hengyi Cai, Hongshen Chen, Yonghao Song, Cheng Zhang, Xiaofang Zhao, and Dawei Yin. Data manipulation: Towards effective instance learning for neural dia- logue generation via learning to augment and reweight. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 6334– 6343, 2020.
[8] Rico Sennrich, Barry Haddow, and Alexandra Birch. Improving neural machine translation models with monolingual data. In 54th Annual Meeting of the Associ- ation for Computational Linguistics, pages 86–96. Association for Computational Linguistics (ACL), 2016.
[9] Jason Wei and Kai Zou. EDA: Easy data augmentation techniques for boosting performance on text classification tasks. In Kentaro Inui, Jing Jiang, Vincent Ng, and Xiaojun Wan, editors, EMNLP-IJCNLP 2019, pages 6382–6388, Hong Kong, China, November 2019. Association for Computational Linguistics.
[10] Rico Sennrich, Barry Haddow, and Alexandra Birch. Improving neural machine translation models with monolingual data. In Katrin Erk and Noah A. Smith, editors, Proceedings of the 54th Annual Meeting of the Association for Compu- tational Linguistics, pages 86–96, Berlin, Germany, August 2016. Association for Computational Linguistics.
[11] Cong-Phuoc Phan, Ben Phan, and Jung-Hsien Chiang. Optimized biomedical entity relation extraction method with data augmentation and classification using gpt-4 and gemini. Database, 2024:baae104, 2024.
[12] Xu Han, Zhengyan Zhang, Ning Ding, Yuxian Gu, Xiao Liu, Yuqi Huo, Jiezhong Qiu, Yuan Yao, Ao Zhang, Liang Zhang, et al. Pre-trained models: Past, present and future. AI Open, 2:225–250, 2021.
[13] Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. Bert: Pre- training of deep bidirectional transformers for language understanding. In Pro- ceedings of the 2019 conference of the North American chapter of the association for computational linguistics: human language technologies, volume 1 (long and short papers), pages 4171–4186, 2019.
[14] Hugo Touvron, Thibaut Lavril, Gautier Izacard, Xavier Martinet, Marie-Anne Lachaux, Timoth´ee Lacroix, Baptiste Rozi`ere, Naman Goyal, Eric Hambro, Faisal Azhar, Aurelien Rodriguez, Armand Joulin, Edouard Grave, and Guillaume Lam- ple. Llama: Open and efficient foundation language models, 2023.
[15] Albert Q Jiang, Alexandre Sablayrolles, Arthur Mensch, Chris Bamford, Deven- dra Singh Chaplot, Diego de las Casas, Florian Bressand, Gianna Lengyel, Guil- laume Lample, Lucile Saulnier, et al. Mistral 7b. arXiv preprint arXiv:2310.06825, 2023.
[16] M Li, H Kilicoglu, H Xu, and R Zhang. Biomedrag: A retrieval augmented large language model for biomedicine. arxiv. arXiv preprint arXiv:2405.00465, 2024.
[17] Patrick Lewis, Ethan Perez, Aleksandra Piktus, Fabio Petroni, Vladimir Karpukhin, Naman Goyal, Heinrich Ku¨ttler, Mike Lewis, Wen-tau Yih, Tim Rockta¨schel, Sebastian Riedel, and Douwe Kiela. Retrieval-augmented genera- tion for knowledge-intensive nlp tasks. In H. Larochelle, M. Ranzato, R. Hadsell, M.F. Balcan, and H. Lin, editors, NeurIPS, volume 33, pages 9459–9474. Curran Associates, Inc., 2020.
[18] Rui Yang, Yilin Ning, Emilia Keppo, Mingxuan Liu, Chuan Hong, Danielle S Bitterman, Jasmine Chiat Ling Ong, Daniel Shu Wei Ting, and Nan Liu. Retrieval- augmented generation for generative artificial intelligence in health care. npj Health Systems, 2(1):2, 2025.
[19] Darya Shlyk, Tudor Groza, Stefano Montanelli, Emanuele Cavalleri, Marco Mesiti, et al. Real: A retrieval-augmented entity linking approach for biomedical concept recognition. In Proceedings of the 23rd Workshop on Biomedical Natural Language Processing, pages 380–389. Association for Computational Linguistics, 2024.
[20] Guangzhi Xiong, Qiao Jin, Xiao Wang, Minjia Zhang, Zhiyong Lu, and Aidong Zhang. Improving retrieval-augmented generation in medicine with iterative follow-up questions. In Biocomputing 2025: Proceedings of the Pacific Sympo- sium, pages 199–214. World Scientific, 2024.
[21] Jing Miao, Charat Thongprayoon, Supawadee Suppadungsuk, Oscar A. Gar- cia Valencia, and Wisit Cheungpasitporn. Integrating retrieval-augmented gener- ation with large language models in nephrology: Advancing practical applications. Medicina, 60(3), 2024.
[22] Samy Ateia and Udo Kruschwitz. Bioragent: A retrieval-augmented generation system for showcasing generative query expansion and domain-specific search for scientific q&a, 2024.
[23] Ling Luo, Po-Ting Lai, Chih-Hsuan Wei, Cecilia N Arighi, and Zhiyong Lu. Biored: a rich biomedical relation extraction dataset. Briefings in Bioinformatics, 23(5):bbac282, 2022.
[24] Isabel Segura-Bedmar, Paloma Mart´ınez, and Mar´ıa Herrero-Zazo. Semeval-2013 task 9: Extraction of drug-drug interactions from biomedical texts (ddiextraction 2013). In Workshop on Semantic Evaluation (SemEval 2013), pages 341–350, 2013.
[25] Korbinian Randl, John Pavlopoulos, Aron Henriksson, and Tony Lindgren. SemEval-2025 task 9: The food hazard detection challenge. In Proceedings of the 19th International Workshop on Semantic Evaluation (SemEval-2025), Vienna, Austria, July 2025. Association for Computational Linguistics.
[26] Haixing Dai, Zhengliang Liu, Wenxiong Liao, Xiaoke Huang, Yihan Cao, Zihao Wu, Lin Zhao, Shaochen Xu, Fang Zeng, Wei Liu, et al. Auggpt: Leveraging chatgpt for text data augmentation. IEEE Transactions on Big Data, 2025.
[27] Lasal Jayawardena, Prasan Yapa, et al. Parafusion: A large-scale llm-driven english paraphrase dataset infused with high-quality lexical and syntactic diversity. In CS & IT Conference Proceedings, volume 14. CS & IT Conference Proceedings, 2024.
[28] Xiang Zhang, Junbo Zhao, and Yann LeCun. Character-level convolutional net- works for text classification. NeurIPS, 28, 2015.
[29] Wei Yang, Yuqing Xie, Luchen Tan, Kun Xiong, Ming Li, and Jimmy Lin. Data augmentation for bert fine-tuning in open-domain question answering. arXiv preprint arXiv:1904.06652, 2019.
[30] George A Miller, Richard Beckwith, Christiane Fellbaum, Derek Gross, and Katherine J Miller. Introduction to wordnet: An on-line lexical database. In- ternational journal of lexicography, 3(4):235–244, 1990.
[31] Armand Joulin, Edouard Grave, Piotr Bojanowski, Matthijs Douze, H´erve J´egou, and Tomas Mikolov. Fasttext. zip: Compressing text classification models. arXiv preprint arXiv:1612.03651, 2016.
[32] Djamila Romaissa Beddiar, Md Saroar Jahan, and Mourad Oussalah. Data ex- pansion using back translation and paraphrasing for hate speech detection. Online Social Networks and Media, 24:100153, 2021.
[33] Long Zhao, Ting Liu, Xi Peng, and Dimitris Metaxas. Maximum-entropy adver- sarial data augmentation for improved generalization and robustness. NeurIPS, 33:14435–14447, 2020.
[34] Marzieh Fadaee, Arianna Bisazza, and Christof Monz. Data augmentation for low- resource neural machine translation. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics, 2017.
[35] Tom Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared D Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, et al. Language models are few-shot learners. NeurIPS, 33:1877–1901, 2020.
[36] Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, and Peter J Liu. Exploring the limits of transfer learning with a unified text-to-text transformer. Journal of machine learning research, 21(140):1–67, 2020.
[37] Bonan Min, Hayley Ross, Elior Sulem, Amir Pouran Ben Veyseh, Thien Huu Nguyen, Oscar Sainz, Eneko Agirre, Ilana Heintz, and Dan Roth. Recent advances in natural language processing via large pre-trained language models: A survey. ACM Computing Surveys, 56(2):1–40, 2023.
[38] Patrick Lewis, Ethan Perez, Aleksandra Piktus, Fabio Petroni, Vladimir Karpukhin, Naman Goyal, Heinrich Ku¨ttler, Mike Lewis, Wen-tau Yih, Tim Rockta¨schel, Sebastian Riedel, and Douwe Kiela. Retrieval-augmented genera- tion for knowledge-intensive nlp tasks. In H. Larochelle, M. Ranzato, R. Hadsell, M.F. Balcan, and H. Lin, editors, NeurIPS, volume 33, pages 9459–9474. Curran Associates, Inc., 2020.
[39] Tianyu Gao, Xingcheng Yao, and Danqi Chen. Simcse: Simple contrastive learning of sentence embeddings. In EMNLP 2021, page 6894. Association for Computa- tional Linguistics, 2021.
[40] Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, et al. Learning transferable visual models from natural language supervision. In International conference on machine learning, pages 8748–8763. PmLR, 2021.
[41] Qizhe Xie, Zihang Dai, Eduard Hovy, Thang Luong, and Quoc Le. Unsupervised data augmentation for consistency training. NeurIPS, 33:6256–6268, 2020.
[42] Sittiphong Pornudomthap, Ronnagorn Rattanatamma, and Patsorn Sangkloy. Overcoming data limitations in thai herb classification with data augmentation and transfer learning. Journal of Advanced Computational Intelligence and Intel- ligent Informatics, 28(3):511–519, 2024.
[43] Komang Wahyu Trisna, Jinjie Huang, Yuanjian Chen, and I Gede Juliana Eka Putra. Dynamic text augmentation for robust sentiment analysis: Enhancing model performance with eda and multi-channel cnn. IEEE Access, 2025.
[44] Museong Kim and Namgyu Kim. Text augmentation using hierarchy-based word replacement. Journal of The Korea Society of Computer and Information, 26(1):57–67, 2021.
[45] Mosleh Mahamud, Zed Lee, and Isak Samsten. Distributional data augmentation methods for low resource language. arXiv preprint arXiv:2309.04862, 2023.
[46] Anisa Nur Azizah, Misbachul Falach Asy’ari, Ifnu Wisma Dwi Prastya, and Diana Purwitasari. Easy data augmentation untuk data yang imbalance pada konsultasi kesehatan daring. Jurnal Teknologi Informasi dan Ilmu Komputer, 10(5):1095– 1104, 2023.
[47] Md Saroar Jahan, Mourad Oussalah, Djamila Romaissa Beddia, Jhuma kabir Mim, and Nabil Arhab. A comprehensive study on nlp data augmentation for hate speech detection: Legacy methods, bert, and llms, 2024.
[48] Mengxin Wang, Dennis Zhang, and Heng Zhang. Large language models for mar- ket research: A data-augmentation approach. Available at SSRN 5057769, 2024.
[49] Vinay Samuel, Yue Zhou, and Henry Peng Zou. Towards data contamination de- tection for modern large language models: Limitations, inconsistencies, and oracle challenges. In Proceedings of the 31st International Conference on Computational Linguistics, pages 5058–5070, 2025.
[50] Ziwei Xu, Sanjay Jain, and Mohan S Kankanhalli. Hallucination is inevitable: An innate limitation of large language models. CoRR, 2024.
[51] Vladimir Karpukhin, Barlas Oguz, Sewon Min, Patrick SH Lewis, Ledell Wu, Sergey Edunov, Danqi Chen, and Wen-tau Yih. Dense passage retrieval for open- domain question answering. In EMNLP 2020, pages 6769–6781, 2020.
[52] Siru Liu, Allison B McCoy, and Adam Wright. Improving large language model applications in biomedicine with retrieval-augmented generation: a systematic re- view, meta-analysis, and clinical development guidelines. Journal of the American Medical Informatics Association, page ocaf008, 2025.
[53] Mor Zarfati, Shelly Soffer, Girish N Nadkarni, and Eyal Klang. Retrieval- augmented generation: Advancing personalized care and research in oncology. European Journal of Cancer, 220:115341, 2025.
[54] Rui Yang, Yilin Ning, Emilia Keppo, Mingxuan Liu, Chuan Hong, Danielle S Bit- terman, Jasmine Chiat Ling Ong, Daniel Shu Wei Ting, and Nan Liu. Retrieval- augmented generation for generative artificial intelligence in medicine. arXiv preprint arXiv:2406.12449, 2024.
[55] Xiangkun Hu, Dongyu Ru, Lin Qiu, Qipeng Guo, Tianhang Zhang, Yang Xu, Yun Luo, Pengfei Liu, Yue Zhang, and Zheng Zhang. Refchecker: Reference-based fine- grained hallucination checker and benchmark for large language models. arXiv preprint arXiv:2405.14486, 2024.
[56] Tianyi Zhang, Varsha Kishore, Felix Wu, Kilian Q. Weinberger, and Yoav Artzi. Bertscore: Evaluating text generation with bert. In International Conference on Learning Representations, 2020.
[57] Kishore Papineni, Salim Roukos, Todd Ward, and Wei-Jing Zhu. Bleu: a method for automatic evaluation of machine translation. In Proceedings of the 40th annual meeting of the Association for Computational Linguistics, pages 311–318, 2002.
[58] Chin-Yew Lin. Rouge: A package for automatic evaluation of summaries. In Text summarization branches out, pages 74–81, 2004.
[59] Potsawee Manakul, Adian Liusie, and Mark Gales. Selfcheckgpt: Zero-resource black-box hallucination detection for generative large language models. In EMNLP 2023.
[60] Sreyan Ghosh, Utkarsh Tyagi, Sonal Kumar, Chandra Kiran Reddy Evuru, S Ra- maneswaran, S Sakshi, and Dinesh Manocha. Abex: Data augmentation for low- resource nlu via expanding abstract descriptions. In Proceedings of ACL 2024, pages 726–748, 2024.
[61] Cong-Phuoc Phan, Gia-Han Ngo, Ben Phan, and Jung-Hsien Chiang. Probability model with ensemble learning and data augmentation for named entity recognition (ner) and relation extraction (re) tasks. In AMIA 2023 Annual Symposium, 2023.
校內:2030-01-01公開