Graduate Student: Wang, Jia-Yang (王家揚)
Thesis Title: Development of a Doctor-Patient Communication System with a Lightweight LLM through Knowledge Distillation for PHE of Middle Ear Infection (應用知識蒸餾技術開發輕量化 LLM 之中耳炎醫病溝通衛教系統)
Advisor: Du, Yi-Chun (杜翌群)
Degree: Master
Department: Department of BioMedical Engineering, College of Engineering
Year of Publication: 2024
Graduating Academic Year: 112 (2023-24)
Language: English
Pages: 90
Keywords: Doctor-Patient Communication (DPC), Personalized Health Education (PHE), Large Language Model (LLM), Lightweight LLM, Knowledge Distillation (KD), Retrieval-Augmented Generation (RAG), Reinforcement Learning from Human Feedback (RLHF)
Abstract:
Effective Doctor-Patient Communication (DPC) is crucial for improving healthcare outcomes, and Personalized Health Education (PHE) plays a significant role in it. Recent studies indicate that Large Language Models (LLMs) hold considerable potential in healthcare, but their application faces challenges such as high resource demands, cost, and security concerns. Lightweight LLMs have emerged to address these issues. This study developed a lightweight-LLM system that improves DPC by generating PHE reports from individual patient information, taking otitis media (middle ear infection) in otolaryngology as its example. First, a disease classification model was built from medical images, and its output was converted into text serving as the multimodal input of the lightweight LLM. Knowledge distillation (KD) was then used to extract PHE knowledge from a large LLM and fine-tune the lightweight LLM into a PHE model. Experiment 1 showed that the disease classification model achieves high accuracy, providing the foundation for the system's multimodal input. In Experiment 2, doctors reviewed and rated the PHE model's reports on a Likert scale: the average scores exceeded 4 points, comparable to the large LLMs and significantly different from the pre-fine-tuned model (p < 0.05); the experiment also compared different model sizes and dataset sizes for fine-tuning the PHE model. Experiment 3 introduced Retrieval-Augmented Generation (RAG), enabling the system to incorporate up-to-date hospital information, such as doctors' consultation hours, and demonstrating its advantage in generating customized content. Experiment 4 incorporated Reinforcement Learning from Human Feedback (RLHF), retraining the PHE model on reports edited by medical staff so that generated reports better match user needs. Overall, the PHE model developed with KD reduced model size and resource requirements while maintaining information accuracy and effectiveness, increasing its feasibility for clinical deployment. Through RAG and RLHF, the system supports customization and continuous training in a private environment, protecting the intellectual property of medical staff and the privacy of patients. By generating tailored PHE reports from patient information, the system reduces the burden on medical staff, gives patients personalized reports, lowers communication friction between staff and patients, and improves DPC and care outcomes. This study demonstrates the system's potential in clinical settings and can serve as a reference for future related applications.
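The abstract names the core techniques without detail, so brief hedged sketches follow. The KD step can be read as sequence-level distillation: a large teacher LLM writes the PHE report for each patient record, and the lightweight student is fine-tuned on the teacher's text. The sketch below assumes a Hugging Face-style student; `teacher_generate`, the record fields, and the student model ID are illustrative, not the thesis's actual code.

```python
# Minimal sequence-level KD sketch: fine-tune a small student LM on
# reports written by a large teacher LLM. Hypothetical names throughout.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def build_prompt(record: dict) -> str:
    # The image classifier's diagnosis is serialized to text, which is
    # how the system turns an otoscopic image into "multimodal" input.
    return (f"Patient: {record['age']}y, diagnosis: {record['diagnosis']}. "
            "Write a personalized health-education report.")

def distill(records, teacher_generate, student_id="Qwen/Qwen2-0.5B"):
    tok = AutoTokenizer.from_pretrained(student_id)
    student = AutoModelForCausalLM.from_pretrained(student_id)
    opt = torch.optim.AdamW(student.parameters(), lr=1e-5)
    student.train()
    for record in records:
        prompt = build_prompt(record)
        report = teacher_generate(prompt)  # teacher LLM's PHE report
        ids = tok(prompt + report, return_tensors="pt").input_ids
        # LM loss over prompt + report (a common simplification).
        loss = student(ids, labels=ids).loss
        loss.backward(); opt.step(); opt.zero_grad()
    return student
```

In this regime the student never sees the teacher's logits, only its generated text, which is what allows a proprietary LLM to be distilled through its API into a locally deployable model.

The RAG step can be sketched as cosine-similarity retrieval over embedded hospital notices, with the best-matching chunk prepended to the prompt; the chunk texts and embedding model here are placeholder assumptions:

```python
# Minimal RAG sketch: retrieve the hospital notice most similar to the
# query and prepend it to the PHE prompt. Placeholder data throughout.
import numpy as np
from sentence_transformers import SentenceTransformer

chunks = [
    "Dr. Lee's otology clinic: Tuesday and Thursday mornings, Room 3.",
    "Audiology follow-up visits require a referral slip.",
]
embedder = SentenceTransformer("all-MiniLM-L6-v2")
chunk_vecs = embedder.encode(chunks)

def retrieve(query: str, k: int = 1):
    q = embedder.encode([query])[0]
    sims = chunk_vecs @ q / (np.linalg.norm(chunk_vecs, axis=1) * np.linalg.norm(q))
    return [chunks[i] for i in np.argsort(-sims)[:k]]

context = " ".join(retrieve("When is the ear clinic open?"))
prompt = f"Context: {context}\nWrite follow-up instructions for an otitis media patient."
# `prompt` is then passed to the fine-tuned PHE model for generation.
```

For the RLHF step, one simple realization (not necessarily the thesis's) is to treat each staff-edited report as the preferred answer and the model's original draft as the rejected one, then optimize a preference objective; Direct Preference Optimization (DPO) is used below as a stand-in because it needs no separate reward model:

```python
# Hedged DPO sketch for learning from staff edits. Assumes the tokenizer
# splits cleanly at the prompt/answer boundary (fine for a sketch).
import torch
import torch.nn.functional as F

def seq_logprob(model, tok, prompt: str, answer: str) -> torch.Tensor:
    # Sum of token log-probabilities of `answer` given `prompt`.
    p_len = tok(prompt, return_tensors="pt").input_ids.shape[1]
    full = tok(prompt + answer, return_tensors="pt").input_ids
    logp = model(full).logits[:, :-1].log_softmax(-1)  # next-token dists
    token_lp = logp.gather(2, full[:, 1:].unsqueeze(-1)).squeeze(-1)
    return token_lp[:, p_len - 1:].sum()  # keep answer tokens only

def dpo_loss(policy, ref, tok, prompt, chosen, rejected, beta=0.1):
    # chosen = staff-edited report, rejected = model's original draft.
    pi_c = seq_logprob(policy, tok, prompt, chosen)
    pi_r = seq_logprob(policy, tok, prompt, rejected)
    with torch.no_grad():  # frozen reference model
        ref_c = seq_logprob(ref, tok, prompt, chosen)
        ref_r = seq_logprob(ref, tok, prompt, rejected)
    return -F.logsigmoid(beta * ((pi_c - ref_c) - (pi_r - ref_r)))
```

Finally, the reported significance (p < 0.05) implies a paired comparison of Likert ratings before and after fine-tuning. The thesis does not state which test was used; a Wilcoxon signed-rank test suits paired ordinal scores, shown here with made-up ratings:

```python
# Paired significance test on Likert ratings (placeholder data).
from scipy import stats

pre  = [2, 3, 2, 3, 2, 3, 2, 2]  # ratings of the pre-fine-tuned model
post = [4, 5, 4, 4, 5, 4, 5, 4]  # ratings of the distilled PHE model
stat, p = stats.wilcoxon(pre, post)
print(f"p = {p:.4f}")  # p < 0.05 would indicate a significant gain
```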