| Graduate Student: | Nguyen Thi Huyen Tran (阮氏璇珍) |
|---|---|
| Thesis Title: | Multimodal Agentic RAG-based Assistant for Intelligent Customized Support in Manufacturing Systems Maintenance (基於多模態代理的 RAG 智慧客製化製造系統維護助手) |
| Advisor: | Hsieh, Yu-Ming (謝昱銘) |
| Co-Advisor: | Cheng, Fan-Tien (鄭芳田) |
| Degree: | Master |
| Department: | Academy of Innovative Semiconductor and Sustainable Manufacturing, Program on Semiconductor Packaging and Testing |
| Publication Year: | 2026 |
| Graduation Academic Year: | 114 (ROC) |
| Language: | English |
| Number of Pages: | 34 |
| Keywords: | Multimodal Retrieval-Augmented Generation, Agentic Orchestration, Knowledge Graph, Industrial AI, Virtual Maintenance Assistant, Semiconductor Manufacturing, xPPU, Visual Embedding, ColPali, Large Language Models |
The growing complexity of modern manufacturing has created strong demand for intelligent virtual assistants that support operators throughout the maintenance lifecycle. Large language models (LLMs) show promise for general reasoning but fall short in specialized industrial settings that require system-specific knowledge. Retrieval-Augmented Generation (RAG) helps bridge that gap, yet standard RAG struggles with the multimodal content, such as images, diagrams, and sensor data, that pervades technical manufacturing documentation.
This thesis introduces the Multimodal Agentic RAG-based (MMAR) framework, built for intelligent maintenance assistance in manufacturing. MMAR is organized around three core ideas: Modular Subsystem Design, Agentic Orchestration for Adaptive Retrieval, and Validation-Driven Refinement. Validation used the xPPU (extended Pick-and-Place Unit) system as a testbed, with domain experts from TU Munich providing ground-truth annotations. MMAR was benchmarked against a text-only V-RAG baseline and GraphRAG across three query-complexity levels, outperforming both on Response Relevancy, F1 Score, and Recall. These results support MMAR as a deployment-ready platform for next-generation industrial assistants.
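The abstract names ColPali-style visual embedding and agentic orchestration among its keywords but does not spell out the retrieval mechanics. The sketch below is a minimal, self-contained illustration of those two ingredients under stated assumptions: a late-interaction (MaxSim) relevance score over page-patch embeddings, as used by ColPali-style visual retrievers, and a toy keyword-based router standing in for the agentic orchestration layer. The function names, the routing rule, and the random embeddings are illustrative assumptions, not the MMAR implementation.

```python
# Hedged sketch: ColPali-style late-interaction ("MaxSim") scoring over page images,
# plus a toy query router. Embeddings here are random stand-ins; a real pipeline
# would obtain them from a vision-language encoder applied to scanned manual pages.
import numpy as np


def maxsim_score(query_tokens: np.ndarray, page_patches: np.ndarray) -> float:
    """Late-interaction relevance: for each query-token embedding, take its best
    match among the page's patch embeddings, then sum over query tokens."""
    # query_tokens: (n_q, d), page_patches: (n_p, d), both rows L2-normalized.
    sims = query_tokens @ page_patches.T          # (n_q, n_p) cosine similarities
    return float(sims.max(axis=1).sum())          # MaxSim aggregation


def route_query(query: str) -> str:
    """Toy 'agentic' routing rule (hypothetical): send diagram/figure questions to
    the visual index, everything else to the text index."""
    visual_cues = ("diagram", "figure", "schematic", "wiring", "photo")
    return "visual" if any(cue in query.lower() for cue in visual_cues) else "text"


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d = 128
    # Fake embeddings: one 6-token query and two manual pages with 32 patches each.
    q = rng.normal(size=(6, d))
    q /= np.linalg.norm(q, axis=1, keepdims=True)
    pages = []
    for _ in range(2):
        p = rng.normal(size=(32, d))
        p /= np.linalg.norm(p, axis=1, keepdims=True)
        pages.append(p)

    query = "Show the wiring diagram for the xPPU stack crane sensor"
    print("route:", route_query(query))
    scores = [maxsim_score(q, p) for p in pages]
    print("best page:", int(np.argmax(scores)), "scores:", [round(s, 2) for s in scores])
```

In a full system the router would be an LLM-driven planner rather than a keyword match, and the retrieved pages would be passed, together with any text or knowledge-graph evidence, to the generator for answer synthesis.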