
Graduate Student: Yang, Ming-Han (楊明翰)
Thesis Title: Development of Models and Enabling Technologies for Generative AI–Based Work Footprint Monitoring (生成式人工智慧為基之工作足跡監控模式與技術開發)
Advisor: Chen, Yuh-Min (陳裕民)
Degree: Master
Department: Institute of Manufacturing Information and Systems, College of Electrical Engineering and Computer Science
Year of Publication: 2026
Graduation Academic Year: 114 (ROC calendar)
Language: Chinese
Number of Pages: 111
Keywords (Chinese): 工作足跡監控、代理人、視覺語言模型、檢索增強生成、知識圖譜
Keywords (English): Work Footprint Monitoring, Agent, Vision Language Model, Retrieval-Augmented Generation, Knowledge Graph
Abstract:
    As enterprise digital transformation advances, operational management has been shifting from an experience-driven to a data-driven model. However, most current management practices still rely on traditional working-hour indicators, which cannot adequately reflect employees' actual work behavior or expose potential efficiency bottlenecks. Moreover, when faced with large volumes of complex work footprint data, managers often lack real-time, structured analytical support, so decisions depend heavily on personal experience, which increases the management burden and the risk of inconsistent decisions.
    This study constructs a work footprint monitoring model and technical architecture centered on Generative AI (GAI), focusing on the Check phase of the PDCA management cycle. Behavioral data collected from a Manufacturing Execution System (MES) are used to define and detect several categories of work anomalies, including late arrival, early departure, abnormal break frequency, and excessively long breaks.
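    To make these anomaly categories concrete, the sketch below shows one way such rule-based checks over MES clock and break events might look in Python. The record schema (worker_id, clock_in, clock_out, breaks) and the threshold values are illustrative assumptions, not the thesis's actual definitions.

    from dataclasses import dataclass, field
    from datetime import datetime, time, timedelta

    @dataclass
    class ShiftRecord:
        """One worker-day of MES behavioral data (hypothetical schema)."""
        worker_id: str
        clock_in: datetime
        clock_out: datetime
        breaks: list[tuple[datetime, datetime]] = field(default_factory=list)

    # Illustrative policy values; the thesis does not state its thresholds here.
    SHIFT_START = time(8, 0)
    SHIFT_END = time(17, 0)
    MAX_BREAKS = 3
    MAX_BREAK_MINUTES = 20

    def detect_anomalies(rec: ShiftRecord) -> list[str]:
        """Return the work-footprint anomaly labels triggered by one record."""
        anomalies = []
        if rec.clock_in.time() > SHIFT_START:
            anomalies.append("late_arrival")
        if rec.clock_out.time() < SHIFT_END:
            anomalies.append("early_departure")
        if len(rec.breaks) > MAX_BREAKS:
            anomalies.append("abnormal_break_count")
        if any(end - start > timedelta(minutes=MAX_BREAK_MINUTES)
               for start, end in rec.breaks):
            anomalies.append("excessive_break_duration")
        return anomalies

    # Example: a 08:12 clock-in with one 35-minute break triggers two anomalies.
    rec = ShiftRecord("W001",
                      clock_in=datetime(2025, 3, 3, 8, 12),
                      clock_out=datetime(2025, 3, 3, 17, 5),
                      breaks=[(datetime(2025, 3, 3, 10, 0),
                               datetime(2025, 3, 3, 10, 35))])
    print(detect_anomalies(rec))  # ['late_arrival', 'excessive_break_duration']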
    The implementation adopts a centralized multi-agent architecture with four agent types, a routing agent, a tutoring agent, an assistant agent, and a decision-support agent, which handle task dispatching, operational guidance, information summarization, and decision assistance, respectively. On the technical side, the framework integrates Retrieval-Augmented Generation (RAG) with a knowledge graph to convert implicit managerial experience into structured knowledge chains that support anomaly-cause inference and the generation of coaching recommendations. A Vision Language Model (VLM), Qwen2.5-VL, is further fine-tuned to help interpret key trend-chart features such as spikes and sharp turns, and a Chain-of-Thought (CoT) mechanism is incorporated to improve the transparency and explainability of the analysis process.
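    As a rough illustration of the centralized routing described above, the sketch below shows a router handing a manager's query to one of three specialized agents. The keyword triggers and canned responses are placeholders; in the thesis the routing decision and the agents' answers are produced by an LLM with RAG over the knowledge graphs rather than by string matching.

    from typing import Callable, Dict

    def tutoring_agent(query: str) -> str:
        # Would walk the platform knowledge graph to return step-by-step guidance.
        return f"[tutoring] operation guide for: {query}"

    def assistant_agent(query: str) -> str:
        # Would summarize the relevant work-footprint records for the manager.
        return f"[assistant] summary for: {query}"

    def decision_support_agent(query: str) -> str:
        # Would retrieve coaching knowledge (RAG over the coaching knowledge graph)
        # and draft a recommendation with chain-of-thought reasoning.
        return f"[decision-support] coaching recommendation for: {query}"

    # Trigger phrase -> agent; a stand-in for the LLM-based routing decision.
    AGENTS: Dict[str, Callable[[str], str]] = {
        "how do i": tutoring_agent,
        "summarize": assistant_agent,
        "why": decision_support_agent,
    }

    def routing_agent(query: str) -> str:
        """Central router: dispatch to the first agent whose trigger matches."""
        lowered = query.lower()
        for trigger, agent in AGENTS.items():
            if trigger in lowered:
                return agent(query)
        return assistant_agent(query)  # default hand-off

    print(routing_agent("Why did operator W001 take longer breaks this week?"))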
    By implementing the work footprint monitoring platform and its intelligent agents, the study demonstrates anomaly trend analysis, work footprint performance evaluation, and the computation of coaching effectiveness indicators, helping managers carry out systematic coaching.
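    The coaching effectiveness indicators themselves are defined in Section 3.2.4 of the thesis and are not reproduced here; as one plausible reading, such an indicator could compare a worker's anomaly rate before and after a coaching intervention, for example:

    def coaching_effectiveness(anomalies_before: int, shifts_before: int,
                               anomalies_after: int, shifts_after: int) -> float:
        """Relative reduction in anomaly rate after coaching (1.0 = fully resolved).

        Illustrative only; not the thesis's actual metric.
        """
        rate_before = anomalies_before / shifts_before
        rate_after = anomalies_after / shifts_after
        if rate_before == 0:
            return 0.0
        return (rate_before - rate_after) / rate_before

    # Example: 6 anomalies in 20 shifts before coaching, 2 in 20 after -> 0.67
    print(round(coaching_effectiveness(6, 20, 2, 20), 2))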

English Abstract:
    As enterprises undergo digital transformation, management practices are shifting from experience-based approaches to data-driven decision making. However, traditional working-hour indicators remain insufficient to reflect actual employee behaviors and efficiency issues, and managers often lack structured tools to analyze complex work footprint data.
    This study proposes a Generative AI–based Work Footprint Monitoring framework focusing on the Check phase of the PDCA cycle. Behavioral data from Manufacturing Execution Systems are analyzed to detect various work anomalies. A centralized multi-agent architecture is adopted to support task coordination, information summarization, and decision assistance. The framework integrates Retrieval-Augmented Generation with a Knowledge Graph to transform implicit managerial experience into structured knowledge for anomaly reasoning and coaching recommendations. In addition, a Vision Language Model is applied to assist in trend chart interpretation, enhancing analytical transparency and interpretability. The implemented system demonstrates effective anomaly analysis and performance evaluation, supporting systematic and data-driven managerial coaching.

Table of Contents:
    Abstract
    Acknowledgements
    Table of Contents
    List of Tables
    List of Figures
    Chapter 1 Introduction
      1.1 Research Background
      1.2 Research Motivation
      1.3 Research Objectives
      1.4 Research Topics and Methods
      1.5 Research Procedure
    Chapter 2 Literature Review
      2.1 Domain Literature
        2.1.1 Research and Applications of Work Footprint Monitoring
        2.1.2 Anomaly Detection and Analysis
      2.2 Generative AI and Large Language Model Technologies
        2.2.1 Large Language Models
        2.2.2 Vision Language Models
        2.2.3 In-Context Learning
        2.2.4 Prompt Engineering
        2.2.5 Model Fine-Tuning Techniques
      2.3 Retrieval-Augmented Generation
        2.3.1 Retrieval-Augmented Generation Architecture
        2.3.2 Knowledge Graphs
      2.4 AI Agents
        2.4.1 Definition of AI Agents
        2.4.2 Multi-Agent Systems
        2.4.3 Agent Application Examples
      2.5 Summary of the Literature Review
    Chapter 3 Design of the Work Footprint Monitoring Model
      3.1 Personnel Footprint Monitoring Model
      3.2 Development of the Work Footprint Anomaly Detection and Analysis Mechanism
        3.2.1 Work Footprint Collection and Anomaly Detection Mechanism
        3.2.2 Work Footprint Anomaly Analysis and Evaluation
        3.2.3 Footprint Performance Analysis and Evaluation
        3.2.4 Coaching Effectiveness Analysis
      3.3 System Architecture of the Work Footprint Monitoring Platform
    Chapter 4 Agent Design and Technology Development
      4.1 Agent Design
        4.1.1 Agent Architecture
        4.1.2 Routing Agent
        4.1.3 Tutoring Agent
        4.1.4 Assistant Agent
        4.1.5 Decision-Support Agent
      4.2 Construction of the Vector Knowledge Retrieval Base
        4.2.1 Knowledge Data Processing and Construction Workflow
        4.2.2 Vector-Based Knowledge Retrieval Strategy
      4.3 Knowledge Graph Construction
        4.3.1 Coaching Knowledge Graph Construction
        4.3.2 Platform Knowledge Graph Construction
        4.3.3 Chart Knowledge Graph Construction
      4.4 Trend-Chart Vision Language Model Implementation
        4.4.1 Vision Language Model Selection
        4.4.2 Prompt Engineering
        4.4.3 Training Data Generation Workflow
        4.4.4 Model Fine-Tuning
    Chapter 5 Application Cases
      5.1 Analysis Function Applications
        5.1.1 Case Demonstration: Work Anomaly Analysis and Evaluation
        5.1.2 Case Demonstration: Footprint Performance Analysis and Evaluation
        5.1.3 Case Demonstration: Coaching Effectiveness Analysis
      5.2 Agent Applications
        5.2.1 Case Demonstration: Platform Operation Guidance
        5.2.2 Case Demonstration: Decision Support Assistance
        5.2.3 Case Demonstration: Chart-Assisted Interpretation
    Chapter 6 Conclusions, Research Limitations, and Future Work
      6.1 Conclusions
      6.2 Research Limitations
      6.3 Future Work
    References

