| Graduate Student: | 楊明翰 Yang, Ming-Han |
|---|---|
| Thesis Title: | 生成式人工智慧為基之工作足跡監控模式與技術開發 (Development of Models and Enabling Technologies for Generative AI–Based Work Footprint Monitoring) |
| Advisor: | 陳裕民 Chen, Yuh-Min |
| Degree: | Master |
| Department: | Institute of Manufacturing Information and Systems, College of Electrical Engineering and Computer Science |
| Year of Publication: | 2026 |
| Graduation Academic Year: | 114 (ROC calendar) |
| Language: | Chinese |
| Pages: | 111 |
| Keywords (Chinese): | 工作足跡監控、代理人、視覺語言模型、檢索增強生成、知識圖譜 |
| Keywords (English): | Work Footprint Monitoring, Agent, Vision Language Model, Retrieval-Augmented Generation, Knowledge Graph |
As enterprises advance their digital transformation, operational management is gradually shifting from experience-driven to data-driven practice. However, current management systems still rely largely on traditional working-hour indicators, which fail to reflect employees' actual work behaviors and latent efficiency bottlenecks. Moreover, when confronted with large volumes of complex work footprint data, supervisors often lack real-time, structured analysis tools, leaving management decisions heavily dependent on personal experience; this not only increases the management burden but also raises the risk of inconsistent decisions.

This study constructs a work footprint monitoring model and technical architecture centered on generative artificial intelligence (Generative AI, GAI), focusing on the Check phase of the PDCA management cycle. By collecting behavioral data from a Manufacturing Execution System (MES), it defines and detects multiple types of work anomalies, including late arrival, early departure, abnormal break frequency, and overlong breaks.

The study implements a centralized multi-agent architecture comprising four agent types (routing, teaching, assistant, and decision support), responsible respectively for task dispatch, operation guidance, information summarization, and decision assistance. On the technical side, it integrates retrieval-augmented generation (RAG) with a knowledge graph to transform implicit managerial experience into structured knowledge chains that support anomaly-cause inference and the generation of coaching recommendations. It further fine-tunes the vision language model (VLM) Qwen2.5-VL to assist in identifying key trend chart features such as spikes and sharp turns, and incorporates a chain-of-thought (CoT) mechanism to improve the transparency and explainability of the analysis process.

By implementing a work footprint monitoring platform and intelligent agents, the study demonstrates anomaly trend analysis, work footprint performance ranking, and the computation of coaching effectiveness indicators, helping supervisors carry out systematic coaching.
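The four anomaly types above can be sketched as simple rules over daily MES event timestamps. The thresholds, function names, and data shape below are illustrative assumptions, not the thesis's actual detection logic:

```python
from datetime import datetime, time

# Hypothetical shift schedule and thresholds (assumed, not from the thesis).
SHIFT_START = time(8, 30)
SHIFT_END = time(17, 30)
MAX_BREAKS = 3            # abnormal break frequency above this count
MAX_BREAK_MINUTES = 20    # overlong break above this duration

def detect_anomalies(clock_in, clock_out, breaks):
    """Flag the four anomaly types for one day of work footprint data.

    clock_in / clock_out: datetimes of the first and last MES events
    breaks: list of (start, end) datetime pairs for idle intervals
    """
    anomalies = []
    if clock_in.time() > SHIFT_START:
        anomalies.append("late_arrival")
    if clock_out.time() < SHIFT_END:
        anomalies.append("early_departure")
    if len(breaks) > MAX_BREAKS:
        anomalies.append("excessive_break_count")
    if any((end - start).total_seconds() / 60 > MAX_BREAK_MINUTES
           for start, end in breaks):
        anomalies.append("overlong_break")
    return anomalies
```

In practice such rules would run over MES event logs per employee per day, with the flagged days feeding the trend charts analyzed downstream.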
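The centralized routing pattern described above can be sketched as a dispatcher that forwards a supervisor's query to one specialist agent. The keyword rules and agent stubs below are toy assumptions; the thesis's agents are LLM-based, not hard-coded:

```python
# Stub specialists standing in for the LLM-backed agents (assumed names).
def teaching_agent(query):
    return f"[teaching] step-by-step guidance for: {query}"

def assistant_agent(query):
    return f"[assistant] summary of: {query}"

def decision_agent(query):
    return f"[decision] coaching recommendation for: {query}"

# Toy intent rules; a real router would classify intent with an LLM.
ROUTES = {
    "how": teaching_agent,         # operation guidance
    "summarize": assistant_agent,  # information summarization
    "why": decision_agent,         # anomaly reasoning / coaching support
}

def router_agent(query):
    """Dispatch a query to the first matching specialist agent."""
    for keyword, agent in ROUTES.items():
        if keyword in query.lower():
            return agent(query)
    return assistant_agent(query)  # default: summarize
```

The centralized design keeps one entry point for all queries, so task dispatch, logging, and fallback behavior live in a single place rather than in every agent.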
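One plausible form for a coaching effectiveness indicator is the relative reduction in anomaly counts before and after coaching. This formula is a hypothetical example; the thesis defines its own indicators, which are not reproduced here:

```python
def coaching_effectiveness(anomalies_before, anomalies_after):
    """Relative reduction in anomaly counts after a coaching intervention.

    Returns a value in [0, 1] when anomalies decrease; negative values
    indicate the anomaly count grew. Hypothetical metric, for illustration.
    """
    if anomalies_before == 0:
        return 0.0  # nothing to improve; treat as neutral
    return (anomalies_before - anomalies_after) / anomalies_before
```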
As enterprises undergo digital transformation, management practices are shifting from experience-based approaches to data-driven decision making. However, traditional working-hour indicators remain insufficient to reflect actual employee behaviors and efficiency issues, and managers often lack structured tools to analyze complex work footprint data.
This study proposes a Generative AI–based Work Footprint Monitoring framework focusing on the Check phase of the PDCA cycle. Behavioral data from Manufacturing Execution Systems are analyzed to detect various work anomalies. A centralized Agent architecture is adopted to support task coordination, information summarization, and decision assistance. The framework integrates Retrieval-Augmented Generation with a Knowledge Graph to transform implicit managerial experience into structured knowledge for anomaly reasoning and coaching recommendations. In addition, a Vision Language Model is applied to assist in trend chart interpretation, enhancing analytical transparency and interpretability. The implemented system demonstrates effective anomaly analysis and performance evaluation, supporting systematic and data-driven managerial coaching.