
Author: Huang, Tsai-Chieh (黃蔡傑)
Title: Design and Development of an Integrated Computer Vision Method for Human Operation Anomaly Detection
Advisor: Chen, Yuh-Min (陳裕民)
Degree: Master
Department: Department of Computer Science and Information Engineering, College of Electrical Engineering and Computer Science
Year of Publication: 2025
Academic Year of Graduation: 113
Language: Chinese
Pages: 86
Keywords: anomaly detection, facial recognition, hand gesture recognition, product defect recognition, transfer learning

    In traditional small and medium-sized enterprises (SMEs), the evaluation of manual-labor performance is often hindered by subjective record-keeping, delayed information, and difficulty in quantifying each worker's performance, which limits the precision of anomaly identification and management decisions. To overcome these limitations, this study presents an integrated computer-vision anomaly detection method that combines face recognition, hand-motion analysis, and product defect detection into an intelligent monitoring approach for on-site manual operations in manufacturing. By automatically converting visual information into multi-dimensional structured data, the method supports real-time monitoring and data-driven performance analysis.
    The framework consists of three core modules: (1) Face Recognition for personnel identification and work-hour tracking, which employs a multi-template feature strategy and adaptive thresholds to improve robustness against occlusion and pose variations; (2) Hand Gesture Recognition, which combines a Temporal Convolutional Network (TCN), a Squeeze-and-Excitation (SE) block, and a multi-head self-attention mechanism to better capture long-term temporal dependencies and subtle operational anomalies; and (3) Defect Detection, which applies transfer learning with a Vision Transformer (ViT) and Focal Loss to address data scarcity and class imbalance, improving performance in few-shot scenarios.
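    As a rough illustration of how the components of the second module fit together, the sketch below stacks dilated temporal convolutions, a squeeze-and-excitation block, and multi-head self-attention over a sequence of hand-keypoint vectors (PyTorch). The GestureNet name, the 63-dimensional input (21 landmarks x 3 coordinates), the layer widths, and the 14-class output are illustrative assumptions, not the thesis's actual configuration.

# gesture_sketch.py -- minimal sketch of a TCN + SE + self-attention classifier
import torch
import torch.nn as nn


class SEBlock(nn.Module):
    """Squeeze-and-excitation: re-weight the channels of a (B, C, T) sequence."""
    def __init__(self, channels: int, reduction: int = 8):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x):                       # x: (B, C, T)
        weights = self.fc(x.mean(dim=2))        # squeeze over the time axis -> (B, C)
        return x * weights.unsqueeze(-1)        # excite: scale each channel


class GestureNet(nn.Module):
    """Dilated temporal convolutions -> SE re-weighting -> self-attention -> logits."""
    def __init__(self, in_dim: int = 63, hidden: int = 128, num_classes: int = 14):
        super().__init__()
        self.tcn = nn.Sequential(               # two dilated conv blocks (TCN-style)
            nn.Conv1d(in_dim, hidden, kernel_size=3, padding=1, dilation=1),
            nn.ReLU(inplace=True),
            nn.Conv1d(hidden, hidden, kernel_size=3, padding=2, dilation=2),
            nn.ReLU(inplace=True),
        )
        self.se = SEBlock(hidden)
        self.attn = nn.MultiheadAttention(hidden, num_heads=4, batch_first=True)
        self.head = nn.Linear(hidden, num_classes)

    def forward(self, x):                       # x: (B, T, in_dim) keypoint sequence
        h = self.tcn(x.transpose(1, 2))         # (B, hidden, T)
        h = self.se(h).transpose(1, 2)          # (B, T, hidden)
        h, _ = self.attn(h, h, h)               # model long-range temporal dependencies
        return self.head(h.mean(dim=1))         # average over time -> class logits


if __name__ == "__main__":
    clips = torch.randn(2, 60, 63)              # 2 clips, 60 frames, 63-dim keypoints
    print(GestureNet()(clips).shape)            # torch.Size([2, 14])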
    Validation was conducted using public datasets and simulated environments, with analyses performed on the impact of real-world challenges—such as occlusion, angular deviations, and background interference—on the stability and accuracy of each module. The system's final output is a set of structured performance metrics, including work hours, motion correctness rates, output volume, and product yield, enabling more objective and timely operational management.
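    For concreteness, one row of that structured output could be represented as a small record like the sketch below; the ShiftRecord name and its fields are a hypothetical schema illustrating the metrics listed above, not the thesis's actual data format.

# record_sketch.py -- hypothetical per-operator, per-shift performance record
from dataclasses import dataclass, asdict


@dataclass
class ShiftRecord:
    operator_id: str           # identity resolved by the face-recognition module
    work_hours: float          # presence time accumulated over the shift
    motion_correctness: float  # fraction of gesture sequences judged normal (0-1)
    output_count: int          # units produced, counted from the video stream
    yield_rate: float          # non-defective units / output_count


if __name__ == "__main__":
    record = ShiftRecord("OP-007", 7.5, 0.96, 420, 0.98)
    print(asdict(record))       # ready to serialize as JSON or store in a database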
    This research confirms the feasibility and practical potential of the proposed architecture, offering SMEs a low-cost, high-performance pathway toward intelligent transformation. The system holds significant practical and research value by enhancing operational efficiency and the capacity for rapid anomaly response.

    Abstract
    Acknowledgements
    List of Tables
    List of Figures
    Chapter 1: Introduction
      1.1 Research Background
      1.2 Research Motivation
      1.3 Research Objectives
      1.4 Research Questions
      1.5 Research Procedure
    Chapter 2: Literature Review
      2.1 Domain Literature
        2.1.1 Industrial Artificial Intelligence
        2.1.2 Computer Vision
        2.1.3 Anomaly Detection
      2.2 Related Techniques
        2.2.1 Feature Engineering
        2.2.2 Visual Occlusion Handling
        2.2.3 Transfer Learning
      2.3 Related Research
        2.3.1 Face Recognition
        2.3.2 Continuous Hand Gesture Recognition
        2.3.3 Product Defect Detection
      2.4 Summary of the Literature Review
    Chapter 3: Method Design and Technical Development
      3.1 Structured Data Architecture Design
      3.2 Human Operation Anomaly Detection Method Design
      3.3 Face Recognition Method Design
      3.4 Hand Motion Anomaly Recognition
        3.4.1 Data Collection
        3.4.2 Hand Motion Recognition Model Architecture
        3.4.3 Hand Motion Anomaly Recognition Method
      3.5 Product Anomaly Recognition
        3.5.1 Data Collection
        3.5.2 Product Defect Recognition Model Architecture
        3.5.3 Product Anomaly Recognition Method
    Chapter 4: Technical Correctness Validation and Application Example
      4.1 Face Recognition Method Evaluation
        4.1.1 Evaluation Metrics
        4.1.2 Data Sources
        4.1.3 Experimental Procedure
        4.1.4 Experimental Results
      4.2 Hand Motion Anomaly Recognition Model Evaluation
        4.2.1 Evaluation Metrics
        4.2.2 Experimental Environment
        4.2.3 Experimental Procedure
        4.2.4 Experimental Results
      4.3 Product Defect Recognition Model Evaluation
        4.3.1 Evaluation Metrics
        4.3.2 Experimental Environment
        4.3.3 Experimental Procedure
        4.3.4 Experimental Results
      4.4 Application Example
        4.4.1 Data Description
        4.4.2 Example Results and Analysis
    Chapter 5: Conclusions, Research Limitations, and Future Outlook
      5.1 Conclusions
      5.2 Research Limitations
      5.3 Future Outlook
    References

