
Graduate Student: Yang, Yung-Jie (楊詠傑)
Thesis Title: Using Mask R-CNN and MiDaS for Weld Bead Identification and Height Estimation (利用Mask R-CNN及MiDaS進行焊道辨識與高度估測)
Advisor: Chung, Chun-Hui (鍾俊輝)
Degree: Master
Department: Department of Mechanical Engineering, College of Engineering
Year of Publication: 2024
Academic Year of Graduation: 112 (ROC calendar, 2023–2024)
Language: Chinese
Number of Pages: 102
Keywords (Chinese): Machine Learning, Machine Vision, Monocular Depth Estimation, Mask R-CNN, Weld Bead Measurement
Keywords (English): Monocular Vision, Mask R-CNN, Weld Beads Measurement, Machine Vision, Machine Learning
    Robotic weld bead grinding has long been regarded as one of the most complex machining processes. It involves machining path planning, machining accuracy control, and management of ground workpiece quality, all of which are subject to numerous variables and challenges. Path planning is especially difficult: differences in welding processes and in welders' technique often cause the actual weld bead geometry to deviate from the intended design. To reduce the influence of these variables on path planning, the weld bead is usually inspected before grinding with depth-capable equipment such as laser scanners or depth cameras, in order to obtain the coordinate points required for the actual grinding operation. Technologies such as 3D cameras and laser scanners are already mature in industry, but there is still room for improvement. A single device is expensive, so deploying such equipment across multiple grinding lines raises cost concerns, and weld bead height measurement must balance measurement accuracy against equipment cost. From a goal-oriented standpoint, the subtractive process of weld bead grinding does not require extremely precise measurements to meet industrial needs. This study therefore aims to establish a monocular weld bead measurement system, a new measurement approach that offers sufficient accuracy at lower cost. Using a single RGB image, the system integrates Mask R-CNN with a monocular depth estimation model to obtain a depth value for every pixel inside the proposed bounding box, and then reconstructs the weld bead from those values. Point cloud computation and filtering are finally applied to the reconstructed bead to estimate its height. The results show that Mask R-CNN outperforms previous weld bead detection studies in pixel-level segmentation, reaching a mean Average Precision (mAP) of 1.0. The monocular measurement experiments also confirm that fine-tuning a pre-trained model on a task-specific dataset effectively improves monocular measurement results for that data. In the validation of the overall system, the average weld bead height error over ten different weld beads was 0.5 mm.
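
    The pipeline described above can be sketched in a few lines of Python: Mask R-CNN provides a per-pixel weld bead mask, and the MiDaS DPT BEiT-Large 512 model provides a relative depth value for every pixel inside that mask. The sketch below is illustrative only; the checkpoint path, class count, and variable names are assumptions rather than the thesis code, and it assumes the torchvision Mask R-CNN constructor and the torch.hub entry points published with MiDaS v3.1 (model name "DPT_BEiT_L_512" and the matching beit512_transform).

        # Minimal sketch: weld bead mask from Mask R-CNN + relative depth from MiDaS.
        # "weld_maskrcnn.pth" is a hypothetical fine-tuned checkpoint (background + weld bead).
        import cv2
        import numpy as np
        import torch
        import torchvision

        device = "cuda" if torch.cuda.is_available() else "cpu"

        # 1) Instance segmentation: locate the weld bead and obtain its pixel mask.
        maskrcnn = torchvision.models.detection.maskrcnn_resnet50_fpn(num_classes=2)
        maskrcnn.load_state_dict(torch.load("weld_maskrcnn.pth", map_location=device))
        maskrcnn.eval().to(device)

        # 2) Monocular depth estimation: DPT BEiT-Large 512 from the MiDaS model zoo.
        midas = torch.hub.load("intel-isl/MiDaS", "DPT_BEiT_L_512").eval().to(device)
        midas_transforms = torch.hub.load("intel-isl/MiDaS", "transforms")

        image = cv2.cvtColor(cv2.imread("weld_bead.jpg"), cv2.COLOR_BGR2RGB)

        with torch.no_grad():
            # Mask R-CNN expects a list of CHW float tensors scaled to [0, 1].
            tensor = torch.from_numpy(image / 255.0).permute(2, 0, 1).float().to(device)
            detection = maskrcnn([tensor])[0]
            bead_mask = (detection["masks"][0, 0] > 0.5).cpu().numpy()  # highest-scoring instance

            # MiDaS predicts relative (scale- and shift-ambiguous) inverse depth.
            batch = midas_transforms.beit512_transform(image).to(device)
            prediction = midas(batch)
            depth = torch.nn.functional.interpolate(
                prediction.unsqueeze(1), size=image.shape[:2],
                mode="bicubic", align_corners=False,
            ).squeeze().cpu().numpy()

        # Relative depth for every pixel inside the detected weld bead mask.
        bead_depth = np.where(bead_mask, depth, np.nan)

    Because MiDaS outputs relative depth, the masked values still need to be aligned to a metric scale (the thesis fine-tunes the model on its own depth data and reconstructs reference beads with a depth camera) before the bead can be rebuilt as a point cloud.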

    Robotic grinding of weld beads is a critical and intricate finishing process in industry. Precision in grinding position is essential for high-quality outcomes, making weld bead detection and measurement crucial in automated systems. Traditional 3D measurement methods are costly and time-consuming, whereas the human eye can easily identify weld bead geometry for grinding. This study proposes an innovative approach that uses monocular vision with neural network algorithms to estimate weld bead height. We combine monocular depth estimation with a Mask Region-based Convolutional Neural Network (Mask R-CNN) for accurate detection and measurement of weld beads. Unlike traditional methods that rely on complex sensor systems or expensive equipment, our method uses a single-camera setup, simplifying both the hardware requirements and the operating procedure. The monocular depth estimation technique provides a cost-effective means of capturing depth information from a single viewpoint. Integrated with Mask R-CNN, our system excels at precise localization and classification of weld beads, allowing the robotic arm to adjust its grinding paths according to the exact dimensions and shapes of the beads. This study presents a novel method for enhancing the robotic arm's grinding process, marking a significant advancement in robotic arm grinding.
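
    The grinding-path adjustment above depends on turning the masked depth estimates into 3-D points and extracting a bead height. The following Open3D sketch illustrates one way to do this under stated assumptions: the pinhole intrinsics are placeholders, depth_metric is assumed to be the monocular depth output already rescaled to metres, and bead_mask / plate_mask are hypothetical masks for the bead and the surrounding base plate. It is a sketch of the point-cloud computation and filtering step, not the thesis implementation.

        # Minimal sketch: back-project masked depth to a point cloud, filter it, and
        # estimate bead height as the peak distance above a RANSAC-fitted base plane.
        import numpy as np
        import open3d as o3d

        fx, fy, cx, cy = 600.0, 600.0, 320.0, 240.0   # placeholder pinhole intrinsics

        def backproject(depth_m, mask):
            """Back-project masked depth (metres) to Nx3 points in the camera frame."""
            v, u = np.nonzero(mask)
            z = depth_m[v, u]
            x = (u - cx) * z / fx
            y = (v - cy) * z / fy
            return np.column_stack([x, y, z])

        bead_pts = backproject(depth_metric, bead_mask)    # depth_metric and masks: assumed inputs
        plate_pts = backproject(depth_metric, plate_mask)

        bead = o3d.geometry.PointCloud(o3d.utility.Vector3dVector(bead_pts))
        plate = o3d.geometry.PointCloud(o3d.utility.Vector3dVector(plate_pts))

        # Statistical outlier removal suppresses depth-estimation noise on the bead.
        bead, _ = bead.remove_statistical_outlier(nb_neighbors=30, std_ratio=2.0)

        # Fit the base-plate plane ax + by + cz + d = 0 with RANSAC.
        plane_model, _ = plate.segment_plane(distance_threshold=0.0005,
                                             ransac_n=3, num_iterations=1000)
        normal, d = np.asarray(plane_model[:3]), plane_model[3]

        # Bead height = largest point-to-plane distance of the filtered bead points.
        height = np.max(np.abs(np.asarray(bead.points) @ normal + d)
                        / np.linalg.norm(normal))
        print(f"estimated bead height: {height * 1000:.2f} mm")

    Taking the peak distance above a fitted base plane is one reasonable reading of the height definition; the thesis's own verification step may define the reference surface differently.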

    Abstract (Chinese), Summary, Acknowledgements, Table of Contents, List of Tables, List of Figures
    Chapter 1  Introduction
        1.1  Research Background
        1.2  Literature Review
            1.2.1  Artificial Intelligence and Computer Vision Applied to Robotic Weld Bead Welding and Grinding
            1.2.2  Instance Segmentation Neural Networks
            1.2.3  Monocular Depth Estimation
        1.3  Research Objectives
        1.4  Thesis Organization
    Chapter 2  Monocular Depth Estimation and Mask R-CNN
        2.1  Deep Learning Theory
            2.1.1  Convolutional Neural Networks
            2.1.2  ResNet (Residual Networks)
            2.1.3  Transformer
            2.1.4  Self-attention Mechanism
            2.1.5  Vision Transformer
            2.1.6  Monocular Depth Estimation
        2.2  Mask R-CNN
            2.2.1  Feature Pyramid Network
            2.2.2  Region Proposal Network
            2.2.3  RoI Align
            2.2.4  Classification Layer and Bounding Box Regression Layer
            2.2.5  Mask Generator Layer
            2.2.6  Multi-task Loss Function
            2.2.7  Binary Confusion Matrix
            2.2.8  Segmentation Metrics
        2.3  DPT BEiT Large 512 Monocular Depth Estimation Model
            2.3.1  BEiT (BERT Pre-training of Image Transformers) Encoder
            2.3.2  Reassemble and Fusion Modules
            2.3.3  Masked Image Modeling
            2.3.4  Scale- and Shift-Invariant Loss with Gradient Matching Term
    Chapter 3  Experiment and Model Design
        3.1  Experimental Equipment and Data Collection
        3.2  Data Preprocessing
            3.2.1  Mask R-CNN Weld Bead Image Annotation
            3.2.2  Depth Data Processing
            3.2.3  Data Augmentation
        3.3  Training the Mask R-CNN Model
        3.4  Training the MiDaS Monocular Depth Estimation Model
        3.5  Weld Bead Height Estimation System
            3.5.1  Reconstruction of Actual Weld Beads with the REVOPOINT 3D Acusense Camera
            3.5.2  Weld Bead Reconstruction Based on the Monocular Depth Estimation Model
            3.5.3  Point Cloud Computation and Filtering
            3.5.4  Weld Bead Height Verification
    Chapter 4  Results and Analysis
        4.1  Mask R-CNN Model Results
        4.2  Weld Bead Heights Predicted with the MiDaS Model
        4.3  Analysis and Discussion of Experimental Results
            4.3.1  Instance Segmentation Results
            4.3.2  Monocular Depth Estimation Model and Weld Bead Height Measurement Results
    Chapter 5  Conclusions and Future Work
        5.1  Conclusions
        5.2  Future Work
    References


    Full-text availability: on campus from 2029-08-29; off campus from 2029-08-29.
    The electronic thesis has not yet been authorized for public release; for the print copy, please consult the library catalog.