| Field | Value |
|---|---|
| Graduate Student | 楊詠傑 (Yang, Yung-Jie) |
| Thesis Title | 利用 Mask R-CNN 及 MiDaS 進行焊道辨識與高度估測 (Using Mask R-CNN and MiDaS for Weld Bead Identification and Height Estimation) |
| Advisor | 鍾俊輝 (Chung, Chun-Hui) |
| Degree | Master |
| Department | College of Engineering, Department of Mechanical Engineering |
| Year of Publication | 2024 |
| Academic Year of Graduation | 112 |
| Language | Chinese |
| Number of Pages | 102 |
| Chinese Keywords | Machine Learning, Machine Vision, Monocular Depth Estimation, Mask R-CNN, Weld Bead Measurement |
| English Keywords | Monocular Vision, Mask R-CNN, Weld Beads Measurement, Machine Vision, Machine Learning |
Robotic weld bead grinding has long been regarded as one of the most complex machining processes. It involves grinding path planning, machining accuracy control, and quality management of the ground workpiece, all of which are full of variables and challenges. Path planning is especially difficult: differences in welding procedures and in welders' skill often cause the actual bead geometry to deviate from the intended design. To reduce the influence of these variables on path planning, the weld bead is usually inspected before grinding with equipment that provides depth information, such as a laser scanner or depth camera, in order to obtain the coordinates required for the actual grinding path. Technologies such as 3D cameras and laser scanners are already mature in industry, but there is still room for improvement. A single device is expensive, so deploying such equipment across multiple grinding production lines raises cost concerns, and weld bead height measurement must weigh measurement accuracy against equipment cost. From a goal-oriented perspective, the subtractive process of weld bead grinding does not require extremely precise measurements to meet industrial needs. This study therefore develops a monocular weld bead measurement system, a new measurement approach intended to provide sufficient accuracy at a lower cost. From a single RGB image, the system integrates Mask R-CNN with a monocular depth estimation model to obtain a depth value for every pixel inside the candidate bounding box and reconstruct the weld bead. The reconstructed bead is then processed with point cloud computation and filtering to estimate the bead height. The results show that Mask R-CNN outperforms previous weld bead detection studies in pixel-level segmentation, reaching a mean Average Precision (mAP) of 1.0. The monocular measurement experiments also confirm that training a pre-trained model on a task-specific dataset effectively improves monocular measurement results for that data. In the validation of the overall system, the average weld bead height error over ten different weld beads was 0.5 mm.
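The two-stage idea described above, instance segmentation followed by monocular depth estimation on the same RGB image, can be illustrated with a short sketch. This is not the thesis implementation; it only assumes that MiDaS is loaded from its public torch.hub entry point (the `DPT_Large` variant is an arbitrary choice here) and that a Mask R-CNN model has already produced a boolean weld bead mask at the image resolution.

```python
# A minimal sketch, assuming MiDaS (DPT_Large) from torch.hub and a boolean
# weld bead mask already produced by an instance segmentation model such as
# Mask R-CNN. It returns the relative depth values of the bead pixels only.
import cv2
import numpy as np
import torch

def relative_depth_in_mask(image_bgr: np.ndarray, bead_mask: np.ndarray) -> np.ndarray:
    device = "cuda" if torch.cuda.is_available() else "cpu"

    # MiDaS models and their preprocessing transforms are published on torch.hub.
    midas = torch.hub.load("intel-isl/MiDaS", "DPT_Large").to(device).eval()
    transforms = torch.hub.load("intel-isl/MiDaS", "transforms")

    rgb = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2RGB)
    batch = transforms.dpt_transform(rgb).to(device)

    with torch.no_grad():
        prediction = midas(batch)                      # (1, H', W') relative depth
        depth = torch.nn.functional.interpolate(       # resize back to input resolution
            prediction.unsqueeze(1),
            size=rgb.shape[:2],
            mode="bicubic",
            align_corners=False,
        ).squeeze().cpu().numpy()

    # Keep only the depth values inside the weld bead mask.
    return depth[bead_mask.astype(bool)]
```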
The robotic grinding of weld beads is a critical and intricate finishing process in industry. Precision in the grinding position is essential for high-quality outcomes, making weld bead detection and measurement crucial in automated systems. Traditional 3D measurement methods are costly and time-consuming, whereas the human eye can easily identify weld bead geometry for grinding. This study proposes an innovative approach that uses monocular vision with a neural network algorithm to estimate weld bead height. We combined monocular depth estimation with a Mask Region-based Convolutional Neural Network (Mask R-CNN) for accurate detection and measurement of weld beads. Unlike traditional methods that rely on complex sensor systems or expensive equipment, our method uses a single-camera setup, simplifying both the hardware requirements and the operating procedure. The monocular depth estimation technique provides a cost-effective means of capturing depth information from a single viewpoint. Integrated with Mask R-CNN, the system precisely localizes and classifies weld beads, allowing the robotic arm to adjust its grinding paths according to the actual dimensions and shapes of the beads. This study presents a novel method for enhancing the robotic grinding process and marks a significant advancement in robotic arm grinding.
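The final step summarized in both abstracts, turning the masked depth values into a bead height estimate, could look roughly like the sketch below. It is a hedged illustration only: MiDaS outputs relative (inverse) depth, so in practice the map must first be rescaled or calibrated, and the baseline ring width, the 95th-percentile filter, and the `mm_per_depth_unit` scale factor are illustrative assumptions rather than values reported in the thesis.

```python
# A hedged sketch of the height-estimation step: compare bead pixels against a
# baseline taken from the surrounding plate, reject outliers with a percentile,
# and convert the difference to millimetres with a calibration factor. The ring
# width, percentile, and mm_per_depth_unit are illustrative assumptions.
import numpy as np
from scipy import ndimage

def estimate_bead_height(depth_map: np.ndarray,
                         bead_mask: np.ndarray,
                         mm_per_depth_unit: float) -> float:
    bead = bead_mask.astype(bool)

    # Baseline level: median depth of a thin ring of plate pixels around the bead.
    ring = ndimage.binary_dilation(bead, iterations=15) & ~bead
    plate_level = np.median(depth_map[ring])

    # Bead level: a high percentile of the bead pixels, which rejects isolated
    # noisy pixels instead of trusting the single maximum value.
    bead_level = np.percentile(depth_map[bead], 95)

    return float(bead_level - plate_level) * mm_per_depth_unit
```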