Graduate student: | 楊子毅 Yang, Tzu-Yi |
---|---|
Thesis title: | 以雙目視覺及卷積神經網路進行焊道辨識與三維幾何量測 Weld Bead Detection and 3D Geometric Measurement by Using Binocular Vision and Convolutional Neural Networks |
Advisor: | 鍾俊輝 Chung, Chun-hui |
Degree: | Master |
Department: | College of Engineering - Department of Mechanical Engineering |
Year of publication: | 2025 |
Academic year of graduation: | 113 (ROC calendar) |
Language: | Chinese |
Number of pages: | 88 |
Chinese keywords: | 機器視覺, 雙目立體視覺, 深度學習, Mask R-CNN, 焊道重建 |
English keywords: | Machine Vision, Stereo Vision, Deep Learning, Mask R-CNN, Weld Bead Reconstruction |
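In industrial manufacturing, weld bead grinding is a key process for improving the surface flatness and appearance quality of workpieces. The geometric profile of the weld bead directly affects the depth and extent of mechanical grinding, and therefore the final machining quality. Conventionally, industry measures the 3D profile of a weld bead with laser scanners or depth cameras. Although these devices are highly accurate, they also suffer from high cost, environmental interference, and limited operational flexibility. This study therefore aims to establish a feasible 3D weld bead reconstruction pipeline that is cheaper and easier to operate, and proposes two binocular-vision-based reconstruction methods. The first method uses traditional stereo matching. After two digital cameras capture left and right images of the weld bead, Semi-Global Block Matching (SGBM) generates a disparity map. After optimization, the disparity is converted to depth using the intrinsic and extrinsic camera parameters to build an initial 3D profile. To improve reconstruction accuracy and reduce computation, a Mask R-CNN model selects the weld bead region, and only that region is kept for 3D point cloud reconstruction. Finally, the initial point cloud is post-processed to obtain a weld bead point cloud with smaller error. The second method incorporates deep learning: a neural network model learns the relationship between the visual differences of the left and right images and depth. Actual weld bead depth maps captured by a depth camera serve as supervision labels, and left and right images masked by Mask R-CNN serve as input, so that the model focuses on the weld bead region, improving learning efficiency and prediction accuracy. Compared with the traditional method, the deep learning approach still produces reasonable depth maps reliably when weld bead boundaries are blurred or texture is weak. To verify the accuracy and practicality of the two methods, the reconstructed weld bead point clouds are compared with 3D point clouds captured by the depth camera. The Iterative Closest Point (ICP) algorithm is used for registration, and the Root Mean Square Error (RMSE) between the two point clouds serves as the evaluation metric. Because grinding requires height information, the Mean Absolute Error (MAE) of the highest point of the weld bead profile is also computed. The experimental results show that both methods can effectively complete 3D weld bead reconstruction. The traditional matching method has an ICP registration RMSE of 0.494 mm and a weld bead cross-section height MAE of 0.466 mm, while the deep learning method has an ICP registration RMSE of 0.454 mm and a cross-section height MAE of 0.391 mm, showing better adaptability and accuracy. In summary, the main contributions of this study are a 3D weld bead reconstruction method that combines binocular vision with deep learning, a lightweight model designed for weld bead profile prediction, and a reproducible data processing and training pipeline, which together provide a practical and extensible solution for acquiring weld bead geometric information.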
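The disparity-to-depth step of the first method can be illustrated with the standard rectified-stereo relation Z = f · B / d. The sketch below is not the thesis implementation: it assumes an already rectified image pair, a focal length in pixels, a baseline in millimetres, and an optional boolean weld bead mask standing in for the Mask R-CNN output. The function name `disparity_to_depth` and all numeric values are hypothetical.

```python
import numpy as np


def disparity_to_depth(disparity, focal_px, baseline_mm, mask=None):
    """Convert a disparity map (pixels) to depth (mm) using Z = f * B / d.

    Assumes a rectified stereo pair. Pixels with non-positive disparity,
    or pixels outside the optional mask, are returned as NaN.
    """
    disparity = np.asarray(disparity, dtype=np.float64)
    valid = disparity > 0
    if mask is not None:
        # Keep only the region of interest (e.g. a weld bead mask).
        valid &= np.asarray(mask, dtype=bool)
    depth = np.full(disparity.shape, np.nan)
    depth[valid] = focal_px * baseline_mm / disparity[valid]
    return depth


if __name__ == "__main__":
    # Illustrative values only, not taken from the thesis.
    disp = np.array([[10.0, 0.0], [20.0, 40.0]])
    bead_mask = np.array([[True, True], [False, True]])
    print(disparity_to_depth(disp, focal_px=800.0, baseline_mm=50.0, mask=bead_mask))
```

Masking before the conversion means only weld bead pixels are turned into 3D points, which is the computational saving the abstract attributes to Mask R-CNN.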
Weld bead grinding is a critical process in industrial manufacturing that improves the smoothness and visual quality of workpiece surfaces. The geometric profile of the weld bead directly influences the depth and range of mechanical grinding, and therefore the final processing quality. Traditional measurement methods often rely on laser scanners or depth cameras, which provide high-precision 3D information but suffer from high cost, susceptibility to environmental interference, and limited operational flexibility. To address these issues, this study proposes two stereo vision-based methods for 3D reconstruction of weld beads, aiming to provide a cost-effective solution with acceptable accuracy. The first method adopts a traditional disparity computation pipeline. Two digital cameras capture left and right images of the weld bead, and disparity maps are generated via Semi-Global Block Matching (SGBM). After optimization, the disparity maps are converted into depth information based on the intrinsic and extrinsic camera parameters. Mask R-CNN is employed to detect and localize the weld bead region so that only this region is reconstructed, which reduces computational overhead. Post-processing is then applied to obtain a refined 3D point cloud. The second method incorporates deep learning. A convolutional neural network is trained to learn the relationship between stereo image pairs and depth information, using ground-truth depth maps captured by a depth camera as supervision. Mask R-CNN is integrated to focus the model on the weld bead region, improving both prediction accuracy and robustness. In the experiments, the point clouds generated by both methods were aligned with depth camera measurements using Iterative Closest Point (ICP) and evaluated using the Root Mean Square Error (RMSE) between the point clouds and the Mean Absolute Error (MAE) of the highest point of the weld bead cross-section.
The results show that the traditional method achieved an RMSE of 0.494 mm and a cross-section height MAE of 0.466 mm, while the deep learning method achieved an RMSE of 0.454 mm and a cross-section height MAE of 0.391 mm. The deep learning method maintained higher accuracy and stability, particularly under conditions of blurred boundaries or weak texture. In conclusion, the proposed stereo vision-based 3D reconstruction method, enhanced with deep learning, reduces system cost compared with laser scanners or depth cameras and offers a practical, extensible alternative for industrial weld bead geometry measurement.
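The two evaluation metrics can be sketched as follows. This is a minimal illustration, not the thesis code: it assumes the two point clouds have already been aligned (for example by ICP), approximates the registration RMSE by brute-force nearest-neighbour distances, and treats each cross-section as a row of height samples. The function names `nearest_neighbor_rmse` and `peak_height_mae` and the sample values are hypothetical.

```python
import numpy as np


def nearest_neighbor_rmse(source, target):
    """RMSE of distances from each source point to its nearest target point.

    Both inputs are (N, 3) and (M, 3) arrays of already aligned points.
    Brute-force pairwise distances, suitable only for small clouds.
    """
    source = np.asarray(source, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)
    diff = source[:, None, :] - target[None, :, :]   # shape (N, M, 3)
    dists = np.sqrt((diff ** 2).sum(axis=2))         # shape (N, M)
    nearest = dists.min(axis=1)                      # shape (N,)
    return float(np.sqrt(np.mean(nearest ** 2)))


def peak_height_mae(pred_profiles, ref_profiles):
    """MAE between the highest points of matching cross-section profiles.

    Each input is a (n_sections, n_samples) array of heights in mm.
    """
    pred_peaks = np.max(np.asarray(pred_profiles, dtype=np.float64), axis=1)
    ref_peaks = np.max(np.asarray(ref_profiles, dtype=np.float64), axis=1)
    return float(np.mean(np.abs(pred_peaks - ref_peaks)))


if __name__ == "__main__":
    # Illustrative values only, not taken from the thesis.
    reconstructed = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]]
    reference = [[0.0, 0.0, 0.3], [1.0, 0.0, 0.4]]
    print(nearest_neighbor_rmse(reconstructed, reference))
    print(peak_height_mae([[0.0, 1.0, 0.5], [0.0, 2.0, 1.0]],
                          [[0.0, 1.5, 0.2], [0.0, 1.8, 0.1]]))
```

The RMSE summarizes how well the whole reconstructed surface agrees with the reference, while the peak-height MAE isolates the quantity that matters most for setting grinding depth.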