
Graduate Student: Li, Pei-Ju (李佩如)
Thesis Title: AI-Integrated Automated Bridge Crack Detection System Based on Vision Language Model-Assisted Pixel Understanding and Feature Filtering
Advisor: Chen, Chao-Chun (陳朝鈞)
Degree: Master
Department: College of Electrical Engineering and Computer Science - Institute of Manufacturing Information and Systems
Year of Publication: 2025
Graduation Academic Year: 113 (ROC calendar)
Language: Chinese
Number of Pages: 53
Keywords: Automated Bridge Inspection, Bridge Crack Detection, Pixel Understanding, Vision Language Models, VLMs, Feature Filtering, Deep Learning, Small Object Detection
Access counts: Views: 58; Downloads: 1

    With the continuous increase in bridge infrastructure, traditional bridge inspection methods—mostly relying on on-site visual inspections and manual annotations—have become increasingly inadequate due to their high labor demands and low efficiency. Moreover, these methods are subject to high subjectivity, pose safety risks for inspectors, and often fail to cover inaccessible areas such as the underside or upper portions of bridges. These limitations frequently lead to missed detections or misjudgments, thereby compromising the accuracy and timeliness of maintenance decisions.

    Although recent advancements have introduced technologies such as Unmanned Aerial Vehicles (UAVs) and Artificial Intelligence (AI) into bridge inspection workflows, challenges persist. The complex and variable conditions of bridge imagery make it difficult to distinguish small-scale cracks from background noise. Furthermore, conventional vision-based methods often lack the semantic reasoning capabilities necessary to understand deeper contextual information within images. As a result, the problem of high false detection rates in real-world applications remains unresolved, underscoring the urgent need for more efficient and semantically aware automated inspection technologies.

    To address the challenges of complex visual features, small-scale crack detection, and the lack of semantic reasoning in existing models, this study proposes an AI-integrated bridge crack detection system enhanced by Vision Language Model (VLM)-based pixel-level semantic understanding and feature filtering. The proposed system captures high-resolution bridge imagery using UAVs and adopts a multi-stage processing architecture. First, a semantic region filtering module based on VLMs is used to selectively retain structurally meaningful regions while suppressing background noise. Next, the refined regions are segmented using a sliding-window mechanism and processed by a deep learning-based crack segmentation model. The outputs are then reconstructed and post-processed to improve detection accuracy. Additionally, the system includes a bearing pad detection module, which automatically localizes bearing pads and infers precise scale markings to assist downstream maintenance tasks.

    Experimental results demonstrate that the proposed method outperforms existing mainstream models in both bridge crack detection and bearing pad recognition in terms of accuracy and robustness. Real-world efficiency evaluations further confirm that the fully automated pipeline significantly reduces manual labor while improving inspection speed, highlighting its strong practical value in bridge inspection scenarios. The proposed approach and system architecture also hold great potential for broader applications in the intelligent maintenance and management of other types of infrastructure.
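The multi-stage pipeline described above — filtering out non-structural regions, segmenting the retained regions tile by tile with a sliding window, and reassembling the per-tile predictions into a full-size mask — can be sketched as follows. This is a minimal illustration, not the thesis's implementation: `is_structural` is a trivial stand-in for the VLM-based semantic filter, `dummy_model` stands in for the deep learning crack segmentation model, and the tile/stride sizes are arbitrary assumptions.

```python
import numpy as np

# Hypothetical stand-in for the VLM-based semantic filter: the real system
# uses a vision-language model to decide whether a region shows bridge
# structure; here we simply reject near-uniform (featureless) tiles.
def is_structural(patch):
    return patch.std() > 0.01

def sliding_window_segment(image, predict_tile, tile=256, stride=192):
    """Tile a grayscale image (H, W) with an overlapping sliding window,
    run `predict_tile` on each retained tile, and reassemble a full-size
    crack-probability mask by averaging overlapping predictions.
    Assumes the image is at least `tile` pixels in each dimension."""
    h, w = image.shape
    acc = np.zeros((h, w), dtype=np.float32)  # summed tile predictions
    cnt = np.zeros((h, w), dtype=np.float32)  # coverage count per pixel
    ys = list(range(0, h - tile + 1, stride))
    xs = list(range(0, w - tile + 1, stride))
    if ys[-1] != h - tile:  # ensure the bottom/right edges are covered
        ys.append(h - tile)
    if xs[-1] != w - tile:
        xs.append(w - tile)
    for y in ys:
        for x in xs:
            patch = image[y:y + tile, x:x + tile]
            if not is_structural(patch):      # semantic region filtering
                continue
            acc[y:y + tile, x:x + tile] += predict_tile(patch)
            cnt[y:y + tile, x:x + tile] += 1.0
    return acc / np.maximum(cnt, 1.0)         # unvisited pixels stay 0

# Placeholder segmentation "model": flags dark pixels as crack-like.
def dummy_model(patch):
    return (patch < 0.2).astype(np.float32)

img = np.ones((512, 512), dtype=np.float32)
img[250:260, :] = 0.0  # synthetic horizontal "crack"
mask = sliding_window_segment(img, dummy_model)
```

Averaging overlapping predictions (rather than overwriting) smooths the tile seams that a naive sliding window leaves behind, which is one common way to realize the reconstruction and post-processing step the abstract mentions.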

Abstract (Chinese)  i
Extended Abstract (English)  ii
Acknowledgements  v
Table of Contents  vi
List of Tables  viii
List of Figures  ix
Chapter 1. Introduction  1
  1.1. Research Background  1
  1.2. Research Motivation and Objectives  3
Chapter 2. Related Work  5
  2.1. Automated Inspection Technologies  5
  2.2. Image Processing  5
  2.3. Image Recognition Techniques  7
  2.4. Vision-Language Models (VLMs)  8
Chapter 3. Method  10
  3.1. System Overview  10
  3.2. Notation  13
  3.3. Module Design and Functions  15
    3.3.1. Pixel-Understanding-Based Feature Filtering Module  15
    3.3.2. Bridge Crack Detection Module  18
    3.3.3. Bearing Pad Detection Module  21
Chapter 4. Results  25
  4.1. Experimental Environment  25
    4.1.1. Computer Specifications  25
    4.1.2. Dataset Size  26
    4.1.3. Experiment Items  27
  4.2. Experiment 1: Visualization of Crack Detection Results  27
  4.3. Experiment 2: Visualization of Bearing Pad Recognition Results  28
  4.4. Experiment 3: Accuracy Comparison of Bridge Crack Detection Models  29
  4.5. Experiment 4: Accuracy Comparison of Bearing Pad Detection Modules  31
  4.6. Experiment 5: Speed Comparison of Manual, Semi-Automated, and Fully Automated Crack Detection  33
Chapter 5. Conclusion  36
Chapter 6. Future Work  38
References  39

Full-text availability: On campus: immediately available; Off campus: immediately available