
Graduate Student: 何尹文 (Ho, Yin-Wun)
Thesis Title: Comparative Analysis and Training of Attentional Neural Network Models for Satellite Image Matching (基於注意力神經網路模型進行衛星影像匹配的比較研究與模型訓練)
Advisor: 林昭宏 (Lin, Chao-Hung)
Degree: Master
Department: Department of Geomatics, College of Engineering
Year of Publication: 2023
Graduation Academic Year: 111
Language: English
Number of Pages: 64
Keywords: Deep Learning, Image Matching, ANN, Orthorectification, Satellite Image
    In the field of photogrammetry, image matching is a key technique for detecting correspondences between two conjugate images. For satellite imagery, tie points between a non-orthorectified satellite image and an orthorectified reference image of the same area from a different satellite can serve as control points for orthorectification. The purpose of this study is to provide a stable and reliable matching method for an automatic satellite-image orthorectification system.
    Compared with close-range images, satellite images are easily disturbed by illumination and the atmosphere during acquisition: natural light intensity, weather conditions, and cloud cover cause overexposure, underexposure, blur, and occlusion. Satellite imagery is also expensive to acquire, so the impact of image quality on matching accuracy can hardly be avoided by correcting photographic parameters or resampling; accuracy can only be improved through image processing, such as adjusting brightness and contrast. In the matching task, repeated features of small objects and distortions caused by the lens and relief displacement may also cause matching failures and reduce matching stability. Traditional image matching algorithms (such as SURF, SIFT, and FAST) easily lose matching points under these conditions, and the loss cannot be remedied by tuning their parameters. When matching points are lost, the parameters required for orthorectification cannot be solved; when the matching points are overly concentrated in part of the image, i.e., the control points are unevenly distributed, the result is prone to geometric distortion and excessive error in regions without control points. We therefore want matching points that are sufficient for solving the equations and evenly distributed.
    To reduce these sources of instability, we studied deep-learning-based satellite image matching. Previous satellite imagery research has confirmed that SuperPoint and SuperGlue significantly improve matching stability and adapt well to different lighting, terrain, and photographic quality; feature extraction is based on SuperPoint, and feature matching is based on SuperGlue. In recent years, additional strong deep-learning image matching models have been proposed, including LoFTR and SGMNet. LoFTR performs well in low-texture feature extraction, while SGMNet excels in computational efficiency.
    This study compares five deep-learning-based image matching models: SIFT+SuperGlue, SIFT+SGMNet, SuperPoint+SuperGlue, SuperPoint+SGMNet, and LoFTR, seeking the framework with the best operating efficiency and cost. When comparing the overall capability of image matching models, most studies report accuracy through homography-based similarity estimation. However, satellite images are mostly captured nearly perpendicular to the ground, so homography-based angle estimation does not apply, and a homography cannot represent local displacement. We therefore designed a different evaluation system: RPC (Rational Polynomial Coefficient) orthorectification is used to obtain the ground truth of matched pairs, and Gaussian blur, Gaussian noise, and brightness adjustments are added for testing. Besides measuring capability and robustness, the system also evaluates the runtime and memory consumption of the five models, in order to improve the operating efficiency and cost of the satellite-image orthorectification system.
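    These robustness tests can be reproduced with standard image-processing primitives. The following is a minimal sketch using OpenCV and NumPy; the function name and parameter values are illustrative assumptions, not the thesis code.

        import cv2
        import numpy as np

        def perturb(img, blur_ksize=5, noise_sigma=10.0, brightness=20.0):
            """Apply Gaussian blur, Gaussian noise, and a brightness shift to a test image."""
            out = cv2.GaussianBlur(img, (blur_ksize, blur_ksize), 0)   # odd kernel size
            noise = np.random.normal(0.0, noise_sigma, out.shape)      # zero-mean Gaussian noise
            out = np.clip(out.astype(np.float32) + noise + brightness, 0, 255)
            return out.astype(np.uint8)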
    After these experiments, this study finally selected SIFT+SuperGlue and SIFT+SGMNet as the deep matching models for training, and reduced and rewrote parts of their architectures. In addition, the in-plane rotation range and image brightness of the training dataset were adjusted to simulate scenarios in which the satellite image and the reference image are captured under different illumination conditions. Through these modifications, we aim to obtain a matching model suitable for the satellite-image orthorectification system.

    In the field of photogrammetry, image matching is a fundamental and crucial technique for detecting correspondences between two conjugate images. An automatic orthorectification system can be developed to orthorectify satellite images by using tie points from orthorectified reference images as control points.
    However, compared with close-range images, satellite images are more likely to be disturbed by atmospheric and environmental factors, such as natural light brightness, cloud cover, and weather. During the matching process, the repeated features of small objects, lens distortion, and relief displacement also cause task failures and reduce the stability of this system. Due to the high cost of satellite images, the image quality cannot be improved by correcting photographic parameters or resampling. These problems can only be mitigated through image processing, such as adjusting brightness and contrast, but the effect is limited.
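    As a concrete example of such preprocessing, a linear brightness/contrast adjustment takes only a few lines. The sketch below uses OpenCV; the file names are placeholders, and the alpha/beta values are illustrative rather than the exact settings used in this work.

        import cv2

        # Linear adjustment: output = alpha * input + beta,
        # where alpha scales contrast and beta shifts brightness.
        img = cv2.imread("satellite_tile.png", cv2.IMREAD_GRAYSCALE)  # placeholder path
        adjusted = cv2.convertScaleAbs(img, alpha=1.2, beta=15)
        cv2.imwrite("satellite_tile_adjusted.png", adjusted)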
    Missing matching points or an uneven distribution of control points can make accurate orthorectification challenging. Insufficient matching points may prevent the calculation of the necessary parameters, while an excessive concentration of matching points in certain areas may distort the resulting orthorectified image. To address these issues, it is desirable to have enough matching points for equation solving and to ensure a uniform distribution of these points throughout the task area.
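    One simple way to quantify whether matched points cover the image evenly is to bin them into a coarse grid and measure cell occupancy. The sketch below is an illustrative check, not the evaluation used in this thesis.

        import numpy as np

        def grid_coverage(points, width, height, n=8):
            """Fraction of n x n grid cells containing at least one matched point.

            points: (N, 2) array of (x, y) pixel coordinates.
            """
            cols = np.clip((points[:, 0] / width * n).astype(int), 0, n - 1)
            rows = np.clip((points[:, 1] / height * n).astype(int), 0, n - 1)
            occupied = np.zeros((n, n), dtype=bool)
            occupied[rows, cols] = True
            return occupied.mean()  # 1.0 means every cell holds a control point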
    Traditional detector methods (SURF, SIFT, and FAST) produce even feature distributions, but their descriptors are gradient-based feature maps that cannot represent geometric structure. Some extreme cases, such as clouds (which have higher contrast than the land cover), large rotation angles, or low contrast, can lead to erroneous matches or mismatches.
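    For reference, a typical traditional pipeline (SIFT detection followed by Lowe's ratio test) can be written with OpenCV as below; the file paths and the 0.75 ratio threshold are illustrative assumptions.

        import cv2

        img1 = cv2.imread("satellite.png", cv2.IMREAD_GRAYSCALE)   # placeholder paths
        img2 = cv2.imread("reference.png", cv2.IMREAD_GRAYSCALE)

        sift = cv2.SIFT_create()
        kp1, des1 = sift.detectAndCompute(img1, None)
        kp2, des2 = sift.detectAndCompute(img2, None)

        # Brute-force matching with Lowe's ratio test to reject ambiguous matches.
        bf = cv2.BFMatcher()
        matches = bf.knnMatch(des1, des2, k=2)
        good = [m for m, n in matches if m.distance < 0.75 * n.distance]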
    To find a method that yields evenly distributed matching points, this research turned to machine-learning methods for the orthorectification of satellite images. Satellite imagery research has confirmed that the learned feature extraction method SuperPoint and the learned feature matching method SuperGlue can significantly improve matching stability and match well under different lighting, terrain, and photographic quality. In recent years, more learning-based matching methods have been developed, such as LoFTR and SGMNet. LoFTR performs well in low-texture feature extraction, while SGMNet has high computational efficiency.
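    Learned matchers such as LoFTR are available in public implementations. A minimal sketch using the kornia port is shown below, assuming its pretrained "outdoor" weights; this is separate from the setup evaluated in this thesis, and the random tensors stand in for real normalized grayscale images.

        import torch
        import kornia.feature as KF

        matcher = KF.LoFTR(pretrained="outdoor").eval()
        img0 = torch.rand(1, 1, 480, 640)  # placeholder (B, 1, H, W) grayscale tensors
        img1 = torch.rand(1, 1, 480, 640)

        with torch.no_grad():
            out = matcher({"image0": img0, "image1": img1})

        kpts0 = out["keypoints0"]   # matched keypoints in image0
        kpts1 = out["keypoints1"]   # corresponding keypoints in image1
        conf = out["confidence"]    # per-match confidence scores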
    This paper aims to compare deep-learning-based image matching models, find a framework with better operating efficiency and cost, and train models for the orthorectification system. The following models are selected for comparison: SIFT+SuperGlue, SIFT+SGMNet, SuperPoint+SuperGlue, SuperPoint+SGMNet, and LoFTR. Reliable benchmarks are needed to compare the models. Many benchmarks have been developed for matching comparison; for example, FM-Bench simulates another image coordinate system through a homography transformation to obtain the true values of matching pairs. However, the tilt angles of satellite photographs are usually small, and a homography matrix cannot represent local displacement. Therefore, this study designs a different evaluation system that uses RPC (Rational Polynomial Coefficient) orthorectification to obtain the ground-truth coordinates of matched pairs, and adds Gaussian blur, affine transformations, and contrast adjustment for testing. In addition to testing the robustness of these models, the evaluation also compares their speed and memory consumption.
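    For context, the standard RPC model expresses normalized image coordinates as ratios of cubic polynomials in normalized ground coordinates; matched points are mapped through this model to obtain their ground-truth locations. In LaTeX form (a textbook statement of the model, not a formula from this thesis):

        r_n = \frac{P_1(P, L, H)}{P_2(P, L, H)}, \qquad
        c_n = \frac{P_3(P, L, H)}{P_4(P, L, H)}

    where (r_n, c_n) are the normalized row and column, (P, L, H) are normalized latitude, longitude, and height, and each P_i is a third-order polynomial with 20 coefficients:

        P_i(P, L, H) = \sum_{j+k+l \le 3} a_{i,jkl} \, P^{j} L^{k} H^{l}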
    The research goal is to choose a framework that can be applied to the orthorectification system. The model must therefore perform well on outdoor remote sensing images, which is influenced by both the training data and the model design. Considering efficiency, this research chooses SuperGlue and SGMNet as the training models and uses GL3D for training, with the aim of improving rotation adaptability and performance.
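    The rotation and brightness adjustments applied to the training pairs can be sketched with torchvision transforms; the ranges below are placeholders, since the exact values used in the experiments are not restated here.

        import torchvision.transforms as T

        # Simulate in-plane rotation and illumination differences between
        # the satellite image and the reference image during training.
        augment = T.Compose([
            T.RandomRotation(degrees=45),                  # placeholder rotation range
            T.ColorJitter(brightness=0.4, contrast=0.2),   # placeholder photometric jitter
        ])
        # augmented = augment(pil_image)  # applied independently to each training view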

    Abstract (in Chinese)
    ABSTRACT
    Acknowledgements
    CONTENTS
    LIST OF TABLES
    LIST OF FIGURES
    1. INTRODUCTION
      1.1. Motivation
      1.2. Machine Learning Based Model
      1.3. Characteristics of Satellite Images
      1.4. Contribution
    2. RELATED WORK
      2.1. Transformer
        2.1.1. Attentional Aggregation
      2.2. Feature Extraction and Local Feature Matching
        2.2.1. SIFT
        2.2.2. SURF
        2.2.3. FAST
      2.3. Evaluation on Homography Estimation
        2.3.1. RANSAC
        2.3.2. FM-Bench
        2.3.3. AUC
    3. METHODOLOGY
      3.1. SuperPoint
        3.1.1. Synthetic Pre-Train Model
        3.1.2. Homographic Adaptation
        3.1.3. Loss Function
      3.2. SuperGlue
        3.2.1. The Keypoint Encoder
        3.2.2. Attentional Aggregation
        3.2.3. Optimal Matching Layer
        3.2.4. Loss Function
      3.3. SGMNet
        3.3.1. NMS (Non-Maximum Suppression)
        3.3.2. Seeding Module
      3.4. LoFTR
        3.4.1. Feature Extraction
    4. EXPERIMENT RESULT AND DISCUSSION
      4.1. Evaluation with RPC-trans Process
        4.1.1. Runtime and Memory Test
        4.1.2. RMSE Test and Distribution of Matching
        4.1.3. Image Adjustment
        4.1.4. Discussion
      4.2. Training
        4.2.1. Dataset
        4.2.2. SuperGlue Training Details
        4.2.3. SGMNet Training Details
      4.3. Training Results
        4.3.1. Testing
        4.3.2. Time and GPU Memory Test
      4.4. Model Modification
        4.4.1. Non-Descriptor Experiment
        4.4.2. Dimensionality Reduction
    5. CONCLUSION
      5.1.1. Model Comparison
      5.1.2. Model Training
      5.1.3. Model Modification and Future Work
    References
    Appendix

    [1] Wu, L., Chang, Y. C., Lin, B. Y., Lin, C. H., Tseng, Y. H., Chang, L. Y., ... & Lee, Y. L. (2023). Self-supervised Deep-learning-based Image Matching for FORMOSAT Optical Satellite Image Orthorectification. Journal of Photogrammetry and Remote Sensing, 28(2), 63-81.
    [2] Lowe, D. G. (2004). Distinctive Image Features from Scale-Invariant Keypoints. International Journal of Computer Vision, 60(2), 91–110.
    [3] Bay, H., Ess, A., Tuytelaars, T., & Van Gool, L. (2008). Speeded-Up Robust Features (SURF). Computer Vision and Image Understanding, 110(3), 346–359.
    [4] Alcantarilla, P., Nuevo, J., & Bartoli, A. (2013). Fast Explicit Diffusion for Accelerated Features in Nonlinear Scale Spaces. British Machine Vision Conference (BMVC).
    [5] Rublee, E., Rabaud, V., Konolige, K., & Bradski, G. (2011). ORB: An efficient alternative to SIFT or SURF. 2011 IEEE International Conference on Computer Vision (ICCV), Barcelona, Spain.
    [6] Rosten, E., Porter, R., & Drummond, T. (2010). Faster and Better: A Machine Learning Approach to Corner Detection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 105–119.
    [7] Rosten, E., & Drummond, T. (2006). Machine Learning for High-Speed Corner Detection. In Computer Vision – ECCV 2006, Lecture Notes in Computer Science, 430–443.
    [8] Chiu, L.-C., Chang, T.-S., Chen, J.-Y., & Chang, N. Y.-C. (2013). Fast SIFT Design for Real-Time Visual Feature Extraction. IEEE Transactions on Image Processing, 3158–3167.
    [9] Tareen, S. A. K., & Saleem, Z. (2018). A comparative analysis of SIFT, SURF, KAZE, AKAZE, ORB, and BRISK. 2018 International Conference on Computing, Mathematics and Engineering Technologies (iCoMET).
    [10] DeTone, D., Malisiewicz, T., & Rabinovich, A. (2018). SuperPoint: Self-Supervised Interest Point Detection and Description. 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).
    [11] Sarlin, P.-E., DeTone, D., Malisiewicz, T., & Rabinovich, A. (2020). SuperGlue: Learning Feature Matching With Graph Neural Networks. 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
    [12] Rocco, I., Cimpoi, M., Arandjelovic, R., Torii, A., Pajdla, T., & Sivic, J. (2018). Neighbourhood Consensus Networks. Neural Information Processing Systems.
    [13] Chen, H., Luo, Z., Zhang, J., Zhou, L., Bai, X., Hu, Z., … Quan, L. (2021). Learning to Match Features with Seeded Graph Matching Network. 2021 IEEE/CVF International Conference on Computer Vision (ICCV).
    [14] Sun, J., Shen, Z., Wang, Y., Bao, H., & Zhou, X. (2021). LoFTR: Detector-Free Local Feature Matching with Transformers. 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
    [15] Leutenegger, S., Chli, M., & Siegwart, R. Y. (2011). BRISK: Binary Robust invariant scalable keypoints. 2011 International Conference on Computer Vision.
    [17] DeTone, D., Malisiewicz, T., & Rabinovich, A. (2016). Deep Image Homography Estimation. arXiv: Computer Vision and Pattern Recognition.
    [18] Bian, J.-W., Wu, Y.-H., Zhao, J., Liu, Y., Zhang, L., Cheng, M.-M., & Reid, I. R. (2019). An Evaluation of Feature Matchers for Fundamental Matrix Estimation. British Machine Vision Conference.
    [19] Derpanis, K. G. (2010). Overview of the RANSAC Algorithm. Image Rochester NY, 4(1), 2-3.
    [20] Fischler, M. A., & Bolles, R. C. (1981). Random sample consensus: A paradigm for model fitting with applications to image analysis and automated cartography. Communications of the ACM, 24(6), 381–395.
    [21] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., … Polosukhin, I. (2017). Attention is All you Need. Neural Information Processing Systems.
    [22] Wang, Z., & Ji, S. (2020). Second-Order Pooling for Graph Neural Networks. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–1.
    [23] Brachmann, E., & Rother, C. (2019). Neural-Guided RANSAC: Learning Where to Sample Model Hypotheses. 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
    [24] Caetano, T. S., McAuley, J., Cheng, L., Le, Q. V., & Smola, A. J. (2008). Learning Graph Matching. arXiv: Computer Vision and Pattern Recognition.
    [25] Cavalli, L., Larsson, V., Oswald, M. R., Sattler, T., & Pollefeys, M. (2020). Handcrafted Outlier Detection Revisited. In Computer Vision – ECCV 2020, Lecture Notes in Computer Science, 770–787.
    [26] Child, R., Gray, S., Radford, A., & Sutskever, I. (2019). Generating Long Sequences with Sparse Transformers. arXiv: Learning.
    [27] Cuturi, M. (2013). Sinkhorn Distances: Lightspeed Computation of Optimal Transport. Neural Information Processing Systems.
    [28] Viniavskyi, O., Dobko, M., Mishkin, D., & Dobosevych, O. (2022). OpenGlue: Open Source Graph Neural Net Based Pipeline for Image Matching.
