
Author: Wang, Ruei-Ping (王瑞評)
Thesis Title: Dual-GDNet: Dual Guided-diffusion Network for Stereo Image Dense Matching (雙向引導擴散式神經網路應用於立體視覺影像匹配)
Advisor: Lin, Chao-Hung (林昭宏)
Degree: Master
Department: College of Engineering - Department of Geomatics
Year of Publication: 2020
Academic Year: 108
Language: English
Number of Pages: 71
Keywords: Stereo matching, Deep Convolutional Neural Network

    Stereo dense matching, which plays a key role in 3D reconstruction, remains a challenging task in photogrammetry and computer vision. In addition to block-based matching, recent studies based on machine learning have achieved great progress in stereo dense matching by using deep convolutional neural networks (DCNNs). In this thesis, a novel neural network called the dual guided-diffusion network (Dual-GDNet) is proposed, which utilizes not only left-to-right but also right-to-left image matching in the network design and training, together with a left-right consistency process, to reduce the probability of mismatching. In addition, suppressed regression is proposed to refine disparity estimation by removing unrelated probability information before regression, preventing ambiguous predictions on multi-peak probability distributions. The proposed Dual-GDNet design can be applied to existing DCNN models to further improve disparity estimation. To evaluate the performance, GA-Net is selected as the backbone, and the model is evaluated on the Scene Flow and KITTI 2015 stereo datasets. Experimental results demonstrate the superiority of the proposed model over related models in terms of end-point error (EPE), >1-pixel error rate, and top-2 error, with improvements of 2-10% on Scene Flow and 2-8% on KITTI 2015.
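    The two ideas in the abstract can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the thesis's implementation: the peak-window half-width `w`, the tolerance `tol`, and the function names are hypothetical choices made here for clarity.

    ```python
    import numpy as np

    def soft_argmin(prob):
        """Standard disparity regression: expectation over the full
        disparity probability distribution (the soft-argmin operator)."""
        d = np.arange(len(prob), dtype=float)
        return float(np.sum(prob * d))

    def suppressed_regression(prob, w=2):
        """Sketch of suppressed regression: discard probability mass
        unrelated to the dominant peak, renormalize, then regress over
        the kept window only. The half-width w is an assumption made
        here for illustration, not the thesis's suppression rule."""
        d = np.arange(len(prob), dtype=float)
        peak = int(np.argmax(prob))
        lo, hi = max(0, peak - w), min(len(prob), peak + w + 1)
        kept = np.zeros_like(prob)
        kept[lo:hi] = prob[lo:hi]
        kept /= kept.sum()                      # renormalize kept mass
        return float(np.sum(kept * d))

    def lr_consistency_mask(disp_left, disp_right, tol=1.0):
        """Left-right consistency check on one scanline: a pixel passes
        when the right-view disparity at its matched column agrees with
        its own disparity within tol."""
        n = len(disp_left)
        mask = np.zeros(n, dtype=bool)
        for x in range(n):
            xr = int(round(x - disp_left[x]))   # matched column in right image
            if 0 <= xr < n:
                mask[x] = abs(disp_left[x] - disp_right[xr]) <= tol
        return mask

    # A bimodal distribution with peaks at disparities 3 and 12:
    prob = np.array([0., .05, .15, .25, .15, .05, 0., 0., 0.,
                     .05, .10, .15, .05, 0., 0., 0.])
    ```

    On this example, `soft_argmin(prob)` returns about 5.65, a disparity in the valley between the two peaks that belongs to neither; `suppressed_regression(prob)` keeps only the mass around the dominant peak and returns 3.0, which is exactly the failure mode the abstract describes.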

    Abstract (Chinese) - i
    Abstract - ii
    Acknowledgements - iii
    Table of Contents - iv
    List of Tables - vi
    List of Figures - vii
    Chapter 1. Introduction - 1
    Chapter 2. Related Work - 5
    Chapter 3. Background - 7
        Semi-Global Matching (SGM) - 7
            Matching cost calculation - 7
            Cost aggregation - 17
            Disparity computation/optimization - 19
            Disparity refinement - 19
        GANet - 21
            Feature extraction - 22
            Cost volume construction - 24
            Guidance network - 25
            Semi-global aggregation (SGA) - 26
            Local guided aggregation (LGA) - 29
            Disparity regression - 29
    Chapter 4. Methodology - 32
        Dual-GDNet - 34
            Visible and invisible pixels problem - 36
            Probability distribution with higher generalizability - 38
            Flipped training - 39
        Guided Diffusion Layer (GD) - 44
        Training in Cross Entropy with Softmax - 46
        Suppressed Regression - 50
    Chapter 5. Experimental Results - 53
        Evaluation on Scene Flow Dataset - 54
        Evaluation on KITTI 2015 Dataset - 56
        Effect on Flipped Training - 64
    Chapter 6. Conclusions - 68
    References - 69

    Atienza, R. (2018). Fast disparity estimation using dense networks. CoRR, abs/1805.07499. Retrieved from http://arxiv.org/abs/1805.07499

    Žbontar, J., & LeCun, Y. (2015). Stereo matching by training a convolutional neural network to compare image patches. CoRR, abs/1510.05970. Retrieved from http://dblp.uni-trier.de/db/journals/corr/corr1510.html#ZbontarL15

    Chang, J., & Chen, Y. (2018). Pyramid stereo matching network. CoRR, abs/1803.08669. Retrieved from http://arxiv.org/abs/1803.08669

    Chen, J., & Yuan, C. (2016). Convolutional neural network using multi-scale information for stereo matching cost computation. In 2016 IEEE International Conference on Image Processing (ICIP) (pp. 3424–3428). doi: 10.1109/ICIP.2016.7532995

    Cheng, F., He, X., & Zhang, H. (2017). Learning to refine depth for robust stereo estimation. Pattern Recognition, 74. doi: 10.1016/j.patcog.2017.07.027

    Cheng, X., Wang, P., & Yang, R. (2018). Learning depth with convolutional spatial propagation network. arXiv preprint arXiv:1810.02695

    Godard, C., Aodha, O. M., & Brostow, G. J. (2016). Unsupervised monocular depth estimation with left-right consistency. CoRR, abs/1609.03677. Retrieved from http://arxiv.org/abs/1609.03677

    He, K., Zhang, X., Ren, S., & Sun, J. (2015). Deep residual learning for image recognition. CoRR, abs/1512.03385. Retrieved from http://arxiv.org/abs/1512.03385

    Hinton, G. E., Vinyals, O., & Dean, J. (2015). Distilling the knowledge in a neural network. ArXiv, abs/1503.02531

    Hirschmüller, H. (2008). Stereo processing by semi-global matching and mutual information. IEEE Transactions on Pattern Analysis and Machine Intelligence, 30(2), 328–341

    Huang, G., Liu, Z., & Weinberger, K. Q. (2016). Densely connected convolutional networks. CoRR, abs/1608.06993. Retrieved from http://arxiv.org/abs/1608.06993

    Kang, J., Chen, L., Deng, F., & Heipke, C. (2019, 09). Context pyramidal network for stereo matching regularized by disparity gradients. ISPRS Journal of Photogrammetry and Remote Sensing, 157. doi:10.1016/j.isprsjprs.2019.09.012

    Kendall, A., Martirosyan, H., Dasgupta, S., Henry, P., Kennedy, R., Bachrach, A., & Bry, A. (2017). End-to-end learning of geometry and context for deep stereo regression. CoRR, abs/1703.04309. Retrieved from http://arxiv.org/abs/1703.04309

    Kingma, D., & Ba, J. (2014). Adam: A method for stochastic optimization. International Conference on Learning Representations

    Liu, S., De Mello, S., Gu, J., Zhong, G., Yang, M.H., & Kautz, J. (2017). Learning affinity via spatial propagation networks. In I. Guyon et al. (Eds.), Advances in neural information processing systems 30 (pp. 1520–1530). Curran Associates, Inc. Retrieved from http://papers.nips.cc/paper/6750-learning-affinity-via-spatial-propagation-networks.pdf

    Luo, W., Schwing, A. G., & Urtasun, R. (2016). Efficient deep learning for stereo matching. In 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (pp. 5695–5703)

    Mayer, N., Ilg, E., Häusser, P., Fischer, P., Cremers, D., Dosovitskiy, A., & Brox, T. (2015). A large dataset to train convolutional networks for disparity, optical flow, and scene flow estimation. CoRR, abs/1512.02134. Retrieved from http://arxiv.org/abs/1512.02134

    Menze, M., & Geiger, A. (2015). Object scene flow for autonomous vehicles. In 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (pp. 3061–3070)

    Mozerov, M., & Weijer, J. (2015). Accurate stereo matching by two-step energy minimization. IEEE Transactions on Image Processing, 24. doi: 10.1109/TIP.2015.2395820

    Pierrot-Deseilligny, M., & Paparoditis, N. (2006). A multiresolution and optimization-based image matching approach: An application to surface reconstruction from SPOT5-HRS stereo imagery. The International Archives of Photogrammetry, Remote Sensing and Spatial Information Sciences, 36(1/W41), 328–341

    Scharstein, D., & Szeliski, R. (2002). A taxonomy and evaluation of dense two-frame stereo correspondence algorithms. International Journal of Computer Vision, 47(1-3), 7–42

    Seki, A., & Pollefeys, M. (2017). SGM-Nets: Semi-global matching with neural networks. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 6640–6649

    Wolf, P. R., & Dewitt, B. A. (2000). Elements of Photogrammetry: with Applications in GIS. New York: McGraw-Hill, 30

    Zhang, F., Prisacariu, V., Yang, R., & Torr, P. H. (2019). GA-Net: Guided Aggregation Net for End-to-end Stereo Matching. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 185–194

    Full text available on campus: 2022-08-19
    Full text available off campus: 2022-08-19