
Author: Chuang, Chih-Kai (莊智凱)
Title: Precise Disparity Estimation with Multiscale Correlation Stereo Matching Networks
Advisor: Yang, Jar-Ferr (楊家輝)
Degree: Master
Department: College of Electrical Engineering & Computer Science - Institute of Computer & Communication Engineering
Publication year: 2020
Graduation academic year: 108 (2019-2020)
Language: English
Pages: 46
Keywords: stereo matching networks, multiscale, disparity map, correlation layer, expansion function
    With the rapid development of 3D technology, people expect a more comfortable and realistic visual experience across different applications. 3D information can be described efficiently as a single-view image plus a depth map, where the depth map gives the distance from each pixel to the viewer; a disparity map can be converted into a depth map through a simple mathematical relation. In computer vision research, two-view stereo matching is a very important topic, whose goal is to obtain the most accurate depth map consistent with stereoscopic vision. In recent years, deep learning has achieved remarkable results in image processing, and the focus of research has gradually shifted from traditional algorithms to convolutional neural networks. In this thesis, we therefore propose a stereo matching network to predict more accurate disparity maps: starting from the correlation layer, we aim for better predictions without excessive computational cost. We propose a multiscale correlation layer to compute the similarity between the left and right feature maps under different disparities, and use an expansion function to fuse the similarity scores across scales. In addition, to obtain a more precise disparity map, we predict an initial disparity map at each scale and progressively upsample and refine the final result. The experimental results show that, compared with other stereo matching networks, the proposed network obtains more accurate, higher-quality disparity maps with less computation time and cost.
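The disparity-to-depth conversion mentioned in the abstract is, for a rectified stereo pair, the standard pinhole relation Z = f·B/d. A minimal sketch (the function name and parameter names are illustrative, not taken from the thesis):

```python
def disparity_to_depth(disparity_px, focal_length_px, baseline_m):
    """Standard pinhole relation for a rectified stereo pair: Z = f * B / d.

    disparity_px    - disparity in pixels (must be nonzero)
    focal_length_px - camera focal length in pixels
    baseline_m      - distance between the two camera centers in meters
    """
    return focal_length_px * baseline_m / disparity_px

# e.g. a 100 px disparity with f = 1000 px and a 0.5 m baseline gives 5 m depth
```

Note the inverse relation: larger disparities correspond to closer objects, which is why accurate disparity estimation directly determines depth-map quality.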

    With the rapid development of 3D technology, people expect a more comfortable and realistic visual experience in many applications. 3D information can be described efficiently by a texture image with its corresponding depth map, which records the distance from each pixel to the viewer; a disparity map can be converted into a depth map through a simple mathematical relation. In computer vision research, two-view stereo matching, whose goal is to obtain the most accurate depth map, is a very important topic. In recent years, deep learning has achieved excellent results in image processing, and the focus of research has gradually shifted from traditional algorithms to convolutional neural networks. In this thesis, we therefore propose a stereo matching network that predicts more accurate disparity maps without excessive computational cost. The proposed method uses multiscale correlation layers to compute the similarity between left- and right-image features at each candidate disparity, and an expansion function to fuse the similarity scores across scales. In addition, to obtain a more accurate disparity map, the network predicts an initial disparity map at each scale and progressively upsamples and refines it into the final result. The experimental results show that, compared with other stereo matching networks, the proposed network obtains more accurate disparity maps with less computation time and cost.
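The thesis's multiscale correlation and expansion function are not reproduced in this record; as a rough single-scale sketch of the general correlation-layer idea (the function name and the per-channel normalization are assumptions, not the thesis's exact definition):

```python
import numpy as np

def correlation_volume(feat_left, feat_right, max_disp):
    """Single-scale correlation cost volume.

    For each candidate disparity d, correlate left features at column x
    with right features at column x - d (channel-wise dot product,
    normalized by the channel count). Columns with no valid match stay 0.
    """
    channels, height, width = feat_left.shape
    volume = np.zeros((max_disp, height, width), dtype=feat_left.dtype)
    for d in range(max_disp):
        if d == 0:
            volume[0] = (feat_left * feat_right).sum(axis=0) / channels
        else:
            volume[d, :, d:] = (
                feat_left[:, :, d:] * feat_right[:, :, :-d]
            ).sum(axis=0) / channels
    return volume
```

A multiscale variant would build such volumes from feature maps at several resolutions and then fuse the resulting similarity scores (the role the thesis assigns to its expansion function); that fusion step is not sketched here.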

    Abstract (Chinese)
    Abstract
    Acknowledgments
    Contents
    List of Tables
    List of Figures
    Chapter 1 Introduction
        1.1 Research Background
        1.2 Motivations
        1.3 Literature Review
        1.4 Thesis Organization
    Chapter 2 Related Work
        2.1 Concept of Stereo Matching
        2.2 ResNet
        2.3 Learning-based Stereo Matching
            2.3.1 3D Convolution
            2.3.2 Correlation Layer
    Chapter 3 The Proposed Multiscale Correlation Stereo Matching System
        3.1 Overview of the Proposed MCSM System
        3.2 Data Pre-processing
        3.3 Network Architecture
            3.3.1 Multiscale Residual Extractor
            3.3.2 Multiscale Correlation Block
                3.3.2.1 Multiscale Correlation
                3.3.2.2 Expansion Function
            3.3.3 Disparity Prediction Subnetwork
            3.3.4 Refinement Network
        3.4 Data Post-processing
        3.5 Loss Function
    Chapter 4 Experimental Results
        4.1 Environmental Settings
        4.2 Datasets and Implementation Details
        4.3 Results of the Proposed MCSM System
            4.3.1 Performance of Network Prediction
            4.3.2 Comparisons with Other Approaches
            4.3.3 Ablation Study
    Chapter 5 Conclusions
    Chapter 6 Future Work
    References


    On campus: publicly available from 2025-07-20
    Off campus: not available
    The electronic thesis has not been authorized for public release; for the print copy, please consult the library catalog.