
Graduate Student: Xu, Jia-Yuan (許嘉元)
Thesis Title: Multiple Scene Monocular Depth Estimation with Dual-path Residual Networks (雙徑殘差網路之多場景單視角深度估計)
Advisor: Yang, Jar-Ferr (楊家輝)
Degree: Master
Department: Institute of Computer & Communication Engineering, College of Electrical Engineering and Computer Science
Year of Publication: 2021
Graduation Academic Year: 109 (2020-2021)
Language: English
Number of Pages: 38
Keywords: monocular depth estimation, dual-path residual network, learning residual map, learning depth map, loss function

Abstract:
    With the rapid development of 3D technology, people can enjoy increasingly realistic visual experiences. Many applications of 3D technology rely on depth maps, which represent the distance between each pixel in an image and the viewer, to achieve their goals at a lower cost. In the field of computer vision, monocular depth estimation is a very popular topic; its advantage is that only a single image is needed as input. For 2D-to-3D conversion tasks, we want to obtain relative depth maps, i.e., the front-to-back ordering of objects. In addition, we expect the trained model to be free of data dependency problems so that it can be applied to many different scenes. Therefore, we propose a monocular depth estimation network. The proposed dual-path residual network encourages information sharing by jointly using two paths, one learning a residual map and the other learning a depth map, with the aim of obtaining a more accurate depth map. In addition, we train with several loss functions and weight each of them to obtain depth maps that match expectations. The experimental results in this thesis show that the proposed method not only achieves high-quality depth maps when tested on the training dataset, but also performs well when estimating depth for other scenes.
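    To make the abstract's two ideas concrete, the following is a minimal PyTorch sketch written under assumptions, since this record does not include the thesis body: a two-branch head that sums a learned depth map with a learned residual map, and a loss that weights several terms (here, an L1 term and a gradient term as stand-ins). The names DualPathDecoderSketch and weighted_depth_loss, the layer sizes, and the weights w_l1 and w_grad are all hypothetical and do not reproduce the thesis's actual DPRN design.

    import torch
    import torch.nn as nn

    class DualPathDecoderSketch(nn.Module):
        """Hypothetical two-branch head: a depth path plus a residual path."""
        def __init__(self, in_channels=64):
            super().__init__()
            # Both paths read the same shared features (information sharing).
            self.depth_path = nn.Sequential(
                nn.Conv2d(in_channels, 32, 3, padding=1), nn.ReLU(inplace=True),
                nn.Conv2d(32, 1, 3, padding=1))
            self.residual_path = nn.Sequential(
                nn.Conv2d(in_channels, 32, 3, padding=1), nn.ReLU(inplace=True),
                nn.Conv2d(32, 1, 3, padding=1))

        def forward(self, shared_features):
            base_depth = self.depth_path(shared_features)   # learned depth map
            residual = self.residual_path(shared_features)  # learned residual map
            return base_depth + residual                    # refined depth estimate

    def weighted_depth_loss(pred, target, w_l1=1.0, w_grad=0.5):
        """Weighted sum of illustrative loss terms; terms and weights are assumptions."""
        l1 = torch.mean(torch.abs(pred - target))
        # Gradient term: penalize mismatched spatial gradients (edge sharpness).
        grad = (torch.mean(torch.abs(pred.diff(dim=-1) - target.diff(dim=-1))) +
                torch.mean(torch.abs(pred.diff(dim=-2) - target.diff(dim=-2))))
        return w_l1 * l1 + w_grad * grad

    # Example: depth = DualPathDecoderSketch()(torch.randn(1, 64, 60, 80))

    Summing a coarse prediction with a learned correction mirrors standard residual learning; the geometric and residual up-sampling blocks listed in the table of contents presumably realize this at multiple scales, but the record does not specify those details.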

    Table of Contents:
    Chinese Abstract
    Abstract
    Acknowledgements
    Contents
    List of Tables
    List of Figures
    Chapter 1 Introduction
        1.1 Research Background
        1.2 Motivations
        1.3 Literature Review
        1.4 Thesis Organization
    Chapter 2 Related Work
        2.1 Monocular Depth Estimation
        2.2 Encoder-decoder
        2.3 Skip connection
        2.4 DenseNet
        2.5 Laplacian pyramid-based network
    Chapter 3 The Proposed Dual Path Residual Network
        3.1 Overview of the Proposed DPRN
        3.2 Data Augmentation
        3.3 Network Architecture
            3.3.1 Feature Extractor
            3.3.2 Geometric Up-sampling Block
            3.3.3 Residual Up-sampling Block
        3.4 Loss Function
    Chapter 4 Experimental Results
        4.1 Environmental Settings
        4.2 Datasets and Implementation Details
        4.3 Results of the Proposed DPRN Network
            4.3.1 Ablation Study
            4.3.2 Comparisons with Other Approaches
            4.3.3 Testing on Different Datasets
    Chapter 5 Conclusions
    Chapter 6 Future Work
    References


    Full-text Availability: On campus: available from 2026-08-02. Off campus: available from 2026-08-02. The electronic thesis has not yet been authorized for public access; please consult the library catalog for the print copy.