
Author: Chen, Lu-Wen (陳祿文)
Title: Depth Completion for Efficient Depth Representation (高效深度表示的深度補全)
Advisor: Yang, Jar-Ferr (楊家輝)
Degree: Master
Department: Institute of Computer & Communication Engineering, College of Electrical Engineering and Computer Science
Year of Publication: 2024
Graduation Academic Year: 112
Language: English
Number of Pages: 57
Keywords: deep learning, depth estimation, depth completion, autoencoder
Chinese Abstract: In recent years, the development of 3D broadcasting systems has received wide attention for its potential to revolutionize fields such as entertainment, virtual reality, robotic perception, and autonomous driving. Traditional methods of 3D data transmission require large amounts of data, making the process time-consuming and resource-intensive. In these applications, depth information is essential for creating accurate 3D representations. To improve data transmission efficiency, we adopt depth completion techniques that predict and fill in depth data missing from the sensors. By exploiting sparse depth maps containing only a few samples together with RGB images, a network model can reconstruct dense depth maps, reducing the data required for transmission while maintaining quality and accuracy. Our research advances depth completion through sophisticated models and effective depth sampling strategies. By optimizing how the sparse depth points are sampled, our method reconstructs more accurate depth maps from the same number of transmitted points. This improves the efficiency and effectiveness of depth completion, reducing the data transmission burden while guaranteeing high-quality depth information. We propose a depth completion architecture that takes a single image and a sparse depth input, combining an efficient autoencoder design with attention mechanisms. In addition, we introduce a novel object-based sampling strategy to improve depth sampling efficiency. Based on our experiments and discussions, we focus on horizontal depth information and a uniform sample distribution to ensure effectiveness at various sparsity levels of the depth input. Our approach demonstrates a practical and efficient application of depth completion for data transmission and related uses, and provides valuable insights into optimizing depth sampling for more effective depth map reconstruction.

English Abstract: The development of 3D broadcasting systems has garnered significant attention due to their potential to revolutionize fields such as entertainment, virtual reality, robotic perception, and autonomous driving. Traditional methods of 3D data transmission require large amounts of data, making the process time-consuming and resource-intensive. Depth information is crucial in these applications for creating accurate 3D representations. To improve data transmission efficiency, we adopt depth completion techniques, which predict and fill in missing depth data from sensors. By leveraging sparse depth maps and RGB images, the designed models can reconstruct dense depth maps to reduce the depth data needed for transmission while maintaining quality and accuracy. Our research advances depth completion technology through sophisticated models and effective depth sampling strategies. By optimizing the sampling of sparse depth points, our approach reconstructs more accurate depth maps with the same number of transmitted points. This enhances the efficiency and effectiveness of depth completion, reducing the data transmission burden while ensuring high-quality depth information. We propose a depth completion architecture using a single image and sparse depth input, incorporating an efficient autoencoder design and attention mechanisms. Additionally, we introduce a fixed grid sampling strategy to enhance depth sampling efficiency. Based on our experiments and discussions, we focus on horizontal depth information and uniform distribution to ensure effectiveness across various levels of sparsity in depth inputs. Our approach demonstrates a practical and efficient application of depth completion for data transmission and related uses, providing valuable insights into optimizing depth sampling for more effective depth map reconstruction.
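As a rough illustration of the two sparse depth sampling strategies the thesis compares (random sampling [14] versus fixed grid sampling), the following Python sketch shows one plausible way a sparse depth input could be generated from a dense depth map. The function names, strides, and the toy NYU-sized depth map are illustrative assumptions, not the thesis implementation.

    import numpy as np

    def random_sampling(depth, num_samples, rng=None):
        # Randomly keep num_samples valid depth pixels (as in [14]); all other pixels stay zero.
        rng = np.random.default_rng() if rng is None else rng
        sparse = np.zeros_like(depth)
        valid = np.flatnonzero(depth > 0)  # indices of pixels with a valid (non-zero) depth
        chosen = rng.choice(valid, size=min(num_samples, valid.size), replace=False)
        sparse.flat[chosen] = depth.flat[chosen]
        return sparse

    def fixed_grid_sampling(depth, stride_y, stride_x):
        # Keep depth values on a regular grid, giving a uniform spatial distribution of samples.
        sparse = np.zeros_like(depth)
        sparse[::stride_y, ::stride_x] = depth[::stride_y, ::stride_x]
        return sparse

    if __name__ == "__main__":
        # Toy dense depth map at NYU Depth V2 resolution (values in meters), for demonstration only.
        dense = np.random.uniform(0.5, 10.0, size=(480, 640)).astype(np.float32)
        sparse_rand = random_sampling(dense, num_samples=500)
        sparse_grid = fixed_grid_sampling(dense, stride_y=24, stride_x=32)
        print((sparse_rand > 0).sum(), (sparse_grid > 0).sum())  # number of retained depth points

Under this sketch both variants keep a comparable number of depth points, but the fixed grid version guarantees a uniform spatial coverage of the image, which matches the emphasis on uniform distribution in the abstract.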

Table of Contents:
Abstract (Chinese)
Abstract (English)
Acknowledgements
Contents
Chapter 1 Introduction
  1.1 Research Background
  1.2 Motivations
  1.3 Thesis Organization
Chapter 2 Related Work
  2.1 RGB-based Depth Estimation
    2.1.1 Autoencoder
    2.1.2 FastDepth [3]
  2.2 Depth Completion
    2.2.1 SparseFormer [1]
    2.2.2 Guided-Attention [2]
Chapter 3 The Proposed Depth Completion Network and Sparse Depth Sampling Strategy
  3.1 Overview of the Proposed Depth Completion Network
  3.2 Proposed Autoencoder Architecture
  3.3 Encoder
  3.4 Shared Decoder
    3.4.1 Up-Attention Block
    3.4.2 Scale and Place Block
    3.4.3 Depth Fusion Block
  3.5 Training Loss Functions
  3.6 Sparse Depth Sampling Strategy
    3.6.1 Random Sampling [14]
    3.6.2 Fixed Grid Sampling
Chapter 4 Experiment Results
  4.1 Training and Evaluation Setup
    4.1.1 Training Setting
    4.1.2 Datasets
    4.1.3 Data Augmentation
    4.1.4 Evaluation Metrics
  4.2 Ablation Studies
    4.2.1 Effectiveness of Each Block in the Decoder
    4.2.2 Effectiveness of Channel and Spatial Attention in the Up-Attention Block
  4.3 Comparison with Different Sparsity Levels
  4.4 Visual Comparison of Predicted Depth Maps
  4.5 Comparison with Different Sampling Strategies
    4.5.1 Random Sampling Versus Fixed Grid Sampling
    4.5.2 Fixed Grid Subsampling Versus Its Extension
Chapter 5 Conclusions
Chapter 6 Future Work
References

[1] F. Warburg, M. Ramamonjisoa, and M. López-Antequera, "SparseFormer: Attention-based depth completion network," arXiv preprint arXiv:2206.04557, 2022.
[2] K. Rho, J. Ha, and Y. Kim, "GuideFormer: Transformers for image guided depth completion," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 6250–6259, 2022.
[3] D. Wofk, F. Ma, T.-J. Yang, S. Karaman, and V. Sze, "FastDepth: Fast monocular depth estimation on embedded systems," in IEEE International Conference on Robotics and Automation (ICRA), pp. 6101–6108, 2019.
[4] H. Hirschmuller, "Stereo processing by semiglobal matching and mutual information," IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), pp. 328–341, 2007.
[5] T. Kanade and M. Okutomi, "A stereo matching algorithm with an adaptive window: Theory and experiment," IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), pp. 920–932, 1994.
[6] D. Eigen, C. Puhrsch, and R. Fergus, "Depth map prediction from a single image using a multi-scale deep network," in Advances in Neural Information Processing Systems (NeurIPS), pp. 2366–2374, 2014.
[7] C. Godard, O. Mac Aodha, and G. J. Brostow, "Unsupervised monocular depth estimation with left-right consistency," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 270–279, 2017.
[8] A. G. Howard, M. Zhu, B. Chen, D. Kalenichenko, W. Wang, T. Weyand, M. Andreetto, and H. Adam, "MobileNets: Efficient convolutional neural networks for mobile vision applications," arXiv preprint arXiv:1704.04861, 2017.
[9] Y. Xu, X. Zhu, J. Shi, G. Zhang, H. Bao, and H. Li, "Depth completion from sparse LiDAR data with depth-normal constraints," in IEEE/CVF International Conference on Computer Vision (ICCV), 2019.
[10] J. Tang, F.-P. Tian, W. Feng, J. Li, and P. Tan, "Learning guided convolutional network for depth completion," IEEE Transactions on Image Processing (TIP), pp. 1116–1129, 2020.
[11] J. Park, K. Joo, Z. Hu, C.-K. Liu, and I. S. Kweon, "Non-local spatial propagation network for depth completion," in European Conference on Computer Vision (ECCV), 2020.
[12] M. Jaritz, R. de Charette, E. Wirbel, X. Perrotton, and F. Nashashibi, "Sparse and dense data with CNNs: Depth completion and semantic segmentation," in International Conference on 3D Vision (3DV), 2018.
[13] F. Ma, G. V. Cavalheiro, and S. Karaman, "Self-supervised sparse-to-dense: Self-supervised depth completion from LiDAR and monocular camera," in IEEE International Conference on Robotics and Automation (ICRA), pp. 3288–3295, 2019.
[14] F. Ma and S. Karaman, "Sparse-to-dense: Depth prediction from sparse depth samples and a single image," in IEEE International Conference on Robotics and Automation (ICRA), 2018.
[15] K. He, X. Zhang, S. Ren, and J. Sun, "Identity mappings in deep residual networks," in European Conference on Computer Vision (ECCV), pp. 630–645, 2016.
[16] M. Hu, S. Wang, B. Li, S. Ning, L. Fan, and X. Gong, "PENet: Towards precise and efficient image guided depth completion," in IEEE International Conference on Robotics and Automation (ICRA), pp. 13656–13662, 2021.
[17] L. Liu, X. Song, X. Lyu, J. Diao, M. Wang, Y. Liu, and L. Zhang, "FCFR-Net: Feature fusion based coarse-to-fine residual learning for depth completion," in AAAI Conference on Artificial Intelligence, vol. 35, pp. 2136–2144, 2021.
[18] D. Nazir, M. Liwicki, D. Stricker, and M. Z. Afzal, "SemAttNet: Towards attention-based semantic aware guided depth completion," arXiv preprint arXiv:2204.13635, 2022.
[19] J. Qiu, Z. Cui, Y. Zhang, X. Zhang, S. Liu, B. Zeng, and M. Pollefeys, "DeepLiDAR: Deep surface normal guided depth prediction for outdoor scene from sparse LiDAR data and single color image," in IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 3313–3322, 2019.
[20] W. Van Gansbeke, D. Neven, B. De Brabandere, and L. Van Gool, "Sparse and noisy LiDAR completion with RGB guidance and uncertainty," in International Conference on Machine Vision Applications (MVA), pp. 1–6, 2019.
[21] Y. Zhang and T. Funkhouser, "Deep depth completion of a single RGB-D image," in IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 175–185, 2018.
[22] S. Liu, S. De Mello, J. Gu, G. Zhong, M.-H. Yang, and J. Kautz, "Learning affinity via spatial propagation networks," in Advances in Neural Information Processing Systems (NeurIPS), vol. 30, 2017.
[23] X. Cheng, P. Wang, and R. Yang, "Learning depth with convolutional spatial propagation network," IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), vol. 42, no. 10, pp. 2361–2379, 2019.
[24] Z. Liu, Y. Lin, Y. Cao, H. Hu, Y. Wei, Z. Zhang, S. Lin, and B. Guo, "Swin Transformer: Hierarchical vision transformer using shifted windows," in Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pp. 10012–10022, 2021.
[25] Y. Zhang, X. Guo, M. Poggi, Z. Zhu, G. Huang, and S. Mattoccia, "CompletionFormer: Depth completion with convolutions and vision transformers," in IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 18527–18536, 2023.
[26] A. Conti, M. Poggi, and S. Mattoccia, "Sparsity agnostic depth completion," in Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), pp. 5871–5880, 2023.
[27] J. Tang, F.-P. Tian, B. An, J. Li, and P. Tan, "Bilateral propagation network for depth completion," in IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024.
[28] H. Tsai and J.-F. Yang, "Low Resolution to High Precision Depth Estimation for MR Glasses," 2023.
[29] A. Paszke, S. Gross, F. Massa, A. Lerer, J. Bradbury, G. Chanan, T. Killeen, Z. Lin, N. Gimelshein, L. Antiga, A. Desmaison, A. Kopf, E. Yang, Z. DeVito, M. Raison, A. Tejani, S. Chilamkurthy, B. Steiner, L. Fang, J. Bai, and S. Chintala, "PyTorch: An imperative style, high-performance deep learning library," in Advances in Neural Information Processing Systems (NeurIPS), vol. 32, pp. 8026–8037, 2019.
[30] I. Loshchilov and F. Hutter, "Decoupled weight decay regularization," arXiv preprint arXiv:1711.05101, 2017.
[31] N. Silberman, D. Hoiem, P. Kohli, and R. Fergus, "Indoor segmentation and support inference from RGBD images," in European Conference on Computer Vision (ECCV), pp. 746–760, 2012.
[32] A. Eldesokey, M. Felsberg, K. Holmquist, and M. Persson, "Uncertainty-aware CNNs for depth completion: Uncertainty from beginning to end," in IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 12011–12020, 2020.

Full-text Availability: On campus: available immediately; Off campus: available immediately.