| Graduate Student: | Su, Huang (蘇晃) |
|---|---|
| Thesis Title: | 3D Scene Reconstruction from RGB-D Keyframes Using ORB-SLAM3 Positioning and Neural Radiance Fields (基於ORB-SLAM3定位與神經輻射場的RGB-D關鍵幀三維場景重建) |
| Advisors: | Lien, Jenn-Jier (連震杰); Guo, Shu-Mei (郭淑美) |
| Degree: | Master |
| Department: | Department of Computer Science and Information Engineering, College of Electrical Engineering and Computer Science |
| Year of Publication: | 2025 |
| Graduating Academic Year: | 114 |
| Language: | English |
| Pages: | 92 |
| Keywords: | Simultaneous Localization and Mapping (SLAM), Neural Radiance Fields (NeRF), 3D Gaussian Splatting (3DGS), 3D Reconstruction, Depth Completion, System Integration |
| Views / Downloads: | 32 / 1 |
With the rapid development of augmented reality, robotics, and digital twins, the demand for high-fidelity 3D scene reconstruction continues to grow. In recent years, Neural Radiance Fields (NeRF) and 3D Gaussian Splatting (3DGS), two cutting-edge scene representations, have become focal points in academia and industry for their strong potential to generate photorealistic imagery. However, the two techniques trade off differently in rendering principles, training efficiency, and memory consumption, and both face the challenge of imperfect data from consumer-grade sensors when applied to real-world scenes.
This thesis addresses these issues through two core contributions. The first is a systematic literature review and comparative analysis: we trace the technological evolution from traditional SLAM to NeRF-based and 3DGS-based SLAM, and comprehensively synthesize and evaluate NeRF and 3DGS in terms of their underlying representations, rendering mechanisms, performance trade-offs, and the challenges they face in SLAM applications, establishing a clear theoretical foundation and direction for subsequent research.
The second core contribution is the design and implementation of a complete end-to-end SLAM system for high-quality 3D reconstruction, targeting the key problem of sparse, noisy depth data from consumer-grade RGB-D sensors. The system integrates three key modules: (1) the frontend uses the robust ORB-SLAM3 for real-time camera pose tracking; (2) the middle stage introduces a "boundary-consistent depth completion" framework that uses gated convolutional networks to fuse RGB images with predicted geometric priors, producing dense, high-quality depth maps with sharp boundaries; (3) the backend feeds the refined data into the Co-SLAM framework to reconstruct a detail-rich implicit 3D scene.
Experimental results show that the proposed integrated system, and in particular the depth completion module, effectively overcomes the defects of real sensor data and significantly improves the geometric completeness and visual quality of the final 3D models, validating the effectiveness and robustness of this end-to-end architecture on imperfect real-world data.
With the rapid development of augmented reality, robotics, and digital twins, the demand for high-fidelity 3D scene reconstruction has been steadily increasing. In recent years, Neural Radiance Fields (NeRF) and 3D Gaussian Splatting (3DGS) have emerged as two cutting-edge scene representations that have attracted significant attention in both academia and industry due to their strong potential for generating photo-realistic imagery. However, these two techniques differ in rendering principles, training efficiency, and memory consumption, and both face challenges when applied to real-world data captured by consumer-grade sensors.
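For context, the clearest difference in rendering principle lies in how a pixel color is formed. The standard formulations from the original NeRF and 3DGS papers are, respectively, volume rendering of density samples along a ray and front-to-back alpha blending of splatted Gaussians:

```latex
% NeRF: color of ray r, volume-rendered over N samples with density sigma_i,
% inter-sample distance delta_i, and per-sample color c_i
C(\mathbf{r}) \;=\; \sum_{i=1}^{N} T_i \,\bigl(1 - e^{-\sigma_i \delta_i}\bigr)\, \mathbf{c}_i,
\qquad
T_i \;=\; \exp\!\Bigl(-\sum_{j=1}^{i-1} \sigma_j \delta_j\Bigr)

% 3DGS: pixel color as alpha blending of depth-sorted projected Gaussians
% with opacity alpha_i and color c_i
C \;=\; \sum_{i=1}^{N} \mathbf{c}_i\, \alpha_i \prod_{j=1}^{i-1} \bigl(1 - \alpha_j\bigr)
```

The first requires many network queries per ray (hence slow training and rendering), while the second is a rasterization-style sum over explicit primitives, which underlies the speed and memory trade-offs discussed above.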
This thesis aims to address these issues through two core contributions. The first contribution is a systematic literature review and comparative analysis. We trace the technological evolution from traditional SLAM to NeRF-based and 3DGS-based SLAM, and we provide a comprehensive synthesis and evaluation of NeRF and 3DGS in terms of their underlying representations, rendering mechanisms, advantages and limitations, as well as the challenges they face in SLAM applications. This establishes a clear theoretical foundation and development direction for future research.
The second core contribution of this thesis is the design and implementation of a complete end-to-end SLAM system for high-quality 3D reconstruction, aimed at solving the problems of sparse and noisy depth data from consumer-grade RGB-D sensors. This system integrates three key modules: (1) the frontend adopts the robust ORB-SLAM3 framework for real-time camera pose tracking; (2) the middle stage introduces a "boundary-consistent depth completion" framework that employs gated convolutional networks, fusing RGB images with predicted geometric priors to generate dense, boundary-preserving, high-quality depth maps; (3) the backend feeds the refined data into the Co-SLAM framework to reconstruct geometrically detailed implicit 3D scenes.
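The gating idea behind such depth-completion networks can be illustrated with a minimal PyTorch sketch: a learned sigmoid mask modulates each convolved feature, letting the network down-weight unreliable regions (e.g. depth holes). The layer below and the RGB-D channel layout are illustrative assumptions, not the thesis implementation.

```python
import torch
import torch.nn as nn

class GatedConv2d(nn.Module):
    """Gated convolution: output = tanh(feature(x)) * sigmoid(gate(x)).
    The gate acts as a soft, learned validity mask over the features."""
    def __init__(self, in_ch, out_ch, kernel_size=3, padding=1):
        super().__init__()
        self.feature = nn.Conv2d(in_ch, out_ch, kernel_size, padding=padding)
        self.gate = nn.Conv2d(in_ch, out_ch, kernel_size, padding=padding)

    def forward(self, x):
        return torch.tanh(self.feature(x)) * torch.sigmoid(self.gate(x))

# Example: fuse RGB (3 channels) + raw depth (1 channel) into 32 feature maps.
layer = GatedConv2d(4, 32)
x = torch.randn(1, 4, 64, 64)   # a batch of RGB-D patches
y = layer(x)
print(tuple(y.shape))  # (1, 32, 64, 64)
```

Because the gate is learned rather than a fixed binary mask, the same layer handles both valid and missing-depth pixels, which is what makes it attractive for hole filling around object boundaries.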
Experimental results demonstrate that the proposed integrated system—especially with the introduction of the depth completion module—can effectively overcome the imperfections of real sensor data and significantly improve the geometric integrity and visual quality of the final 3D models. This validates the effectiveness and robustness of the end-to-end architecture in handling imperfect real-world data.
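The three-stage data flow described above can be sketched as a simple per-frame loop. Everything here is an illustrative placeholder, not the thesis code: depth maps are flat lists with `None` marking missing sensor readings, and the hole filling is a toy stand-in for the actual depth-completion network.

```python
# Illustrative sketch of the pipeline: (1) pose tracking -> (2) depth
# completion -> (3) implicit mapping. All names are hypothetical.

def track(frame_id):
    """Frontend stand-in: return a dummy 4x4 identity pose per frame."""
    return [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]

def complete_depth(raw_depth):
    """Middle-stage stand-in: fill holes (None) with the mean valid depth."""
    valid = [d for d in raw_depth if d is not None]
    fill = sum(valid) / len(valid)
    return [d if d is not None else fill for d in raw_depth]

def map_scene(scene, keyframe):
    """Backend stand-in: accumulate refined keyframes into a 'scene model'."""
    scene.append(keyframe)

def process_stream(frames):
    scene = []
    for i, raw_depth in enumerate(frames):
        pose = track(i)                     # (1) camera pose
        dense = complete_depth(raw_depth)   # (2) dense, hole-free depth
        map_scene(scene, {"pose": pose, "depth": dense})  # (3) mapping
    return scene

scene = process_stream([[1.0, None, 3.0], [2.0, 2.0, None]])
print(len(scene), scene[0]["depth"])  # 2 [1.0, 2.0, 3.0]
```

The point of the loop order is the one the abstract argues for: the mapper only ever sees completed depth, so sensor holes never reach the implicit scene model.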