| Graduate student: | 李季穎 Li, Ji-Ying | 
|---|---|
| Thesis title: | 基於迭代最近點算法的視覺里程計之邊緣感知的選點策略與滑動估計 Edge-aware Sampling Scheme and Sliding Estimation for ICP-based Visual Odometry | 
| Advisor: | 謝明得 Shieh, Ming-Der | 
| Degree: | Master | 
| Department: | College of Electrical Engineering and Computer Science - Department of Electrical Engineering | 
| Year of publication: | 2020 | 
| Graduation academic year: | 108 | 
| Language: | English | 
| Number of pages: | 57 | 
| Keywords (from Chinese): | visual odometry, simultaneous localization and mapping, iterative closest point algorithm, augmented reality, edge detection | 
| Keywords (English): | Iterative closest point (ICP), Visual odometry (VO), Simultaneous localization and mapping (SLAM), Augmented reality (AR), Edge-based method | 
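    In research fields such as visual odometry (VO) and simultaneous localization and mapping (SLAM), a handheld camera can be tracked in an unknown environment, which enables augmented reality (AR) applications. AR users can perceive even slight jitter of virtual objects, so maintaining good camera tracking is a major challenge. In addition, the AR system must update virtual objects in real time whenever a new image arrives, so the tracking algorithm must have sufficiently low complexity to meet the timing requirement. We therefore propose a VO system designed around accuracy and complexity. Impressed by the results of KinectFusion [3], we follow its approach: our system uses only a depth camera and tracks the camera with the iterative closest point (ICP) algorithm.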
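    Because the ICP algorithm is affected by the structural information in the input point cloud, and for the sake of efficiency, our system adopts an edge-based point selection strategy that retains most of the structural information in the image [4]. However, the fast edge detection algorithm [2] adopted to meet the real-time requirement sometimes misses important structural information. We therefore add a random component to the edge detection algorithm, which preserves the structural information that would otherwise be missed while keeping the advantages of this edge detection algorithm in most situations.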
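    In addition, a tracking system needs to determine whether the ICP result has diverged so that it can immediately avoid the irreversible consequences of divergence. In scenes that lack structural information, the ICP algorithm sometimes fails to converge. A stability measure called the condition number can be used to identify whether the rotation and translation solved by ICP are stable, but it is not sufficient to judge directly whether ICP has converged, because even when the rotation and translation are judged unstable, ICP does not actually diverge in most cases. Judging ICP divergence by the condition number alone therefore produces many false detections of divergence. We therefore propose a divergence indicator based on the condition number combined with the camera motion, which lowers the probability of falsely detecting divergence.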
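    Finally, we propose a VO system based on the methods we developed, and it obtains results competitive with other systems on public datasets. We also integrated this VO system into the popular SLAM system ElasticFusion [5].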
    In visual odometry (VO) and SLAM, tracking a handheld camera in an unknown environment in real time opens up opportunities for augmented reality (AR) applications. A main challenge is to keep tracking as accurate as possible, because AR users can perceive the vibration of a virtual object in the real world even with a small tracking error. Meanwhile, the time complexity of the tracking algorithm must be low enough to meet the real-time constraint; otherwise the AR system cannot update immediately when new visual data arrive. Therefore, we present a visual odometry system that pursues accuracy and low time complexity. Inspired by KinectFusion [3], the presented VO uses a depth camera alone and is based on the iterative closest point (ICP) algorithm, a common algorithm for the registration problem.
    One factor affecting ICP is the structural constraint provided by the scene geometry, so, for efficiency, our work is based on an edge-based sampling strategy that retains more of the important structure [4] in the depth image. However, the edges from fast edge detection [2], which is adopted to meet the real-time constraint, sometimes miss important structure. Therefore, we extended the fast edge detection with a random component to avoid missing structure while preserving the advantages of edges in most situations.
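    The idea of combining edge points with a random component can be illustrated with a minimal sketch. It is not the sampling scheme of the thesis: the depth-jump edge test, the function names `depth_edge_mask` and `edge_aware_sample`, and the parameters `jump`, `budget`, and `random_ratio` are all assumptions chosen for illustration. Most of the point budget goes to pixels on depth discontinuities, and a small share is drawn uniformly from the remaining valid pixels so that structure the edge detector misses can still be represented.

```python
import random


def depth_edge_mask(depth, jump=0.05):
    """Mark valid pixels whose depth differs from a 4-neighbour by more than `jump`.

    Hypothetical stand-in for a fast depth edge detector; depth <= 0 means invalid.
    """
    h, w = len(depth), len(depth[0])
    mask = [[False] * w for _ in range(h)]
    for v in range(h):
        for u in range(w):
            d = depth[v][u]
            if d <= 0:
                continue
            for dv, du in ((0, 1), (1, 0), (0, -1), (-1, 0)):
                nv, nu = v + dv, u + du
                if 0 <= nv < h and 0 <= nu < w and depth[nv][nu] > 0:
                    if abs(depth[nv][nu] - d) > jump:
                        mask[v][u] = True
                        break
    return mask


def edge_aware_sample(depth, budget, random_ratio=0.2, jump=0.05, seed=0):
    """Pick up to `budget` pixels: mostly edge pixels plus a random share of the rest.

    `random_ratio` is an assumed tuning knob, not a value from the thesis.
    """
    rng = random.Random(seed)
    mask = depth_edge_mask(depth, jump)
    edges, others = [], []
    for v, row in enumerate(depth):
        for u, d in enumerate(row):
            if d > 0:
                (edges if mask[v][u] else others).append((v, u))
    n_random = min(len(others), int(budget * random_ratio))
    n_edge = min(len(edges), budget - n_random)
    return rng.sample(edges, n_edge) + rng.sample(others, n_random)


# A toy depth image with one vertical depth step between columns 3 and 4.
toy_depth = [[1.0] * 4 + [2.0] * 4 for _ in range(6)]
samples = edge_aware_sample(toy_depth, budget=10, random_ratio=0.2)
```

    In this toy image the edge pixels lie in columns 3 and 4, so the edge share of the budget is drawn from those columns and the random share from the flat regions on either side.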
    In addition, it is important for a VO system to identify whether ICP has diverged, because a divergent result corrupts the trajectory irreversibly. ICP can still diverge even when the edge-based strategy does not miss any structure, because the camera may capture a structureless scene. A measure of stability, the condition number, can be used to identify such situations; however, ICP does not in fact always diverge in them, so identifying ICP divergence based on the condition number alone yields a large number of false positives. Hence, we extended the condition number with a motion factor derived from constraint analysis to reduce the false positive rate.
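    The following sketch shows why a condition number alone over-reports divergence and how a motion term can help. It is an illustration under strong assumptions, not the indicator defined in the thesis: it considers translation only, builds the constraint matrix as the sum of outer products of surface normals, and the names `sliding_check`, `cond_limit`, and `slide_limit`, as well as the idea of feeding in a predicted camera motion, are hypothetical. A frame is flagged only when the matrix is ill-conditioned and the motion actually points along a weakly constrained direction.

```python
import math


def jacobi_eigh(a, sweeps=50, tol=1e-18):
    """Eigen-decompose a small symmetric matrix with cyclic Jacobi rotations.

    Returns (eigenvalues, vectors), where vectors[i] belongs to eigenvalues[i].
    """
    n = len(a)
    a = [row[:] for row in a]
    v = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off <= tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                diff = a[q][q] - a[p][p]
                theta = math.pi / 4 if diff == 0.0 else 0.5 * math.atan(2 * a[p][q] / diff)
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):  # rotate columns p and q of A and of V
                    a[k][p], a[k][q] = c * a[k][p] - s * a[k][q], s * a[k][p] + c * a[k][q]
                    v[k][p], v[k][q] = c * v[k][p] - s * v[k][q], s * v[k][p] + c * v[k][q]
                for k in range(n):  # rotate rows p and q of A
                    a[p][k], a[q][k] = c * a[p][k] - s * a[q][k], s * a[p][k] + c * a[q][k]
    return [a[i][i] for i in range(n)], [[v[k][i] for k in range(n)] for i in range(n)]


def sliding_check(normals, motion, cond_limit=100.0, slide_limit=0.5):
    """Return (condition_number, sliding_fraction, flagged) for a translation-only model.

    `motion` is an assumed predicted translation; thresholds are illustrative only.
    """
    if not normals:
        raise ValueError("at least one surface normal is required")
    a = [[sum(n[i] * n[j] for n in normals) for j in range(3)] for i in range(3)]
    vals, vecs = jacobi_eigh(a)
    lam_max = max(vals)
    lam_min = max(min(vals), 1e-12 * lam_max)  # avoid division by zero
    cond = lam_max / lam_min
    # Directions whose eigenvalue is far below the largest one are weakly constrained.
    weak = [vec for lam, vec in zip(vals, vecs) if lam * cond_limit < lam_max]
    norm = math.sqrt(sum(m * m for m in motion))
    if norm == 0.0:
        return cond, 0.0, False
    proj = sum(sum(m * w for m, w in zip(motion, vec)) ** 2 for vec in weak)
    slide = math.sqrt(proj) / norm
    return cond, slide, cond > cond_limit and slide > slide_limit


# Two walls whose normals lie in the x-y plane: translation along z is unconstrained.
walls = [(0.6, 0.8, 0.0)] * 30 + [(-0.8, 0.6, 0.0)] * 10
along_corridor = sliding_check(walls, (0.0, 0.0, 0.1))
toward_wall = sliding_check(walls, (0.06, 0.08, 0.0))
```

    Both calls see the same ill-conditioned matrix, but only the motion along the unconstrained z direction is flagged. The motion toward the wall is left alone, which is the kind of false positive a pure condition-number test would raise.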
    Finally, we proposed a visual odometry system with our edge-aware sampling scheme and sliding estimation, and it obtains competitive performance compared with other state-of-the-art visual odometry systems on public benchmarks. Further, we integrated the proposed VO as a building block in the popular SLAM system ElasticFusion [5].
[1]	Hartley, R. and A. Zisserman, Multiple view geometry in computer vision. 2003: Cambridge university press.
[2]	Bose, L. and A. Richards. Fast depth edge detection and edge based RGB-D SLAM. in 2016 IEEE international conference on robotics and automation (ICRA). 2016. IEEE.
[3]	Newcombe, R.A., et al. KinectFusion: Real-time dense surface mapping and tracking. in 2011 10th IEEE International Symposium on Mixed and Augmented Reality. 2011. IEEE.
[4]	Choi, C., A.J. Trevor, and H.I. Christensen. RGB-D edge detection and edge-based registration. in 2013 IEEE/RSJ International Conference on Intelligent Robots and Systems. 2013. IEEE.
[5]	Whelan, T., et al. ElasticFusion: Dense SLAM without a pose graph. 2015. Robotics: Science and Systems.
[6]	Davison, A.J. Real-time simultaneous localisation and mapping with a single camera. 2003. IEEE.
[7]	Klein, G. and D. Murray. Parallel tracking and mapping for small AR workspaces. in 2007 6th IEEE and ACM international symposium on mixed and augmented reality. 2007. IEEE.
[8]	Mur-Artal, R., J.M.M. Montiel, and J.D. Tardos, ORB-SLAM: a versatile and accurate monocular SLAM system. IEEE transactions on robotics, 2015. 31(5): p. 1147-1163.
[9]	Mur-Artal, R. and J.D. Tardós, ORB-SLAM2: An open-source SLAM system for monocular, stereo, and RGB-D cameras. IEEE Transactions on Robotics, 2017. 33(5): p. 1255-1262.
[10]	Engel, J., T. Schöps, and D. Cremers. LSD-SLAM: Large-scale direct monocular SLAM. in European conference on computer vision. 2014. Springer.
[11]	Newcombe, R.A., S.J. Lovegrove, and A.J. Davison. DTAM: Dense tracking and mapping in real-time. in 2011 international conference on computer vision. 2011. IEEE.
[12]	Newcombe, R.A. and A.J. Davison. Live dense reconstruction with a single moving camera. in 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. 2010. IEEE.
[13]	Stühmer, J., S. Gumhold, and D. Cremers. Real-time dense geometry from a handheld camera. in Joint Pattern Recognition Symposium. 2010. Springer.
[14]	Engel, J., J. Sturm, and D. Cremers. Semi-dense visual odometry for a monocular camera. in Proceedings of the IEEE international conference on computer vision. 2013.
[15]	Kerl, C., J. Sturm, and D. Cremers. Dense visual SLAM for RGB-D cameras. in 2013 IEEE/RSJ International Conference on Intelligent Robots and Systems. 2013. IEEE.
[16]	Zhang, Z., A flexible new technique for camera calibration. IEEE Transactions on pattern analysis and machine intelligence, 2000. 22(11): p. 1330-1334.
[17]	Ma, Y., et al., An invitation to 3-d vision: from images to geometric models. Vol. 26. 2012: Springer Science & Business Media.
[18]	Sturm, J., et al. A benchmark for the evaluation of RGB-D SLAM systems. in 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems. 2012. IEEE.
[19]	Thrun, S., Probabilistic robotics. Communications of the ACM, 2002. 45(3): p. 52-57.
[20]	Besl, P.J. and N.D. McKay. Method for registration of 3-D shapes. in Sensor fusion IV: control paradigms and data structures. 1992. International Society for Optics and Photonics.
[21]	Holz, D., et al., Registration with the point cloud library: A modular framework for aligning in 3-D. IEEE Robotics & Automation Magazine, 2015. 22(4): p. 110-124.
[22]	Rusinkiewicz, S. and M. Levoy. Efficient variants of the ICP algorithm. in Proceedings third international conference on 3-D digital imaging and modeling. 2001. IEEE.
[23]	Gelfand, N., et al. Geometrically stable sampling for the ICP algorithm. in Fourth International Conference on 3-D Digital Imaging and Modeling, 2003. 3DIM 2003. Proceedings. 2003. IEEE.
[24]	Nguyen, C.V., S. Izadi, and D. Lovell. Modeling kinect sensor noise for improved 3d reconstruction and tracking. in 2012 second international conference on 3D imaging, modeling, processing, visualization & transmission. 2012. IEEE.
[25]	Mitra, N.J. and A. Nguyen. Estimating surface normals in noisy point cloud data. in Proceedings of the nineteenth annual symposium on Computational geometry. 2003.
[26]	Arun, K.S., T.S. Huang, and S.D. Blostein, Least-squares fitting of two 3-D point sets. IEEE Transactions on pattern analysis and machine intelligence, 1987(5): p. 698-700.
[27]	Low, K.-L., Linear least-squares optimization for point-to-plane icp surface registration. Chapel Hill, University of North Carolina, 2004. 4(10): p. 1-3.
[28]	Lin, T.-Y., Efficient Point Selection and Matching for Real-time ICP-based Visual Odometry, in Department of Electrical Engineering. 2018, National Cheng Kung University: Tainan. p. 61.
[29]	Mallick, T., P.P. Das, and A.K. Majumdar, Characterizations of noise in Kinect depth images: A review. IEEE Sensors journal, 2014. 14(6): p. 1731-1740.
[30]	Simon, D.A., Fast and accurate shape-based registration. 1996: Carnegie Mellon University Pittsburgh, Pennsylvania.
[31]	Scaramuzza, D. and F. Fraundorfer, Visual odometry [tutorial]. IEEE robotics & automation magazine, 2011. 18(4): p. 80-92.
[32]	Triggs, B., et al. Bundle adjustment—a modern synthesis. in International workshop on vision algorithms. 1999. Springer.
[33]	Schonberger, J.L. and J.-M. Frahm. Structure-from-motion revisited. in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016.
[34]	Lu, F. and E. Milios, Globally consistent range scan alignment for environment mapping. Autonomous robots, 1997. 4(4): p. 333-349.
[35]	Henry, P., et al., RGB-D mapping: Using Kinect-style depth cameras for dense 3D modeling of indoor environments. The International Journal of Robotics Research, 2012. 31(5): p. 647-663.
[36]	Strasdat, H., J. Montiel, and A.J. Davison, Scale drift-aware large scale monocular SLAM. Robotics: Science and Systems VI, 2010. 2(3): p. 7.
[37]	Borgefors, G., Distance transformations in digital images. Computer vision, graphics, and image processing, 1986. 34(3): p. 344-371.
[38]	Vitter, J.S., Faster methods for random sampling. Communications of the ACM, 1984. 27(7): p. 703-718.
[39]	Fawcett, T., An introduction to ROC analysis. Pattern recognition letters, 2006. 27(8): p. 861-874.
[40]	Horn, B.K., Closed-form solution of absolute orientation using unit quaternions. JOSA A, 1987. 4(4): p. 629-642.
[41]	Babu, B.W., et al. σ-dvo: Sensor noise model meets dense visual odometry. in 2016 IEEE international symposium on mixed and augmented reality (ISMAR). 2016. IEEE.
[42]	Schops, T., T. Sattler, and M. Pollefeys. Bad slam: Bundle adjusted direct rgb-d slam. in Proceedings of the IEEE conference on computer vision and pattern recognition. 2019.
[43]	Prakhya, S.M., et al. Sparse depth odometry: 3D keypoint based pose estimation from dense depth data. in 2015 IEEE international conference on robotics and automation (ICRA). 2015. IEEE.
[44]	Yan, Z., M. Ye, and L. Ren, Dense visual SLAM with probabilistic surfel map. IEEE transactions on visualization and computer graphics, 2017. 23(11): p. 2389-2398.
[45]	Leutenegger, S., et al., Keyframe-based visual–inertial odometry using nonlinear optimization. The International Journal of Robotics Research, 2015. 34(3): p. 314-334.
[46]	Sun, K., et al., Robust stereo visual inertial odometry for fast autonomous flight. IEEE Robotics and Automation Letters, 2018. 3(2): p. 965-972.
[47]	Kendall, A., M. Grimes, and R. Cipolla. Posenet: A convolutional network for real-time 6-dof camera relocalization. in Proceedings of the IEEE international conference on computer vision. 2015.
[48]	Eigen, D. and R. Fergus. Predicting depth, surface normals and semantic labels with a common multi-scale convolutional architecture. in Proceedings of the IEEE international conference on computer vision. 2015.
[49]	McCormac, J., et al. Semanticfusion: Dense 3d semantic mapping with convolutional neural networks. in 2017 IEEE International Conference on Robotics and automation (ICRA). 2017. IEEE.
[50]	Davison, A., FutureMapping: The Computational Structure of Spatial AI Systems. 2018.
 On campus: publicly available from 2025-08-01