
Graduate Student: Hsu, Wei-Chen (許維真)
Thesis Title: Point Cloud-Based Person Detection Using Cascade Attention Network with RTAB-Map
Advisor: Lien, Jenn-Jier (連震杰)
Degree: Master
Department: College of Electrical Engineering and Computer Science - Graduate Program of Artificial Intelligence
Year of Publication: 2023
Academic Year of Graduation: 111 (2022-2023)
Language: English
Number of Pages: 69
Keywords: SLAM, RTAB-Map, Point Cloud, Cascade Attention, Self-Attention

    Autonomous Mobile Robots (AMRs) are widely deployed in settings such as factories, logistics centers, and hospitals, where using them to transport materials reduces labor requirements. AMRs are capable of autonomous navigation and can optimize their routes, avoid obstacles, and adjust their paths based on real-time information. In addition, factory safety is an important concern. This thesis comprises two main parts: 1) applying RTAB-Map (Real-Time Appearance-Based Mapping) to the AMR developed by iAmech for mapping and navigation, and 2) point cloud-based person detection using a cascade attention network.
    In the first part, RTAB-Map uses a 2D LiDAR (Light Detection and Ranging) and an RGB-D camera for mapping. The 2D LiDAR has a longer detection range than the RGB-D camera and a 360° field of view (FoV), so it can generate a more complete 2D map. The RGB-D camera produces a colored, dense point cloud, but because of its limited detection range and FoV it can only cover a region within a 10-meter range in front of the AMR. After the map is built, ROS (Robot Operating System) packages are used for navigation and obstacle avoidance. During navigation, the system communicates with a robotic arm through the ROS messaging mechanism: when the AMR reaches a designated location, the robotic arm takes over to grasp the cargo. A graphical user interface is also provided to simplify the use of RTAB-Map for mapping and navigation.
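    As a concrete illustration of this navigate-then-handoff flow, the sketch below sends a goal pose to the standard ROS 1 move_base action server and, on arrival, publishes a command to the arm. It is a minimal example assuming a conventional move_base setup; the arm topic name '/arm/pick_command' and its String message type are hypothetical placeholders, since the actual iAmech arm interface is not described here.

    #!/usr/bin/env python
    # Minimal sketch: navigate to a map-frame goal, then notify the arm.
    import rospy
    import actionlib
    from move_base_msgs.msg import MoveBaseAction, MoveBaseGoal
    from std_msgs.msg import String

    def navigate_and_handoff(x, y):
        # Action client for the standard move_base navigation stack.
        client = actionlib.SimpleActionClient('move_base', MoveBaseAction)
        client.wait_for_server()

        # Goal pose expressed in the map frame built by RTAB-Map.
        goal = MoveBaseGoal()
        goal.target_pose.header.frame_id = 'map'
        goal.target_pose.header.stamp = rospy.Time.now()
        goal.target_pose.pose.position.x = x
        goal.target_pose.pose.position.y = y
        goal.target_pose.pose.orientation.w = 1.0  # identity orientation

        client.send_goal(goal)
        client.wait_for_result()  # blocks until navigation succeeds or aborts

        # On arrival, publish a (hypothetical) command telling the arm to grasp.
        arm_pub = rospy.Publisher('/arm/pick_command', String,
                                  queue_size=1, latch=True)
        arm_pub.publish(String(data='grasp'))

    if __name__ == '__main__':
        rospy.init_node('handoff_demo')
        navigate_and_handoff(2.0, 1.5)  # example pick-up point in map coordinates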
    The second part is point cloud-based person detection using a cascade attention network. High-resolution 3D LiDARs are costly, so a low-cost 16-beam LiDAR is adopted instead, and an autoencoder upsamples the 16-beam point cloud to 64-beam resolution to emulate a high-resolution LiDAR. Because labeling ground-truth bounding boxes in point clouds requires substantial human effort, the model is trained directly on the KITTI dataset. KITTI is an outdoor dataset while the testing environment is indoors, so the collected test data are preprocessed to better match the training data. A two-stage model is employed: the first stage predicts rough bounding-box positions, and the second stage refines each bounding box, applying the cascade concept to adjust the boxes twice for a more precise prediction. In this way, the human effort of labeling point clouds is saved, and person detection is achieved with a lower-cost LiDAR.
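    To make the cascade refinement concept concrete, the following toy PyTorch sketch refines first-stage box proposals twice with successive residual heads. It is a simplified illustration, not the actual CasA network [3], which aggregates voxelized point-cloud features with attention; the feature dimension, head layout, and 7-parameter box encoding (x, y, z, l, w, h, yaw) are assumptions made for this example.

    import torch
    import torch.nn as nn

    class CascadeRefiner(nn.Module):
        """Toy two-stage cascade: each head predicts a residual box correction."""
        def __init__(self, feat_dim=128, num_stages=2):
            super().__init__()
            # One refinement head per cascade stage; each maps pooled RoI
            # features to 7 residual box parameters (x, y, z, l, w, h, yaw).
            self.heads = nn.ModuleList(
                nn.Sequential(nn.Linear(feat_dim, 64), nn.ReLU(), nn.Linear(64, 7))
                for _ in range(num_stages)
            )

        def forward(self, roi_feats, boxes):
            # roi_feats: (N, feat_dim) features pooled per proposal
            # boxes:     (N, 7) rough boxes from the first stage
            for head in self.heads:
                boxes = boxes + head(roi_feats)  # residual refinement
                # A full detector would re-pool features from the refined boxes
                # before the next stage; omitted here for brevity.
            return boxes

    refiner = CascadeRefiner()
    feats = torch.randn(4, 128)    # pooled features for 4 proposals
    proposals = torch.randn(4, 7)  # rough first-stage boxes
    refined = refiner(feats, proposals)
    print(refined.shape)  # torch.Size([4, 7])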

    Table of Contents:
    Abstract (Chinese) I
    Abstract (English) II
    Acknowledgements III
    List of Tables VII
    List of Figures VIII
    Chapter 1 Introduction 1
      1.1 Motivation and Objective 1
      1.2 Global Framework 2
      1.3 Related Works 9
      1.4 Contributions 11
    Chapter 2 System Setup and Specification 13
      2.1 System Setup 13
      2.2 Hardware Specifications 14
    Chapter 3 RTAB-Map 17
      3.1 Scenario of RTAB-Map: One Example 17
      3.2 Introduction of RTAB-Map 20
      3.3 RTAB-Map Applications on ROS 24
      3.4 GUI and Demo 31
    Chapter 4 Point Cloud-Based Person Detection Using Cascade Attention Network 35
      4.1 Training Framework of Point Cloud-Based Person Detection Using Cascade Attention Network 35
      4.2 Testing Framework of Point Cloud-Based Person Detection Using Cascade Attention Network 48
      4.3 Data Collection and Metrics 57
      4.4 Experimental Results 60
    Chapter 5 Conclusions and Future Works 66
    References 68

    [1] M. Labbé and F. Michaud, “RTAB-Map as an Open-Source Lidar and Visual Simultaneous Localization and Mapping Library for Large-Scale and Long-Term Online Operation,” Journal of Field Robotics, pp. 416-446, 2019.
    [2] M. Quigley et al., “ROS: An Open-Source Robot Operating System,” ICRA Workshop on Open Source Software, p. 5, 2009.
    [3] H. Wu, J. Deng, C. Wen, X. Li, C. Wang, and J. Li, “CasA: A Cascade Attention Network for 3D Object Detection from LiDAR Point Clouds,” IEEE Transactions on Geoscience and Remote Sensing, 2022.
    [4] T. Foote, “tf: The Transform Library,” IEEE Conference on Technologies for Practical Robot Applications (TePRA), pp. 1-6, 2013.
    [5] T. Shan, J. Wang, F. Chen, P. Szenher, and B. Englot, “Simulation-Based Lidar Super-Resolution for Ground Vehicles,” Robotics and Autonomous Systems, 2020.
    [6] A. Vaswani et al., “Attention Is All You Need,” Advances in Neural Information Processing Systems, 2017.
    [7] Z. Cai and N. Vasconcelos, “Cascade R-CNN: Delving into High Quality Object Detection,” Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6154-6162, 2018.
    [8] Z. Cai and N. Vasconcelos, “Cascade R-CNN: High Quality Object Detection and Instance Segmentation,” IEEE Transactions on Pattern Analysis and Machine Intelligence, pp. 1483-1498, 2019.
    [9] G. Grisetti, C. Stachniss, and W. Burgard, “Improving Grid-Based SLAM with Rao-Blackwellized Particle Filters by Adaptive Proposals and Selective Resampling,” Proceedings of the 2005 IEEE International Conference on Robotics and Automation, pp. 2432-2437, 2005.
    [10] G. Grisetti, C. Stachniss, and W. Burgard, “Improved Techniques for Grid Mapping with Rao-Blackwellized Particle Filters,” IEEE Transactions on Robotics, 2007.
    [11] J. Zhang and S. Singh, “LOAM: Lidar Odometry and Mapping in Real-Time,” Robotics: Science and Systems, pp. 1-9, 2014.
    [12] A. Hornung, K. M. Wurm, M. Bennewitz, C. Stachniss, and W. Burgard, “OctoMap: An Efficient Probabilistic 3D Mapping Framework Based on Octrees,” Autonomous Robots, 2013.
    [13] C. R. Qi, H. Su, K. Mo, and L. J. Guibas, “PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation,” Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 652-660, 2017.
    [14] C. R. Qi, L. Yi, H. Su, and L. J. Guibas, “PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space,” Advances in Neural Information Processing Systems, 2017.
    [15] S. Shi, X. Wang, and H. Li, “PointRCNN: 3D Object Proposal Generation and Detection from Point Cloud,” Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 770-779, 2019.
    [16] Y. Zhou and O. Tuzel, “VoxelNet: End-to-End Learning for Point Cloud Based 3D Object Detection,” Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 4490-4499, 2018.
    [17] Y. Yan, Y. Mao, and B. Li, “SECOND: Sparsely Embedded Convolutional Detection,” Sensors, 2018.
    [18] A. H. Lang, S. Vora, H. Caesar, L. Zhou, J. Yang, and O. Beijbom, “PointPillars: Fast Encoders for Object Detection from Point Clouds,” Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 12697-12705, 2019.
    [19] A. Geiger, P. Lenz, C. Stiller, and R. Urtasun, “Vision Meets Robotics: The KITTI Dataset,” The International Journal of Robotics Research, pp. 1231-1237, 2013.
    [20] “ROS/Concepts,” [Online]. Available: http://wiki.ros.org/ROS/Concepts.
    [21] S. C. Park, M. K. Park, and M. G. Kang, “Super-Resolution Image Reconstruction: A Technical Overview,” IEEE Signal Processing Magazine, pp. 21-36, 2003.
    [22] Y. Gal and Z. Ghahramani, “Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning,” International Conference on Machine Learning, pp. 1050-1059, 2016.
    [23] J. Deng, S. Shi, P. Li, W. Zhou, Y. Zhang, and H. Li, “Voxel R-CNN: Towards High Performance Voxel-Based 3D Object Detection,” Proceedings of the AAAI Conference on Artificial Intelligence, pp. 1201-1209, 2021.
    [24] S. Shi, C. Guo, L. Jiang, Z. Wang, J. Shi, X. Wang, and H. Li, “PV-RCNN: Point-Voxel Feature Set Abstraction for 3D Object Detection,” Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 10529-10538, 2020.

    Full-text access: on campus from 2028-08-27; off campus from 2028-08-27.
    The electronic thesis has not yet been authorized for public release; for the print copy, please consult the library catalog.