| Graduate Student: | 許維真 Hsu, Wei-Chen |
|---|---|
| Thesis Title: | Point Cloud-Based Person Detection Using Cascade Attention Network with RTAB-Map |
| Advisor: | 連震杰 Lien, Jenn-Jier |
| Degree: | Master |
| Department: | College of Electrical Engineering and Computer Science - Graduate Program of Artificial Intelligence |
| Year of Publication: | 2023 |
| Academic Year of Graduation: | 111 |
| Language: | English |
| Number of Pages: | 69 |
| Keywords: | SLAM, RTAB-Map, Point Cloud, Cascade Attention, Self-Attention |
Autonomous Mobile Robots (AMRs) are widely deployed in settings such as factories, logistics centers, and hospitals. Using AMRs for material transportation reduces labor requirements: AMRs can make decisions autonomously, optimize their routes, avoid obstacles, and adjust their actions based on real-time information. In addition, safety within the factory is an important concern. This thesis consists of two main parts: 1) applying RTAB-Map (Real-Time Appearance-Based Mapping) to the AMR developed by iAmech for mapping and navigation, and 2) point cloud-based person detection using a cascade attention network.
In the first part, RTAB-Map uses a 2D LiDAR (Light Detection and Ranging) and an RGB-D camera for mapping. The 2D LiDAR has a longer detection range than the RGB-D camera and a 360° field of view (FoV), so it produces a more complete 2D map. The RGB-D camera produces a dense, colored point cloud, but because of its limited range and FoV it only covers roughly 10 meters in front of the AMR. After the map is built, ROS (Robot Operating System) packages are used for navigation and obstacle avoidance. During navigation, the AMR communicates with a robotic arm through the ROS messaging mechanism; when the AMR reaches a designated location, the robotic arm takes over and grasps the cargo. A graphical user interface is also provided to make mapping and navigation with RTAB-Map easier to operate.
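As a rough illustration of the navigation and hand-off flow described above, the sketch below sends a goal to the standard ROS navigation stack (move_base) and then notifies a robotic arm over a topic once the goal is reached. The node, topic, and frame names (e.g. `/arm/start_grasp`) and the pick-up coordinates are illustrative assumptions, not the actual interfaces used in the thesis.

```python
#!/usr/bin/env python
# Minimal sketch: drive the AMR to a goal via move_base, then hand off to the arm.
import rospy
import actionlib
from std_msgs.msg import String
from move_base_msgs.msg import MoveBaseAction, MoveBaseGoal


def navigate_and_handoff(x, y):
    # Action client for the ROS navigation stack (move_base).
    client = actionlib.SimpleActionClient("move_base", MoveBaseAction)
    client.wait_for_server()

    goal = MoveBaseGoal()
    goal.target_pose.header.frame_id = "map"      # goal expressed in the map frame
    goal.target_pose.header.stamp = rospy.Time.now()
    goal.target_pose.pose.position.x = x
    goal.target_pose.pose.position.y = y
    goal.target_pose.pose.orientation.w = 1.0     # identity orientation for simplicity

    client.send_goal(goal)
    client.wait_for_result()                      # block until the AMR reaches the goal

    # Hypothetical hand-off topic: tell the robotic arm it may start grasping.
    # latch=True so the arm receives the message even if it subscribes slightly later.
    arm_pub = rospy.Publisher("/arm/start_grasp", String, queue_size=1, latch=True)
    arm_pub.publish(String(data="grasp"))


if __name__ == "__main__":
    rospy.init_node("amr_navigation_client")
    navigate_and_handoff(3.0, 1.5)
```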
The second part is point cloud-based person detection using a cascade attention network. Because high-resolution 3D LiDARs are costly, a low-resolution 16-beam LiDAR is used instead, and an autoencoder upsamples the 16-beam point cloud to 64-beam resolution to emulate a high-resolution LiDAR. Since labeling ground-truth boxes in point clouds requires substantial human effort, the model is trained directly on the KITTI dataset. KITTI is an outdoor dataset while our test environment is indoors, so our data are preprocessed to better match the training data. A two-stage model is used: the first stage predicts rough bounding-box positions, and the second stage refines each bounding box, applying the cascade concept to adjust the boxes twice for a more precise result. In this way, the human effort of labeling point clouds is saved, and person detection is achieved with a lower-cost LiDAR.
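The PyTorch sketch below illustrates only the cascade-refinement idea from the second part: proposals from a first-stage network (not shown) are refined by two successive regression stages. The feature dimension, layer sizes, and 7-parameter box encoding are assumed for illustration, and the attention-based feature interaction of the full cascade attention network is omitted.

```python
import torch
import torch.nn as nn


class RefineStage(nn.Module):
    """One cascade stage: map RoI features to a box residual and apply it."""
    def __init__(self, feat_dim=256):
        super().__init__()
        self.head = nn.Sequential(
            nn.Linear(feat_dim, 128), nn.ReLU(),
            nn.Linear(128, 7),          # residual (dx, dy, dz, dl, dw, dh, dtheta)
        )

    def forward(self, roi_feats, boxes):
        residual = self.head(roi_feats)
        return boxes + residual         # refined boxes passed to the next stage


class CascadeDetector(nn.Module):
    """Two refinement stages applied after the first-stage proposal network.
    In the full method, RoI features would be re-extracted from the refined
    boxes at every stage; here the same features are reused to keep it short."""
    def __init__(self, feat_dim=256, num_stages=2):
        super().__init__()
        self.stages = nn.ModuleList([RefineStage(feat_dim) for _ in range(num_stages)])

    def forward(self, roi_feats, proposals):
        boxes = proposals
        for stage in self.stages:
            boxes = stage(roi_feats, boxes)   # each stage tightens the previous boxes
        return boxes


if __name__ == "__main__":
    # Example: refine 128 proposals described by 256-dim RoI features.
    detector = CascadeDetector()
    refined = detector(torch.randn(128, 256), torch.randn(128, 7))
    print(refined.shape)  # torch.Size([128, 7])
```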
[1] M. Labbé and F. Michaud, “RTAB-Map as an Open-Source Lidar and Visual Simultaneous Localization and Mapping Library for Large-Scale and Long-Term Online Operation,” Journal of Field Robotics, pp. 416-446, 2019.
[2] M. Quigley et al., “ROS: An Open-Source Robot Operating System,” ICRA Workshop on Open Source Software, p. 5, 2009.
[3] H. Wu, J. Deng, C. Wen, X. Li, C. Wang, and J. Li, “CasA: A Cascade Attention Network for 3D Object Detection from LiDAR Point Clouds,” IEEE Transactions on Geoscience and Remote Sensing, 2022.
[4] T. Foote, “tf: The Transform Library,” IEEE Conference on Technologies for Practical Robot Applications (TePRA), pp. 1-6, 2013.
[5] T. Shan, J. Wang, F. Chen, P. Szenher, and B. Englot, “Simulation-Based Lidar Super-Resolution for Ground Vehicles,” Robotics and Autonomous Systems, 2020.
[6] A. Vaswani et al., “Attention Is All You Need,” Advances in Neural Information Processing Systems, 2017.
[7] Z. Cai and N. Vasconcelos, “Cascade R-CNN: Delving into High Quality Object Detection,” Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6154-6162, 2018.
[8] Z. Cai and N. Vasconcelos, “Cascade R-CNN: High Quality Object Detection and Instance Segmentation,” IEEE Transactions on Pattern Analysis and Machine Intelligence, pp. 1483-1498, 2019.
[9] G. Grisetti, C. Stachniss, and W. Burgard, “Improving Grid-Based SLAM with Rao-Blackwellized Particle Filters by Adaptive Proposals and Selective Resampling,” Proceedings of the 2005 IEEE International Conference on Robotics and Automation, pp. 2432-2437, 2005.
[10] G. Grisetti, C. Stachniss, and W. Burgard, “Improved Techniques for Grid Mapping with Rao-Blackwellized Particle Filters,” IEEE Transactions on Robotics, 2007.
[11] J. Zhang and S. Singh, “LOAM: Lidar Odometry and Mapping in Real-Time,” Robotics: Science and Systems, pp. 1-9, 2014.
[12] A. Hornung, K. M. Wurm, M. Bennewitz, C. Stachniss, and W. Burgard, “OctoMap: An Efficient Probabilistic 3D Mapping Framework Based on Octrees,” Autonomous Robots, 2013.
[13] C.R. Qi, H. Su, K. Mo, and L.J. Guibas, “PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation,” Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 652-660, 2017.
[14] C.R. Qi, L. Yi, H. Su, and L.J. Guibas, “PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space,” Advances in Neural Information Processing Systems, 2017.
[15] S. Shi, X. Wang, and H. Li, “PointRCNN: 3D Object Proposal Generation and Detection from Point Cloud,” Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 770-779, 2019.
[16] Y. Zhou and O. Tuzel, “VoxelNet: End-to-End Learning for Point Cloud Based 3D Object Detection,” Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 4490-4499, 2018.
[17] Y. Yan, Y. Mao, and B. Li, “SECOND: Sparsely Embedded Convolutional Detection,” Sensors, 2018.
[18] A. H. Lang, S. Vora, H. Caesar, L. Zhou, J. Yang, and O. Beijbom, “PointPillars: Fast Encoders for Object Detection from Point Clouds,” Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 12697-12705, 2019.
[19] A. Geiger, P. Lenz, C. Stiller, and R. Urtasun, “Vision Meets Robotics: The KITTI Dataset,” The International Journal of Robotics Research, pp. 1231-1237, 2013.
[20] “ROS/Concepts,” [Online]. Available: http://wiki.ros.org/ROS/Concepts.
[21] S. C. Park, M. K. Park, and M. G. Kang, “Super-Resolution Image Reconstruction: A Technical Overview,” IEEE Signal Processing Magazine, pp. 21-36, 2003.
[22] Y. Gal and Z. Ghahramani, “Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning,” International Conference on Machine Learning, pp. 1050-1059, 2016.
[23] J. Deng, S. Shi, P. Li, W. Zhou, Y. Zhang, and H. Li, “Voxel R-CNN: Towards High Performance Voxel-Based 3D Object Detection,” Proceedings of the AAAI Conference on Artificial Intelligence, pp. 1201-1209, 2021.
[24] S. Shi, C. Guo, L. Jiang, Z. Wang, J. Shi, X. Wang, and H. Li, “PV-RCNN: Point-Voxel Feature Set Abstraction for 3D Object Detection,” Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 10529-10538, 2020.
On-campus access: publicly available from 2028-08-27.