| Field | Value |
|---|---|
| Author | 楊善 (Yang, Shan) |
| Thesis Title | 使用基於截斷符號距離的卷積神經網路於機器手臂散堆取物 (Robot Arm Grasping for a Pile of Objects Using Truncated Signed Distance Function-Based Convolutional Neural Network) |
| Advisor | 連震杰 (Lien, Jenn-Jier) |
| Degree | Master (碩士) |
| Department | College of Electrical Engineering and Computer Science, Department of Computer Science and Information Engineering |
| Publication Year | 2024 |
| Graduation Academic Year | 112 (ROC calendar) |
| Language | English |
| Pages | 108 |
| Keywords (Chinese) | 虛實整合、截斷符號距離函數、物件偵測、機械手臂夾取位置預測 |
| Keywords (English) | Sim-To-Real Integration, Truncated Signed Distance Function (TSDF), Object Detection, Robotic Arm Grasp Position Prediction |
This study addresses the localization challenges in warehouse automation systems by proposing a 3D vision-guided robotic arm system that combines ArUco marker detection with deep-learning-based object detection. To improve the arm's ability to grasp and place unknown objects, a Sim-To-Real pipeline is introduced to generate large amounts of training data for the object detection model. The thesis consists of two main parts: an automated grasp-and-place storage system, and a Truncated Signed Distance Function (TSDF)-based convolutional neural network (CNN) applied to grasping in cluttered scenes.

For a robotic arm that relies on a learned model to detect grasp positions, training normally requires large amounts of labeled data, such as 2D images annotated with grasp poses, and collecting such a dataset is a major bottleneck. Grasp detection in cluttered environments further requires the robot to reason about the 3D scene from incomplete and noisy observations. This study treats 3D reconstruction and grasp learning as closely related tasks, both of which demand a fine-grained understanding of local geometry. We therefore generate training data in simulation via Sim-To-Real and train a TSDF-based CNN: each scene is fused into a TSDF volume, and the network directly outputs, for every voxel, the predicted grasp quality, gripper orientation, opening width, and object occupancy probability. The model is trained on self-supervised grasp trials in simulation and evaluated on a decluttering task in which the robot removes objects one at a time. Simulation and real-world results show that jointly learning grasp affordance and 3D reconstruction with an implicit neural representation achieves state-of-the-art grasping performance. Grasp planning completes within one second, and the method transfers to real cluttered scenes; this real-time capability enables closed-loop grasp planning, allowing the robot to handle disturbances, recover from errors, and operate more robustly.

In the automated storage system, ArUco markers attached to the shelves are used to localize them. A robotic arm equipped with an RGB-D camera detects the markers, establishes a shelf coordinate system, and stores the positions of the shelves and their compartments. With this system, the pick-and-place error when storing and retrieving goods is kept within 10 mm, goods are handled faster, and the need for manual labor is reduced.
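To make the per-voxel prediction concrete, the following is a minimal PyTorch-style sketch, not the implementation used in the thesis: a hypothetical fully convolutional network (`TSDFGraspNet`) that maps a TSDF volume to per-voxel grasp quality, gripper orientation (as a quaternion), opening width, and occupancy probability. The grid size, layer widths, and all names are illustrative assumptions.

```python
import torch
import torch.nn as nn

class TSDFGraspNet(nn.Module):
    """Hypothetical fully convolutional network over a TSDF volume.

    Input:  (B, 1, 40, 40, 40) truncated signed distance values, i.e. the
            signed distance to the nearest surface clamped to [-1, 1].
    Output: per-voxel grasp quality, gripper orientation (quaternion),
            opening width, and object occupancy probability.
    """
    def __init__(self, channels=32):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv3d(1, channels, kernel_size=5, padding=2),
            nn.ReLU(inplace=True),
            nn.Conv3d(channels, channels, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
        )
        # Separate 1x1x1 heads keep every output aligned with the voxel grid.
        self.quality_head = nn.Conv3d(channels, 1, kernel_size=1)    # grasp quality
        self.rotation_head = nn.Conv3d(channels, 4, kernel_size=1)   # quaternion
        self.width_head = nn.Conv3d(channels, 1, kernel_size=1)      # opening width
        self.occupancy_head = nn.Conv3d(channels, 1, kernel_size=1)  # occupancy prob.

    def forward(self, tsdf):
        feat = self.encoder(tsdf)
        quality = torch.sigmoid(self.quality_head(feat))
        rotation = nn.functional.normalize(self.rotation_head(feat), dim=1)
        width = self.width_head(feat)
        occupancy = torch.sigmoid(self.occupancy_head(feat))
        return quality, rotation, width, occupancy

# Example: one scene fused into a 40x40x40 TSDF grid (random placeholder values).
net = TSDFGraspNet()
tsdf_volume = torch.randn(1, 1, 40, 40, 40).clamp(-1.0, 1.0)
quality, rotation, width, occupancy = net(tsdf_volume)
print(quality.shape)  # torch.Size([1, 1, 40, 40, 40])
```

In practice, a grasp candidate would be read off at the voxel with the highest predicted quality and converted to a 6-DoF gripper pose in the robot's frame.

Shelf localization from ArUco markers can likewise be sketched with OpenCV. The sketch below assumes the classic `cv2.aruco` interface from `opencv-contrib-python`, with placeholder camera intrinsics, dictionary, and marker size; none of these values come from the thesis.

```python
import cv2
import numpy as np

# Placeholder intrinsics; in practice they come from the RGB-D camera calibration.
camera_matrix = np.array([[615.0, 0.0, 320.0],
                          [0.0, 615.0, 240.0],
                          [0.0, 0.0, 1.0]])
dist_coeffs = np.zeros(5)
marker_length = 0.05  # marker side length in meters (assumed)

# 3D marker corners in the marker frame, matching the detector's corner order
# (top-left, top-right, bottom-right, bottom-left).
obj_points = np.array([[-marker_length / 2,  marker_length / 2, 0],
                       [ marker_length / 2,  marker_length / 2, 0],
                       [ marker_length / 2, -marker_length / 2, 0],
                       [-marker_length / 2, -marker_length / 2, 0]], dtype=np.float32)

dictionary = cv2.aruco.getPredefinedDictionary(cv2.aruco.DICT_4X4_50)

def locate_shelf_markers(image):
    """Detect ArUco markers and return {marker_id: (rvec, tvec)} in the camera frame."""
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    corners, ids, _ = cv2.aruco.detectMarkers(gray, dictionary)
    poses = {}
    if ids is None:
        return poses
    for marker_id, marker_corners in zip(ids.flatten(), corners):
        img_points = marker_corners.reshape(4, 2).astype(np.float32)
        ok, rvec, tvec = cv2.solvePnP(obj_points, img_points,
                                      camera_matrix, dist_coeffs)
        if ok:
            poses[int(marker_id)] = (rvec.reshape(3), tvec.reshape(3))
    return poses
```

The per-marker poses would then be transformed into the robot base frame via the hand-eye calibration to define the shelf coordinate system used when storing and retrieving goods.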
On-campus access: full text available from 2029-08-22.