| Graduate Student: | 胡家銘 Hu, Chia-Ming |
|---|---|
| Thesis Title: | Sim2Real Robot Arm Grasping Using Reinforcement Learning-Based CNN-Transformer (基於強化學習使用 CNN-Transformer 模型實現虛實整合之機械手臂夾取) |
| Advisor: | 連震杰 Lien, Jenn-Jier |
| Degree: | Master |
| Department: | Department of Computer Science and Information Engineering, College of Electrical Engineering and Computer Science |
| Publication Year: | 2023 |
| Academic Year: | 111 (2022-2023) |
| Language: | English |
| Pages: | 61 |
| Keywords: | Robot Arm Control, Reinforcement Learning, Neighborhood Attention, Sim2Real |
Contemporary deep learning relies heavily on large-scale data labeling. This study uses deep reinforcement learning (DRL) to let the model interact with its environment autonomously, thereby reducing the need for manual annotation. In addition, grasping diverse unseen objects with a robot arm remains a challenging industrial problem. We employ a transformer model to improve the accuracy of the network's predictions and combine it with a simulation-to-real (Sim2Real) approach to address the large number of training episodes that DRL requires. The study is divided into two parts: reinforcement learning for robot arm grasping, and the Sim2Real framework.
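As a rough illustration of this self-supervised setup, the following is a minimal sketch, not the thesis implementation: the agent predicts a pixel-wise grasp-quality map, executes the best candidate, and regresses the prediction at the executed pixel toward the binary grasp reward, so no human labels are needed. `GraspNet` is a trivial stand-in network, and `env.capture_rgb()` / `env.execute_grasp()` are hypothetical environment hooks.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical stand-in: any pixel-wise grasp-quality network would do here.
class GraspNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 1, 1),  # 1-channel grasp-quality map
        )

    def forward(self, rgb):          # rgb: (1, 3, H, W)
        return self.backbone(rgb)    # quality: (1, 1, H, W)

def train_step(net, optimizer, env, epsilon=0.1):
    rgb = env.capture_rgb()                      # hypothetical camera hook
    quality = net(rgb)
    # Epsilon-greedy pixel selection over the predicted quality map.
    if torch.rand(()) < epsilon:
        idx = torch.randint(quality.numel(), (1,)).item()
    else:
        idx = quality.flatten().argmax().item()
    y, x = divmod(idx, quality.shape[-1])
    reward = env.execute_grasp(x, y)             # 1.0 on success, else 0.0
    # Self-supervised target: the observed reward at the executed pixel,
    # so a positive reward pushes the network toward repeating that action.
    loss = F.mse_loss(quality[0, 0, y, x], torch.tensor(float(reward)))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return reward
```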
For reinforcement learning-based grasping, the CNN-Transformer model predicts suitable grasping positions from RGB images, and the robot arm learns to grasp unseen objects autonomously, without additional human effort or time spent labeling training data. After each action, the arm receives a reward from the environment and updates its policy: when the reward is positive, the network tends to favor that action; otherwise, it learns that the action is unsuitable for the current state. The CNN-Transformer model combines two networks, NAT and GR-ConvNet. NAT extracts features based on neighborhood attention; these features are merged with the RGB image and fed into GR-ConvNet for further feature extraction and prediction. This hybrid model achieves not only a higher grasp success rate but also faster convergence.
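The fusion pattern described above can be sketched as follows. This is only an illustrative skeleton under stated assumptions: `NATStub` is a plain convolution standing in for the real NAT branch (which uses neighborhood attention), and `GraspHead` mimics the output structure of GR-ConvNet, which in the literature predicts pixel-wise grasp quality, angle (as cos/sin of 2θ), and gripper width. Layer sizes are arbitrary.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class NATStub(nn.Module):
    """Placeholder for the NAT branch; the real model uses neighborhood
    attention, which this simple conv stand-in does not implement."""
    def __init__(self, out_ch=64):
        super().__init__()
        self.conv = nn.Conv2d(3, out_ch, 7, stride=4, padding=3)

    def forward(self, rgb):
        return self.conv(rgb)       # low-resolution feature map

class GraspHead(nn.Module):
    """GR-ConvNet-style head with four pixel-wise output maps."""
    def __init__(self, in_ch):
        super().__init__()
        self.trunk = nn.Sequential(nn.Conv2d(in_ch, 32, 3, padding=1),
                                   nn.ReLU())
        self.quality = nn.Conv2d(32, 1, 1)
        self.cos = nn.Conv2d(32, 1, 1)
        self.sin = nn.Conv2d(32, 1, 1)
        self.width = nn.Conv2d(32, 1, 1)

    def forward(self, x):
        x = self.trunk(x)
        return self.quality(x), self.cos(x), self.sin(x), self.width(x)

class CNNTransformer(nn.Module):
    """Fusion pattern from the abstract: NAT features are merged with
    the RGB image, then passed on for further extraction and prediction."""
    def __init__(self):
        super().__init__()
        self.nat = NATStub(out_ch=64)
        self.head = GraspHead(in_ch=3 + 64)

    def forward(self, rgb):                       # (B, 3, H, W)
        feats = self.nat(rgb)
        feats = F.interpolate(feats, size=rgb.shape[-2:],
                              mode='bilinear', align_corners=False)
        fused = torch.cat([rgb, feats], dim=1)    # channel-wise merge
        return self.head(fused)

model = CNNTransformer()
q, c, s, w = model(torch.randn(1, 3, 224, 224))  # four (1,1,224,224) maps
```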
For the Sim2Real framework, to save the time and labor cost of training in the real world, this study uses the CoppeliaSim software to generate a virtual environment in which the reinforcement-learning grasping task is carried out in simulation. The pretrained model is then transferred to the real environment for deployment or further training, so that it already has some predictive ability at initialization.
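A minimal sketch of the transfer step, reusing the `CNNTransformer` class from the sketch above and assuming the same architecture is used in both environments (the checkpoint file name and learning rate are illustrative, not from the thesis):

```python
import torch

# Weights saved after training in the CoppeliaSim virtual environment
# (file name is illustrative).
SIM_CKPT = 'sim_checkpoint.pth'

model = CNNTransformer()  # same architecture as used in simulation
model.load_state_dict(torch.load(SIM_CKPT, map_location='cpu'))

# Option 1: deploy directly. The pretrained weights already give the
# network some predictive ability at initialization.
model.eval()

# Option 2: continue training in the real environment, typically with a
# smaller learning rate so real experience refines rather than erases
# the behavior learned in simulation (the value here is an assumption).
optimizer = torch.optim.Adam(model.parameters(), lr=1e-5)
model.train()
```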
Compared with using GR-ConvNet alone, the proposed CNN-Transformer model raises the grasp success rate in the virtual environment from 89% to 94%. When transferred to the real environment, the model achieves success rates of 88.75% on toy blocks and 87% on common household items, showing good generality across different objects.
On-campus access: full text available from 2028-08-25.