| Author: | 陳亞伶 Chen, Ya-Ling |
|---|---|
| Thesis Title: | 基於電腦視覺與深度增強式學習之工業用機械手臂物件取放任務研究 (Study on Object Pick-and-Place Tasks of Industrial Manipulator Based on Computer Vision and Deep Reinforcement Learning) |
| Advisor: | 鄭銘揚 Cheng, Ming-Yang |
| Degree: | Master |
| Department: | Department of Electrical Engineering, College of Electrical Engineering and Computer Science |
| Publication Year: | 2021 |
| Graduation Academic Year: | 109 (2020–2021) |
| Language: | Chinese |
| Pages: | 83 |
| Keywords: | 6-DOF industrial manipulator, deep reinforcement learning, Soft Actor-Critic (SAC), YOLO, object detection, object grasping, pick-and-place tasks |
As industrial development moves toward automation, intelligence, and precision, small-volume, large-variety production has become the norm, and only a flexible and versatile manufacturing system can meet this demand. In light of this, this thesis develops a control technique for industrial robot manipulators that is capable of self-learning, adapts to small-volume, large-variety production, and achieves a high success rate in object pick-and-place tasks. Computer vision based on the YOLO (You Only Look Once) algorithm detects and recognizes the objects of interest in the image, while a deep reinforcement learning algorithm based on Soft Actor-Critic (SAC) takes the YOLO detection results and determines a suitable grasp point on the image plane. To reduce training time and cost, the deep reinforcement learning model is trained in a simulation environment built with the V-REP robot simulator, in which a virtual 6-DOF industrial manipulator learns, through self-learning, the image coordinates at which to grasp arbitrarily placed objects. After training, the learned model is transferred to a real 6-DOF industrial manipulator pick-and-place system. To verify the feasibility of the proposed approach, grasping and classification experiments were conducted on various kinds of objects. Experimental results indicate that the proposed approach achieves a high success rate in pick-and-place tasks even under small-volume, large-variety operating conditions.
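For context on the learning algorithm named above: Soft Actor-Critic, in Haarnoja et al.'s formulation, maximizes an entropy-regularized return rather than the plain expected return. With policy $\pi$, state-action marginal $\rho_\pi$, reward $r$, temperature $\alpha$, and policy entropy $\mathcal{H}$, the objective can be written as:

```latex
J(\pi) = \sum_{t} \mathbb{E}_{(s_t, a_t) \sim \rho_\pi}
         \Big[ r(s_t, a_t) + \alpha \, \mathcal{H}\big(\pi(\cdot \mid s_t)\big) \Big]
```

The abstract describes a two-stage pipeline: YOLO localizes and classifies an object in the camera image, the trained SAC actor then selects a grasp point inside the detected region, and that point is converted to robot coordinates for execution. Below is a minimal Python sketch of this inference-time loop; it assumes hypothetical interfaces (`detect_objects`, `policy.select`, `pixel_to_robot`, and the `robot`/`bins` objects) and is not the thesis's actual implementation.

```python
def pick_and_place_once(image, detect_objects, policy, pixel_to_robot, robot, bins):
    """One detection -> grasp-point selection -> pick-and-place cycle.

    Hypothetical interfaces (placeholders, not the thesis's actual code):
      detect_objects(image)  -> list of (class_name, (x1, y1, x2, y2)) from a YOLO model
      policy.select(crop)    -> normalized grasp point (u, v) in [0, 1]^2 from the SAC actor
      pixel_to_robot(px, py) -> (x, y, z) in the robot base frame via hand-eye calibration
      robot.pick(xyz) / robot.place_at(bin_id) -> motion commands to the 6-DOF manipulator
      bins                   -> mapping from class name to its drop-off bin
    """
    detections = detect_objects(image)              # YOLO: classes and bounding boxes
    if not detections:
        return False                                # nothing of interest in view

    class_name, (x1, y1, x2, y2) = detections[0]    # handle the first detected object

    crop = image[y1:y2, x1:x2]                      # region passed to the SAC actor
    u, v = policy.select(crop)                      # grasp point, normalized to the crop

    # Map the normalized grasp point back to full-image pixel coordinates.
    px = int(x1 + u * (x2 - x1))
    py = int(y1 + v * (y2 - y1))

    # Convert the image-plane grasp point to robot coordinates and execute.
    xyz = pixel_to_robot(px, py)
    if robot.pick(xyz):
        robot.place_at(bins[class_name])            # classification by bin placement
        return True
    return False
```

In the thesis's workflow, the SAC actor behind `policy.select` would first be trained in the V-REP simulation and only then be reused on the real manipulator; the sketch covers only the deployment-time cycle.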