
Author: 陳亞伶 (Chen, Ya-Ling)
Title: 基於電腦視覺與深度增強式學習之工業用機械手臂物件取放任務研究 (Study on Object Pick-and-Place Tasks of Industrial Manipulator Based on Computer Vision and Deep Reinforcement Learning)
Advisor: 鄭銘揚 (Cheng, Ming-Yang)
Degree: Master
Department: Department of Electrical Engineering, College of Electrical Engineering and Computer Science
Year of Publication: 2021
Graduation Academic Year: 109
Language: Chinese
Number of Pages: 83
Keywords: 6-DOF industrial manipulator, deep reinforcement learning, Soft Actor-Critic (SAC), YOLO, object detection, object grasping, pick-and-place tasks
Abstract: As industrial development moves toward automation, smart manufacturing, and high precision, small-volume, large-variety production has become the norm, and only a flexible and versatile manufacturing system can satisfy this demand. In view of this, this thesis develops a control technique for industrial robot manipulators that is capable of self-learning, adapts to small-volume, large-variety production, and achieves a high pick-and-place success rate. The technique first detects objects of interest with computer vision and then uses deep reinforcement learning to determine each object's grasp point on the image plane. Specifically, the YOLO algorithm detects and recognizes objects in the image, while Soft Actor-Critic (SAC) decides the grasp point position based on YOLO's recognition results. To reduce training time and cost, the deep reinforcement learning algorithm is trained with the V-REP robot simulator, allowing a six-axis industrial manipulator to learn, in simulation, the image coordinates at which to grasp arbitrarily placed objects. Finally, the trained model is deployed in a real six-axis industrial manipulator picking system, and grasping and classification experiments are conducted on a variety of objects. The experimental results confirm that the proposed architecture achieves a high pick-and-place success rate even under small-volume, large-variety operating conditions.

Abstract (English): With the trend of industrial development gradually moving toward automation, smartness, and precision, the need for small-volume, large-variety production has become common in industry, and a more flexible and versatile manufacturing system is essential to meet it. In light of these demands, this thesis aims to develop a control technique for industrial robot manipulators that is capable of self-learning, is suitable for small-volume, large-variety production, and achieves a high success rate in object grasping and pick-and-place tasks. In particular, computer vision based on the YOLO (You Only Look Once) algorithm is used to detect and recognize objects of interest, while a deep reinforcement learning algorithm based on Soft Actor-Critic (SAC) is employed to determine a suitable pose for object grasping through self-learning. To reduce training time and cost, the deep reinforcement learning model is trained in a simulation environment built with the V-REP simulator, in which the virtual 6-DOF industrial manipulator learns the most suitable position on the image plane for grasping any randomly placed object of interest. After learning is complete, the learned model is transferred to a real 6-DOF industrial manipulator for executing pick-and-place tasks. To verify the feasibility of the proposed approach, various kinds of objects were used in experiments in which the real 6-DOF industrial manipulator performed object grasping and classification. The experimental results indicate that the proposed approach achieves a high success rate in object grasping and pick-and-place tasks under small-volume, large-variety conditions.
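
The pipeline described in the abstract is compact enough to sketch. The following minimal Python sketch is not the thesis's code; the names (detect_objects, sac_grasp_pixel, pixel_to_camera) and the intrinsic matrix K are illustrative assumptions. It only shows how a YOLO-style detection, an SAC-selected grasp pixel, and depth back-projection could fit together:

    # Minimal sketch (not the thesis's code): a YOLO-style detection feeds an
    # SAC-style policy that picks a grasp pixel, and the depth image
    # back-projects that pixel to a camera-frame 3-D point.
    import numpy as np

    # Assumed Kinect-like intrinsics (fx, fy, cx, cy); illustrative values only.
    K = np.array([[525.0,   0.0, 319.5],
                  [  0.0, 525.0, 239.5],
                  [  0.0,   0.0,   1.0]])

    def detect_objects(rgb):
        """Stand-in for YOLOv3: return (class_id, (x_min, y_min, x_max, y_max))."""
        return [(0, (120, 80, 200, 160))]  # hypothetical single detection

    def sac_grasp_pixel(rgb, box):
        """Stand-in for the trained SAC actor: map a detection to a grasp pixel."""
        x_min, y_min, x_max, y_max = box
        return (x_min + x_max) // 2, (y_min + y_max) // 2  # box center as placeholder

    def pixel_to_camera(u, v, z):
        """Back-project image point (u, v) at depth z using intrinsics K."""
        x = (u - K[0, 2]) * z / K[0, 0]
        y = (v - K[1, 2]) * z / K[1, 1]
        return np.array([x, y, z])

    rgb = np.zeros((480, 640, 3), dtype=np.uint8)  # placeholder color frame
    depth = np.full((480, 640), 0.8)               # placeholder depth map (meters)

    for class_id, box in detect_objects(rgb):
        u, v = sac_grasp_pixel(rgb, box)
        p_cam = pixel_to_camera(u, v, depth[v, u])
        print(f"class {class_id}: grasp pixel ({u}, {v}) -> camera point {p_cam}")

In the full system outlined in Chapters 2 and 3 below, the camera-frame point would additionally pass through extrinsic/hand-eye calibration and inverse kinematics before the manipulator executes the grasp.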

Table of Contents:
Chinese Abstract
Extended Abstract
Acknowledgments
Table of Contents
List of Tables
List of Figures
Chapter 1 Introduction
  1.1 Research Motivation and Objectives
  1.2 Literature Review
  1.3 Thesis Organization
Chapter 2 3D Coordinate Reconstruction with an RGB-D Camera
  2.1 Overview of the Kinect Camera
  2.2 3D Coordinate Reconstruction
    2.2.1 Registration of Color and Depth Image Data
    2.2.2 Camera Intrinsic Parameters
    2.2.3 Camera Extrinsic Parameters
    2.2.4 Hand-Eye Calibration
Chapter 3 Manipulator Forward and Inverse Kinematics and the Simulation Environment
  3.1 Forward Kinematics
  3.2 Inverse Kinematics
  3.3 Simulation Environment
    3.3.1 Simulation Platform
    3.3.2 Six-Axis Manipulator Model
Chapter 4 Object Recognition and Localization Based on the YOLOv3 Algorithm
  4.1 YOLOv3 [34]
    4.1.1 Image Grid Division
    4.1.2 Bounding Box Prediction
    4.1.3 Cost Function
  4.2 Object Recognition and Localization Based on the YOLOv3 Algorithm
    4.2.1 YOLOv3 Network Architecture Design
    4.2.2 Object Detection Training Procedure
    4.2.3 Object Detection Inference Procedure
Chapter 5 Object Pick-and-Place System Based on Deep Reinforcement Learning
  5.1 Introduction to Deep Reinforcement Learning
  5.2 Soft Actor-Critic
    5.2.1 Soft Policy Iteration
    5.2.2 SAC Network Architecture and Training Procedure
  5.3 Pick-and-Place Decision Design Based on Deep Reinforcement Learning
    5.3.1 SAC-Based Pick-and-Place Decision Network Architecture Design
Chapter 6 Experimental Setup and Results
  6.1 Experimental Environment and Equipment
    6.1.1 Six-Axis Manipulator
    6.1.2 Camera
    6.1.3 Electric Vacuum Pump
    6.1.4 Experimental Scene
  6.2 Experimental Methods and Result Analysis
    6.2.1 YOLOv3 Training Results
    6.2.2 Training and Results of the SAC-Based Pick-and-Place Decision Model
  6.3 Experiment 1: Picking Specific Objects
    6.3.1 Experimental Procedure
    6.3.2 Experimental Results
  6.4 Experiment 2: Object Classification
    6.4.1 Experimental Procedure
    6.4.2 Experimental Results
Chapter 7 Conclusions and Suggestions
  7.1 Conclusions
  7.2 Future Work and Suggestions
References
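
As background for two of the chapters listed above: Section 4.1.2 uses the bounding-box parameterization of YOLOv3 [34], in which the network outputs \((t_x, t_y, t_w, t_h)\) are decoded with the grid-cell offset \((c_x, c_y)\) and anchor priors \((p_w, p_h)\):

\[ b_x = \sigma(t_x) + c_x, \qquad b_y = \sigma(t_y) + c_y, \qquad b_w = p_w e^{t_w}, \qquad b_h = p_h e^{t_h}. \]

Likewise, the Soft Actor-Critic of Section 5.2 optimizes the maximum-entropy objective of [40], which augments the expected return with a policy-entropy term weighted by a temperature \(\alpha\):

\[ J(\pi) = \sum_{t} \mathbb{E}_{(s_t, a_t) \sim \rho_\pi}\!\left[ r(s_t, a_t) + \alpha\, \mathcal{H}\big(\pi(\cdot \mid s_t)\big) \right]. \]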

    [1] 林潔君, "Study on vision-based object grasping of industrial robot manipulators," Master's thesis, Department of Electrical Engineering, National Cheng Kung University, Taiwan, 2015 (in Chinese).
    [2] 羅國益, "Study on vision-based object pick-and-place tasks of industrial robot manipulators," Master's thesis, Department of Electrical Engineering, National Cheng Kung University, Taiwan, 2016 (in Chinese).
    [3] E. Johns, S. Leutenegger, and A. J. Davison, "Deep learning a grasp function for grasping under gripper pose uncertainty," in Proceedings of the 2016 IEEE/RSJ International Conference on Intelligent Robots and Systems, 2016, pp. 4461-4468.
    [4] I. Lenz, H. Lee, and A. Saxena, "Deep learning for detecting robotic grasps," The International Journal of Robotics Research, vol. 34, no. 4-5, pp. 705-724, 2015.
    [5] L. Pinto and A. Gupta, "Supersizing self-supervision: learning to grasp from 50k tries and 700 robot hours," in Proceedings of the 2016 IEEE International Conference on Robotics and Automation, 2016, pp. 3406-3413.
    [6] S. Levine, P. Pastor, A. Krizhevsky, J. Ibarz, and D. Quillen, "Learning hand-eye coordination for robotic grasping with deep learning and large-scale data collection," The International Journal of Robotics Research, vol. 37, no. 4-5, pp. 421-436, 2018.
    [7] J. Mahler et al., "Dex-Net 1.0: a cloud-based network of 3D objects for robust grasp planning using a multi-armed bandit model with correlated rewards," in Proceedings of the 2016 IEEE International Conference on Robotics and Automation, 2016, pp. 1957-1964.
    [8] J. Mahler, M. Matl, X. Liu, A. Li, D. Gealy, and K. Goldberg, "Dex-Net 3.0: computing robust vacuum suction grasp targets in point clouds using a new analytic model and deep learning," in Proceedings of the 2018 IEEE International Conference on Robotics and Automation, 2018, pp. 5620-5627.
    [9] J. Mahler et al., "Learning ambidextrous robot grasping policies," Science Robotics, vol. 4, no. 26, 2019.
    [10] J. Mahler et al., "Dex-Net 2.0: deep learning to plan robust grasps with synthetic point clouds and analytic grasp metrics," arXiv preprint arXiv:1703.09312, Aug. 2017.
    [11] M. Riedmiller et al., "Learning by playing - solving sparse reward tasks from scratch," in Proceedings of the International Conference on Machine Learning, 2018, pp. 4344-4353.
    [12] K. Fang et al., "Learning task-oriented grasping for tool manipulation from simulated self-supervision," The International Journal of Robotics Research, vol. 39, no. 2-3, pp. 202-216, 2020.
    [13] Y. Tsurumine, Y. Cui, E. Uchibe, and T. Matsubara, "Deep dynamic policy programming for robot control with raw images," in Proceedings of the 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems, 2017, pp. 1545-1550.
    [14] F. Li, Q. Jiang, W. Quan, S. Cai, R. Song, and Y. Li, "Manipulation skill acquisition for robotic assembly based on multi-modal information description," IEEE Access, vol. 8, pp. 6282-6294, 2020.
    [15] M. Vecerik et al., "Leveraging demonstrations for deep reinforcement learning on robotics problems with sparse rewards," arXiv preprint arXiv:1707.08817, Oct. 2018.
    [16] 葉庭瑜, "Study on a visual tracking system for a two-axis robot manipulator based on deep reinforcement learning," Master's thesis, Department of Electrical Engineering, National Cheng Kung University, Taiwan, 2019 (in Chinese).
    [17] M. Gualtieri, A. ten Pas, and R. Platt, "Pick and place without geometric object models," in Proceedings of the 2018 IEEE International Conference on Robotics and Automation, 2018, pp. 7433-7440.
    [18] Y. Fujita, K. Uenishi, A. Ummadisingu, P. Nagarajan, S. Masuda, and M. Y. Castro, "Distributed reinforcement learning of targeted grasping with active vision for mobile manipulators," arXiv preprint arXiv:2007.08082, Oct. 2020.
    [19] Y. Deng et al., "Deep reinforcement learning for robotic pushing and picking in cluttered environment," in Proceedings of the 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems, 2019, pp. 619-626.
    [20] A. Zeng, S. Song, S. Welker, J. Lee, A. Rodriguez, and T. Funkhouser, "Learning synergies between pushing and grasping with self-supervised deep reinforcement learning," in Proceedings of the 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems, 2018, pp. 4238-4245.
    [21] D. Kalashnikov et al., "QT-Opt: scalable deep reinforcement learning for vision-based robotic manipulation," arXiv preprint arXiv:1806.10293, Nov. 2018.
    [22] R. Chen and X.-Y. Dai, "Robotic grasp control policy with target pre-detection based on deep Q-learning," in Proceedings of the 2018 3rd International Conference on Robotics and Automation Engineering, 2018, pp. 29-33.
    [23] Z. Chen, M. Lin, Z. Jia, and S. Jian, "Towards generalization and data efficient learning of deep robotic grasping," arXiv preprint arXiv:2007.00982, Jul. 2020.
    [24] J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, "You only look once: unified, real-time object detection," in Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 779-788.
    [25] Z. Zhang, "Microsoft kinect sensor and its effect," IEEE MultiMedia, vol. 19, no. 2, pp. 4-10, 2012.
    [26] 巴布羅, "Study on Kinect-based real-time 3D point cloud processing for automatic grasping of multi-faceted objects," Master's thesis, Department of Electrical Engineering, National Cheng Kung University, Taiwan, 2016 (in Chinese).
    [27] 李哲良, "Study on image-based visual servoing for tracking control," Master's thesis, Department of Electrical Engineering, National Cheng Kung University, Taiwan, 2016 (in Chinese).
    [28] 張庭育, "Study on a virtual visual servoing estimator and a dynamic visual servoing architecture," Master's thesis, Department of Electrical Engineering, National Cheng Kung University, Taiwan, 2018 (in Chinese).
    [29] 江宗錡, "Study on hand-eye calibration of a six-axis articulated robot manipulator," Master's thesis, Department of Electrical Engineering, National Cheng Kung University, Taiwan, 2014 (in Chinese).
    [30] 楊士賢, "Study on obstacle avoidance for industrial robot manipulators based on computer vision and hazardous energy fields," Master's thesis, Department of Electrical Engineering, National Cheng Kung University, Taiwan, 2020 (in Chinese).
    [31] C. Cai, N. Somani, and A. Knoll, "Orthogonal image features for visual servoing of a 6-DOF manipulator with uncalibrated stereo cameras," IEEE Transactions on Robotics, vol. 32, no. 2, pp. 452-461, 2016.
    [32] 吳如峰, "Study on an image-based visual servoing architecture for six-axis industrial robot manipulators," Master's thesis, Department of Electrical Engineering, National Cheng Kung University, Taiwan, 2015 (in Chinese).
    [33] E. Rohmer, S. P. Singh, and M. Freese, "V-REP: a versatile and scalable robot simulation framework," in Proceedings of the 2013 IEEE/RSJ International Conference on Intelligent Robots and Systems, 2013, pp. 1321-1326.
    [34] J. Redmon and A. Farhadi, "YOLOv3: an incremental improvement," arXiv preprint arXiv:1804.02767, Apr. 2018.
    [35] J. Redmon and A. Farhadi, "YOLO9000: better, faster, stronger," in Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 6517-6525.
    [36] J. A. Hartigan and M. A. Wong, "Algorithm AS 136: a k-means clustering algorithm," Journal of the Royal Statistical Society: Series C, vol. 28, no. 1, pp. 100-108, 1979.
    [37] J. Hosang, R. Benenson, and B. Schiele, "Learning non-maximum suppression," in Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 6469-6477.
    [38] L. P. Kaelbling, M. L. Littman, and A. W. Moore, "Reinforcement learning: a survey," Journal of Artificial Intelligence Research, vol. 4, no. 1, pp. 237-285, 1996.
    [39] R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction. Cambridge, MA: MIT Press, 1998.
    [40] T. Haarnoja, A. Zhou, P. Abbeel, and S. Levine, "Soft actor-critic: off-policy maximum entropy deep reinforcement learning with a stochastic actor," in Proceedings of the International Conference on Machine Learning, 2018, pp. 1861-1870.
    [41] T. Haarnoja et al., "Soft actor-critic algorithms and applications," arXiv preprint arXiv:1812.05905, Jan. 2019.
    [42] J. Peters and S. Schaal, "Natural actor-critic," Neurocomputing, vol. 71, no. 7-9, pp. 1180-1190, 2008.
    [43] S. Fujimoto, H. van Hoof, and D. Meger, "Addressing function approximation error in actor-critic methods," in Proceedings of the International Conference on Machine Learning, 2018, pp. 1587-1596.
    [44] D. Yarats, A. Zhang, I. Kostrikov, B. Amos, J. Pineau, and R. Fergus, "Improving sample efficiency in model-free reinforcement learning from images," arXiv preprint arXiv:1910.01741, Jul. 2020.
    [45] I. Kostrikov, D. Yarats, and R. Fergus, "Image augmentation is all you need: regularizing deep reinforcement learning from pixels," arXiv preprint arXiv:2004.13649, Mar. 2021.

Full-text availability: on campus from 2022-08-31; off campus from 2022-08-31.