| Graduate Student: | 謝宗佑 Hsieh, Tsung-Yu |
|---|---|
| Thesis Title: | 開發自我監督強化學習技術於免校正抓取系統 Calibration-Free Grasping System using Self-Supervised Reinforcement Learning |
| Advisor: | 連震杰 Lien, Jenn-Jier James |
| Degree: | 碩士 Master |
| Department: | 電機資訊學院 - 資訊工程學系 Department of Computer Science and Information Engineering |
| Year of Publication: | 2021 |
| Academic Year of Graduation: | 109 |
| Language: | English |
| Number of Pages: | 32 |
| Keywords (Chinese): | 視覺系統 (vision system)、強化學習 (reinforcement learning)、機械手臂 (robot arm) |
| Keywords (English): | Vision-based, Reinforcement Learning, Robot Arm |
In recent years, the shift toward intelligent industrial manufacturing has become an irreversible trend. As a pivotal part of highly automated production processes, the robot arm is also gradually evolving: from initially being able to complete only specific tasks at fixed positions, it can now decide its own behavior based on camera images. However, the vision algorithms involved, whether classical computer vision or recently developed deep learning methods, require considerable manpower for pre-processing such as data labeling and coordinate calibration before they can operate properly, and the algorithms themselves often need manual parameter tuning to match on-site conditions.
Motivated by the idea of reducing the manpower required in algorithm design, this thesis builds on reinforcement learning to provide an object grasping system that needs no manual camera coordinate calibration: it learns the target environment directly from the color and depth images captured by the camera together with the gripper position, and outputs the corresponding gripper action. We rely on convolutional neural networks, which have achieved great success in many fields, to process the real-world images from the camera and to learn the coordinate transformation from the 2D image plane to the real 3D space. At the same time, we exploit the strong environment-adaptation capability of reinforcement learning, taking the features produced by the convolutional neural network as input and learning how to choose, in the current situation, the action that increases the probability of completing the task.
We also introduce an object-centric coordinate representation and verify experimentally that it has a positive effect on overall system performance. By converting the problem from directly detecting the absolute coordinates of the target object to estimating its coordinates relative to the gripper, the learning burden on the model is greatly reduced and the task is completed successfully.
In recent years, intelligent manufacturing has become an irreversible trend. As a vital component of highly automated production lines, the robot arm is also gradually evolving: from being able to complete only a specific task at a fixed position, it can now decide its own behavior according to what it “sees”. However, the vision algorithms involved, whether classical computer vision or deep learning methods, usually require considerable manpower for pre-processing such as data labeling and camera calibration before they can operate properly, and the algorithms themselves often need manual adjustment to match on-site conditions.
Inspired by the idea of reducing the manpower required in algorithm design, we present an object grasping system, based on reinforcement learning, that requires no manual camera calibration. It learns the target environment directly from the raw color image, depth image, and gripper position, and outputs the corresponding gripper action. We use convolutional neural networks, which have achieved great success in many fields, to process the real-world images from the camera and to learn the coordinate transformation from the 2D image plane to the real 3D space. We also take advantage of the strong environment-adaptation capability of reinforcement learning: the policy takes the features produced by the convolutional neural network as input and learns to select the action that is most likely to lead to task success in the current situation.
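To make this pipeline concrete, the minimal sketch below shows how an actor network could map a raw RGB-D observation plus the gripper position to a continuous gripper action, in the spirit of the DDPG-style actor-critic methods that this line of work builds on. The layer sizes, the 4-channel input, the 3-D gripper position, and the 4-D action (Cartesian displacement plus gripper open/close) are illustrative assumptions, not the thesis's exact architecture.

```python
# Illustrative sketch only: a vision-based actor that fuses an RGB-D image with
# the gripper position and outputs a bounded gripper action. All dimensions and
# layer choices are assumptions made for demonstration.
import torch
import torch.nn as nn

class GraspingActor(nn.Module):
    def __init__(self, gripper_dim=3, action_dim=4):
        super().__init__()
        # Convolutional encoder for the 4-channel (RGB + depth) image.
        self.encoder = nn.Sequential(
            nn.Conv2d(4, 32, kernel_size=5, stride=2), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, stride=2), nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=3, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        # Fully connected head that fuses image features with the gripper
        # position and outputs a bounded action (dx, dy, dz, open/close).
        self.head = nn.Sequential(
            nn.Linear(64 + gripper_dim, 128), nn.ReLU(),
            nn.Linear(128, action_dim), nn.Tanh(),
        )

    def forward(self, rgbd, gripper_pos):
        feat = self.encoder(rgbd)                   # (B, 64) image features
        x = torch.cat([feat, gripper_pos], dim=-1)  # fuse vision and proprioception
        return self.head(x)                         # action components in [-1, 1]

# Example usage with a single 128x128 RGB-D observation.
actor = GraspingActor()
action = actor(torch.rand(1, 4, 128, 128), torch.rand(1, 3))
```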
We also incorporate an object-centric representation into the convolutional neural network and demonstrate experimentally that it improves overall system performance. Changing the formulation from detecting the absolute position of the target object to estimating its position relative to the gripper significantly reduces the learning burden and allows the task to be completed successfully.
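The object-centric representation can be illustrated with a small sketch: rather than regressing the target's absolute coordinates in the robot base frame, the system estimates the target's offset from the current gripper position. The function name, frame convention, and numeric values below are hypothetical and serve only to show the idea.

```python
# Illustrative sketch of the object-centric (relative) representation.
import numpy as np

def to_object_centric(target_pos, gripper_pos):
    """Express the target position as an offset relative to the gripper."""
    return np.asarray(target_pos) - np.asarray(gripper_pos)

# Example: the relative offset is small and roughly zero-centered, which is an
# easier regression target than absolute workspace coordinates.
target  = np.array([0.55, -0.10, 0.02])    # hypothetical object position (base frame, meters)
gripper = np.array([0.50,  0.00, 0.15])    # hypothetical gripper position (base frame, meters)
print(to_object_centric(target, gripper))  # -> [ 0.05 -0.1  -0.13]
```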