| Author: | 陳子聰 Chan, Chi-Chong |
|---|---|
| Thesis Title: | 使用學生老師架構和強化學習用於夾取系統 Robot Arm Grasping Using Teacher-Student-Based Reinforcement Learning Network |
| Advisor: | 連震杰 Lien, Jenn-Jier James |
| Degree: | Master (碩士) |
| Department: | Department of Computer Science and Information Engineering, College of Electrical Engineering and Computer Science (電機資訊學院 - 資訊工程學系) |
| Year of Publication: | 2024 |
| Graduating Academic Year: | 112 (ROC calendar) |
| Language: | English |
| Number of Pages: | 78 |
| Keywords (Chinese): | 強化學習 (Reinforcement Learning), DQN, Teacher-Student Approach |
| Keywords (English): | DQN, Teacher-Student Approach, Reinforcement Learning |
Reinforcement Learning (RL) has emerged as a significant branch of artificial intelligence and machine learning, achieving notable progress and widespread application in recent years in areas such as autonomous driving and robotic control. RL offers numerous advantages that make it a powerful tool for complex decision-making problems; in particular, it can learn without labeled data, which makes it suitable for a diverse range of applications. Despite these advantages, RL faces several challenges in practice. For robotic grasping systems, the main obstacles are the time consumed during training and the safety of operation. Because RL requires a substantial amount of data to learn an effective policy, the robot arm must perform numerous trials to gather sufficient experience, which extends training time and increases operating cost. In addition, during exploration the arm may execute unverified actions that lead to failed grasps or even damage objects or equipment, posing a safety risk. Reducing training time and improving learning efficiency while ensuring safety is therefore a central problem in applying RL to robotic grasping. In this thesis, we propose an approach that integrates a Teacher-Student framework with reinforcement learning to address these challenges. First, a pre-trained Teacher model generates Pseudo Labels that serve as the initial training targets for the Student model, effectively accelerating the Student's learning and improving its initial performance. In this stage, the Pseudo Label Heatmap is produced by a single (one-time) inference pass and contains no real-time reward information. Next, to further optimize the Student model, we apply reinforcement learning: over repeated training and operation, the Student receives real-time rewards based on the actual grasp success rate. This process not only enables the Student to surpass the Teacher's performance but also improves its adaptability to complex real-world operating environments. The proposed approach improves training efficiency and grasp success rate and shows strong robustness against the various uncertainties encountered in practical operation.
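To make the two-stage training procedure described above more concrete, the following is a minimal, hypothetical sketch (in PyTorch) of how a Teacher-Student grasping pipeline of this kind could be organized: a frozen, pre-trained Teacher produces pseudo-label heatmaps in a single inference pass to bootstrap the Student, and the Student is then fine-tuned with rewards obtained from executed grasps. The network architecture, the environment interface (`env.get_observation`, `env.execute_grasp`), the single-step (bandit-style) reward update, and all hyperparameters are illustrative assumptions, not the implementation used in the thesis.

```python
# Hypothetical sketch (not the thesis code): a Teacher-Student grasping pipeline in
# which a frozen, pre-trained Teacher produces pseudo-label heatmaps that bootstrap
# the Student, and the Student is then fine-tuned with rewards from grasp outcomes.
# Network shapes, the environment API (env.get_observation, env.execute_grasp),
# and all hyperparameters are assumptions for illustration only.
import random
import torch
import torch.nn as nn
import torch.nn.functional as F

class GraspQNet(nn.Module):
    """Fully convolutional net: image -> per-pixel grasp-quality (Q) heatmap."""
    def __init__(self, in_channels=4):  # e.g. RGB-D input (assumed)
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(in_channels, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 1, 1),  # one Q value per pixel
        )

    def forward(self, x):
        return self.backbone(x)  # (B, 1, H, W)

def distill_from_teacher(student, teacher, images, optimizer):
    """Stage 1: supervise the Student with the Teacher's pseudo-label heatmaps
    (generated by a one-time inference, with no reward information)."""
    teacher.eval()
    with torch.no_grad():
        pseudo_labels = teacher(images)          # one-time inference
    loss = F.mse_loss(student(images), pseudo_labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

def rl_finetune_step(student, env, optimizer, epsilon=0.1):
    """Stage 2: single-step reward update at the executed grasp pixel."""
    obs = env.get_observation()                  # assumed: returns a (1, C, H, W) tensor
    q_map = student(obs)
    _, _, H, W = q_map.shape
    if random.random() < epsilon:                # epsilon-greedy exploration
        idx = random.randrange(H * W)
    else:
        idx = int(q_map.view(-1).argmax())
    y, x = divmod(idx, W)
    reward = float(env.execute_grasp(y, x))      # assumed: 1.0 on success, 0.0 on failure
    # Regress the Q value of the executed pixel toward the observed reward.
    loss = F.mse_loss(q_map[0, 0, y, x], torch.tensor(reward))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return reward, loss.item()
```

In a full DQN-style formulation, as the keywords suggest, the reward regression above would typically be replaced by temporal-difference targets with an experience replay buffer and a target network; the single-step update is used here only to keep the sketch short.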