
Graduate Student: Chan, Chi-Chong (陳子聰)
Thesis Title: Robot Arm Grasping Using Teacher-Student-Based Reinforcement Learning Network (使用學生老師架構和強化學習用於夾取系統)
Advisor: Lien, Jenn-Jier James (連震杰)
Degree: Master
Department: Department of Computer Science and Information Engineering, College of Electrical Engineering and Computer Science
Publication Year: 2024
Graduation Academic Year: 112
Language: English
Number of Pages: 78
Chinese Keywords: Reinforcement Learning (強化學習), DQN, Teacher-Student Approach
English Keywords: DQN, Teacher-Student Approach, Reinforcement Learning
Usage: 31 views, 13 downloads

Abstract:
    Reinforcement Learning (RL) has emerged as a significant branch of artificial intelligence and machine learning, achieving notable progress and finding widespread application in recent years in areas such as autonomous driving and robotic control. RL offers numerous advantages, making it a powerful tool for addressing complex decision-making problems. One of its key strengths is its ability to operate without labeled data, which makes it suitable for a diverse range of applications. Despite these advantages, RL faces several challenges in practical use. In particular, applying RL to robotic grasping systems is hampered by the time cost of training and by operational safety concerns. Because RL requires a substantial amount of data to learn optimal strategies effectively, the robotic arm must perform numerous trials to gather sufficient experience, which not only extends the training time but also increases operating costs. Additionally, during the exploration phase the arm may execute unverified actions, potentially leading to grasp failures or even damage to objects or equipment, thereby posing safety risks. Reducing training time and improving learning efficiency while ensuring safety are therefore critical issues in applying RL to robotic grasping tasks. In this thesis, we propose an approach that tackles these challenges by integrating the Teacher-Student framework with reinforcement learning. First, a pre-trained Teacher model generates Pseudo Labels that serve as the initial training labels for the Student model, effectively accelerating its learning and improving its initial performance. In this step, a Pseudo Label Heatmap is generated through one-time inference and contains no real-time reward information. Subsequently, to further optimize the Student model, we apply reinforcement learning: through repeated training and execution, the Student model receives real-time rewards based on actual grasp outcomes. This training process not only enables the Student model to surpass the performance of the Teacher model but also improves its adaptability to complex real-world operating environments. Our approach improves training efficiency and grasp success rates on grasping tasks and demonstrates strong robustness against the various uncertainties encountered in practical operation.
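
    The two-stage procedure summarized above (a supervised warm-up of the Student on Teacher-generated pseudo-label heatmaps, followed by DQN-style fine-tuning driven by real grasp outcomes) can be sketched in PyTorch-like code. This is only a minimal illustration, not the thesis implementation: the network definitions, the binary success reward, and helper names such as make_pseudo_labels and rl_update are assumptions made for illustration, while the actual DQN-FCN architecture, the masking used to build Q_Pseudo, and the loss functions are specified in Chapters 3 and 4 of the thesis.

        import torch
        import torch.nn.functional as F

        def make_pseudo_labels(teacher, depth_images):
            # Stage 1: a single (one-time) forward pass of the frozen, pre-trained Teacher
            # produces the pseudo-label heatmap Q_Pseudo; no reward information is involved.
            teacher.eval()
            with torch.no_grad():
                return teacher(depth_images)            # (B, 1, H, W) pseudo-label heatmap

        def pretrain_student(student, optimizer, depth_images, q_pseudo):
            # Supervised warm-up: the Student regresses the Teacher's pseudo-label heatmap,
            # which accelerates early learning before any real grasps are executed.
            loss = F.mse_loss(student(depth_images), q_pseudo)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            return loss.item()

        def rl_update(student, optimizer, depth_image, grasp_succeeded):
            # Stage 2: DQN-style fine-tuning. The Student selects the pixel with the highest
            # predicted Q value, the arm executes that grasp, and the binary outcome of the
            # real grasp serves as the immediate reward for the chosen action.
            q_flat = student(depth_image).flatten()     # flattened (1, 1, H, W) Q heatmap
            action = q_flat.argmax()                    # greedy grasp location
            reward = torch.tensor(1.0 if grasp_succeeded else 0.0)
            loss = F.smooth_l1_loss(q_flat[action], reward)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            return loss.item()

    In the real system, the exploration strategy, the Q_Pseudo masking step, and the exact loss terms described in Chapters 3 and 4 would replace the simplified greedy selection and single-pixel Huber loss used in this sketch.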

Table of Contents:
    Abstract (Chinese) I
    Abstract II
    Acknowledgements III
    List of Tables VII
    List of Figures VIII
    Chapter 1 Introduction 10
        1.1 Motivation and Objective 10
        1.2 Global Framework 11
        1.3 Related Works 17
        1.4 Contributions 19
    Chapter 2 System Setup and Function Specifications 21
        2.1 System Setup 21
        2.2 Function Specifications 21
    Chapter 3 DQN-FCN Training Framework Using Teacher-Student 25
        3.1 Framework of Teacher-Student Approach 25
            3.1.1 Training Framework of Teacher-Student Approach 27
            3.2.1 Model Structure of Teacher Model 31
            3.2.2 Model Structure of Student Model 35
            3.2.3 Loss Function 38
        3.3 Pseudo Label Heatmap Q_Pseudo Creation – Masking 39
    Chapter 4 DQN-FCN Training Framework Using Teacher-Student 43
        4.1.1 Continuous Training Using Reinforcement Learning 43
        4.2.1 Label Heatmap Q_Label Creation 48
        4.2.2 Update Model with Reinforcement Learning 48
        4.2.3 Loss Function 54
    Chapter 5 Experimental Results 55
        5.1.1 Data Collection 55
        5.1.2 Metrics 58
        5.2.1 Experimental Result: Grasping Block 60
        5.2.2 Experimental Result: Grasping Sundries 65
        5.2.3 Result Analysis: Loss Function Comparison 68
    Chapter 6 Conclusion 72
        6.1 Conclusion 72
        6.2 Future Works 73
    References 75


Full Text Availability: On campus: immediately available; Off campus: immediately available