
Author: Chiang, Chang-Han (江長翰)
Title: Natural Posture Control in Humanoid Robot Arms Using SAC and TD3 Reinforcement Learning Policies
Advisor: Li, Tzuu-Hseng S. (李祖聖)
Degree: Master
Department: Department of Electrical Engineering, College of Electrical Engineering and Computer Science
Year of Publication: 2025
Graduation Academic Year: 113 (ROC calendar, 2024-2025)
Language: English
Number of Pages: 100
Keywords: Adaptive Motion Policy, Deep Reinforcement Learning, Humanoid Robot, Robotic Arm

    Abstract: This thesis proposes an adaptive motion control strategy that integrates deep reinforcement learning techniques to enhance the naturalness and flexibility of humanoid robotic arm movements. To address the control requirements of humanoid robotic arms, the study designs a real-time motion adjustment policy based on the Soft Actor-Critic (SAC) algorithm and a grasp-and-place prepositioning policy based on the Twin Delayed DDPG (TD3) algorithm, enabling comprehensive posture control and stable grasping behavior. The system also incorporates a natural posture neural network and a collision boundary neural network, which together encourage relaxed, human-like arm postures while avoiding self-collision risks, thereby improving safety and motion stability. The innovation of this research lies in replacing traditional inverse kinematics computation with a joint-space formulation as the control basis, which allows the robotic arm to reach target postures and perform real-time motion adjustments without complex calculations. To validate the generalizability and effectiveness of the overall strategy, the deep-reinforcement-learning-based prepositioning policy is tested both in simulated environments and on a physical humanoid robot, and the results confirm that the proposed strategy possesses strong sim-to-real transfer capability. This thesis demonstrates the potential of applying deep reinforcement learning to the control of humanoid robotic arms; by incorporating the natural posture and collision boundary neural networks, the system achieves both natural movement and operational safety, providing a valuable reference for the future development of humanoid robot control systems.
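
    To make the two-policy design above concrete, the following is a minimal sketch of how a joint-space moving policy (SAC) and a prepositioning policy (TD3) could be trained with Stable-Baselines3, which the thesis lists among its tools alongside the MuJoCo simulator. Everything specific in the sketch is an assumption for illustration: the toy JointSpaceArmEnv, the six-joint layout, the increment bounds, the step cap, and the distance-only reward all stand in for the thesis's actual environment and reward, which additionally score naturalness and self-collision risk with the learned natural posture and collision boundary networks.

import numpy as np
import gymnasium as gym
from gymnasium import spaces
from stable_baselines3 import SAC, TD3


class JointSpaceArmEnv(gym.Env):
    """Toy stand-in for the thesis's arm task: drive n joints toward a
    target joint configuration using bounded joint-angle increments
    (no inverse kinematics in the control loop)."""

    def __init__(self, n_joints: int = 6, max_steps: int = 200):
        super().__init__()
        self.n_joints = n_joints
        self.max_steps = max_steps
        # Observation: current joint angles concatenated with target angles.
        self.observation_space = spaces.Box(
            low=-np.pi, high=np.pi, shape=(2 * n_joints,), dtype=np.float32)
        # Action: per-step joint increments in radians.
        self.action_space = spaces.Box(
            low=-0.1, high=0.1, shape=(n_joints,), dtype=np.float32)

    def _obs(self):
        return np.concatenate([self.q, self.q_target]).astype(np.float32)

    def reset(self, *, seed=None, options=None):
        super().reset(seed=seed)
        self.steps = 0
        self.q = self.np_random.uniform(-1.0, 1.0, self.n_joints)
        self.q_target = self.np_random.uniform(-1.0, 1.0, self.n_joints)
        return self._obs(), {}

    def step(self, action):
        self.steps += 1
        self.q = np.clip(self.q + action, -np.pi, np.pi)
        err = float(np.linalg.norm(self.q - self.q_target))
        # Placeholder reward: negative joint-space error only. The thesis's
        # reward additionally scores naturalness (natural posture network)
        # and self-collision risk (collision boundary network).
        reward = -err
        terminated = err < 0.05
        truncated = self.steps >= self.max_steps
        return self._obs(), reward, terminated, truncated, {}


if __name__ == "__main__":
    # Moving policy (SAC) and prepositioning policy (TD3), as in the thesis;
    # the shared toy environment and step budgets are illustrative only.
    moving_policy = SAC("MlpPolicy", JointSpaceArmEnv(), verbose=0)
    moving_policy.learn(total_timesteps=10_000)
    prepositioning_policy = TD3("MlpPolicy", JointSpaceArmEnv(), verbose=0)
    prepositioning_policy.learn(total_timesteps=10_000)

    The design point the sketch preserves is the one stressed in the abstract: the action space consists of joint increments, so the policy reaches target postures without an inverse kinematics solver in the control loop.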

    Table of Contents:
    Abstract (Chinese) I
    Abstract II
    Acknowledgment III
    Content IV
    List of Figures VII
    List of Tables IX
    Chapter 1 Introduction 1
      1.1 Motivation 1
      1.2 Related Works 2
      1.3 Thesis Organization 5
    Chapter 2 Robot Hardware System 7
      2.1 Overview 7
      2.2 Hardware Specification 8
      2.3 Actuator 11
        2.3.1 Dynamixel Motor 12
        2.3.2 Myactuator Motor 14
      2.4 Power Supply 15
      2.5 Control Kernel 16
      2.6 RGBD Camera 17
      2.7 System Architecture 18
    Chapter 3 Method 20
      3.1 Overview 20
      3.2 Kinematics of Humanoid Robot 21
        3.2.1 Forward Kinematics of Arm End-Effector 21
        3.2.2 Forward Kinematics of Head Camera 25
      3.3 SAC-based Moving Policy 27
        3.3.1 Introduction 27
        3.3.2 Soft Actor-Critic (SAC) 29
        3.3.3 Action 31
        3.3.4 State 32
        3.3.5 Reward 34
      3.4 TD3-Based Prepositioning Policy 39
        3.4.1 Twin Delayed DDPG (TD3) 40
        3.4.2 Action 42
        3.4.3 State 42
        3.4.4 Reward and Training Method 42
      3.5 Natural Posture and Collision Boundary 44
        3.5.1 Natural Posture Neural Network 45
        3.5.2 Collision Boundary Neural Network 48
        3.5.3 Natural Rate 51
      3.6 Summary 52
    Chapter 4 Simulations and Experimental Results 54
      4.1 Overview 54
      4.2 Experimental Setup 55
        4.2.1 Simulation Environment – MuJoCo 55
        4.2.2 Parameters in the RL models 55
      4.3 Experimental Results 58
        4.3.1 Experiment 1 – Moving Policy 58
        4.3.2 Experiment 2 – Prepositioning Policy 62
        4.3.3 Experiment 3 – Natural Posture 64
        4.3.4 Experiment 4 – Collision Boundary 66
        4.3.5 Experiment 5 – Natural Posture and Collision Boundary 69
        4.3.6 Experiment 6 – Sim-to-Real Transfer of the Moving Policy 70
        4.3.7 Experiment 7 – Real-Time Grasp/Place Task 74
        4.3.8 Experiment 8 – Long Range Grasp/Place Task 79
      4.4 Summary 81
    Chapter 5 Conclusions and Future Work 82
      5.1 Conclusions 82
      5.2 Future Work 83
      5.3 Declaration of the Use of Generative AI technologies 84
    References 85


    Availability: On campus: open access from 2030-08-05. Off campus: not available.
    The electronic thesis has not yet been authorized for public release; please consult the library catalog for the print copy.