
Author: Chiang, Chang-Han (江長翰)
Title: Natural Posture Control in Humanoid Robot Arms Using SAC and TD3 Reinforcement Learning Policies
Advisor: Li, Tzuu-Hseng S. (李祖聖)
Degree: Master
Department: Department of Electrical Engineering, College of Electrical Engineering and Computer Science
Year of Publication: 2025
Graduation Academic Year: 113 (ROC calendar, 2024-2025)
Language: English
Number of Pages: 100
Keywords: Adaptive Motion Policy, Deep Reinforcement Learning, Humanoid Robot, Robotic Arm

    Abstract: This thesis proposes an adaptive motion control strategy that integrates deep reinforcement learning techniques to enhance the naturalness and flexibility of humanoid robotic arm movements. To address the control requirements of humanoid robotic arms, the study designs a real-time motion adjustment policy based on the Soft Actor-Critic (SAC) algorithm and a grasp-and-place prepositioning policy based on the Twin Delayed DDPG (TD3) algorithm, enabling comprehensive posture control and stable grasping behavior. The system also incorporates a natural posture neural network and a collision boundary neural network, which together encourage relaxed, human-like arm postures while avoiding self-collision risks, thereby improving safety and motion stability. The innovation of this research lies in replacing traditional inverse kinematics computation with a joint-space formulation as the control basis, which allows the robotic arm to reach target postures and perform real-time motion adjustments without complex calculations. To validate the generalizability and effectiveness of the overall strategy, the deep-reinforcement-learning-based prepositioning policy is tested both in simulated environments and on a physical humanoid robot, and the results confirm that the proposed strategy possesses strong sim-to-real transfer capability. This thesis demonstrates the potential of applying deep reinforcement learning to the control of humanoid robotic arms; by incorporating the natural posture and collision boundary neural networks, the system achieves both natural movement and operational safety, providing a valuable reference for the future development of humanoid robot control systems.
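
    To make the two-policy design above concrete, the following is a minimal sketch of how a joint-space moving policy (SAC) and a prepositioning policy (TD3) could be trained with Stable-Baselines3, which the thesis lists among its tools alongside the MuJoCo simulator. Everything specific in the sketch is an assumption for illustration: the toy JointSpaceArmEnv, the six-joint layout, the increment bounds, the step cap, and the distance-only reward all stand in for the thesis's actual environment and reward, which additionally score naturalness and self-collision risk with the learned natural posture and collision boundary networks.

import numpy as np
import gymnasium as gym
from gymnasium import spaces
from stable_baselines3 import SAC, TD3


class JointSpaceArmEnv(gym.Env):
    """Toy stand-in for the thesis's arm task: drive n joints toward a
    target joint configuration using bounded joint-angle increments
    (no inverse kinematics in the control loop)."""

    def __init__(self, n_joints: int = 6, max_steps: int = 200):
        super().__init__()
        self.n_joints = n_joints
        self.max_steps = max_steps
        # Observation: current joint angles concatenated with target angles.
        self.observation_space = spaces.Box(
            low=-np.pi, high=np.pi, shape=(2 * n_joints,), dtype=np.float32)
        # Action: per-step joint increments in radians.
        self.action_space = spaces.Box(
            low=-0.1, high=0.1, shape=(n_joints,), dtype=np.float32)

    def _obs(self):
        return np.concatenate([self.q, self.q_target]).astype(np.float32)

    def reset(self, *, seed=None, options=None):
        super().reset(seed=seed)
        self.steps = 0
        self.q = self.np_random.uniform(-1.0, 1.0, self.n_joints)
        self.q_target = self.np_random.uniform(-1.0, 1.0, self.n_joints)
        return self._obs(), {}

    def step(self, action):
        self.steps += 1
        self.q = np.clip(self.q + action, -np.pi, np.pi)
        err = float(np.linalg.norm(self.q - self.q_target))
        # Placeholder reward: negative joint-space error only. The thesis's
        # reward additionally scores naturalness (natural posture network)
        # and self-collision risk (collision boundary network).
        reward = -err
        terminated = err < 0.05
        truncated = self.steps >= self.max_steps
        return self._obs(), reward, terminated, truncated, {}


if __name__ == "__main__":
    # Moving policy (SAC) and prepositioning policy (TD3), as in the thesis;
    # the shared toy environment and step budgets are illustrative only.
    moving_policy = SAC("MlpPolicy", JointSpaceArmEnv(), verbose=0)
    moving_policy.learn(total_timesteps=10_000)
    prepositioning_policy = TD3("MlpPolicy", JointSpaceArmEnv(), verbose=0)
    prepositioning_policy.learn(total_timesteps=10_000)

    The design point the sketch preserves is the one stressed in the abstract: the action space consists of joint increments, so the policy reaches target postures without an inverse kinematics solver in the control loop.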

    Table of Contents:
    Abstract (Chinese) I
    Abstract II
    Acknowledgment III
    Content IV
    List of Figures VII
    List of Tables IX
    Chapter 1 Introduction 1
      1.1 Motivation 1
      1.2 Related Works 2
      1.3 Thesis Organization 5
    Chapter 2 Robot Hardware System 7
      2.1 Overview 7
      2.2 Hardware Specification 8
      2.3 Actuator 11
        2.3.1 Dynamixel Motor 12
        2.3.2 Myactuator Motor 14
      2.4 Power Supply 15
      2.5 Control Kernel 16
      2.6 RGBD Camera 17
      2.7 System Architecture 18
    Chapter 3 Method 20
      3.1 Overview 20
      3.2 Kinematics of Humanoid Robot 21
        3.2.1 Forward Kinematics of Arm End-Effector 21
        3.2.2 Forward Kinematics of Head Camera 25
      3.3 SAC-based Moving Policy 27
        3.3.1 Introduction 27
        3.3.2 Soft Actor-Critic (SAC) 29
        3.3.3 Action 31
        3.3.4 State 32
        3.3.5 Reward 34
      3.4 TD3-Based Prepositioning Policy 39
        3.4.1 Twin Delayed DDPG (TD3) 40
        3.4.2 Action 42
        3.4.3 State 42
        3.4.4 Reward and Training Method 42
      3.5 Natural Posture and Collision Boundary 44
        3.5.1 Natural Posture Neural Network 45
        3.5.2 Collision Boundary Neural Network 48
        3.5.3 Natural Rate 51
      3.6 Summary 52
    Chapter 4 Simulations and Experimental Results 54
      4.1 Overview 54
      4.2 Experimental Setup 55
        4.2.1 Simulation Environment – MuJoCo 55
        4.2.2 Parameters in the RL models 55
      4.3 Experimental Results 58
        4.3.1 Experiment 1 – Moving Policy 58
        4.3.2 Experiment 2 – Prepositioning Policy 62
        4.3.3 Experiment 3 – Natural Posture 64
        4.3.4 Experiment 4 – Collision Boundary 66
        4.3.5 Experiment 5 – Natural Posture and Collision Boundary 69
        4.3.6 Experiment 6 – Sim-to-Real Transfer of the Moving Policy 70
        4.3.7 Experiment 7 – Real-Time Grasp/Place Task 74
        4.3.8 Experiment 8 – Long Range Grasp/Place Task 79
      4.4 Summary 81
    Chapter 5 Conclusions and Future Work 82
      5.1 Conclusions 82
      5.2 Future Work 83
      5.3 Declaration of the Use of Generative AI technologies 84
    References 85


    Availability: On campus: open access from 2030-08-05. Off campus: not available.
    The electronic thesis has not yet been authorized for public release; please consult the library catalog for the print copy.