
Graduate Student: 葉庭瑜 (Ye, Ting-Yu)
Thesis Title: 基於深度增強式學習之二軸機械手臂視覺追蹤系統研究
(Study on Visual Tracking System of a 2-DOF Manipulator Based on Deep Reinforcement Learning)
Advisor: 鄭銘揚 (Cheng, Ming-Yang)
Degree: Master
Department: Department of Electrical Engineering, College of Electrical Engineering and Computer Science
Year of Publication: 2019
Academic Year of Graduation: 107
Language: Chinese
Number of Pages: 103
Chinese Keywords: 機械手臂、深度增強式學習、視覺追蹤系統
English Keywords: Industrial Manipulators, Deep Reinforcement Learning, Visual Tracking System
Views: 208; Downloads: 0
As global industry shifts toward factory automation, robot manipulators have become ubiquitous on production lines. To give manipulators the ability to learn and reason so that they can make intelligent decisions under different conditions, this thesis introduces reinforcement learning into manipulator control. Reinforcement learning learns through a reward mechanism and is well suited to the sequential decision-making required to control a manipulator; once trained, it can produce the best decision for each environmental state. Deep reinforcement learning, a popular branch of reinforcement learning research, integrates deep learning with reinforcement learning and can therefore solve more complex continuous control problems. In light of this, this thesis implements visual tracking systems based on three deep reinforcement learning methods (Proximal Policy Optimization with KL-Penalty, Proximal Policy Optimization with CLIP, and Deep Deterministic Policy Gradient) and applies them to a 2-DOF SCARA manipulator so that it can make the best decision as the environmental state changes. Finally, the overall performance of the three methods and their suitability for this task are compared in terms of learning performance and results on the physical manipulator. Experimental results show that Deep Deterministic Policy Gradient performs the best of the three.

With the state of global industry trending towards factory automation, robot manipulators have already begun to dominate production lines. In order to provide robot manipulators with the ability to make intelligent decisions according to different situations, this thesis introduces reinforcement learning into robot manipulator control. The learning method is based on a reward signal and is well suited for controlling the sequential decisions of a robot manipulator. After completing its learning, it can make optimal decisions according to different environmental states. Deep reinforcement learning is a popular branch of research in reinforcement learning; because it introduces deep learning into reinforcement learning, it can solve more complex continuous control problems. In light of this, this thesis implements three deep reinforcement learning-based visual tracking systems (Proximal Policy Optimization with KL-Penalty, Proximal Policy Optimization with CLIP, and Deep Deterministic Policy Gradient), which are applied to a 2-DOF robot manipulator to enable it to make the best decision according to environmental changes. The thesis evaluates the overall performance and applicability of the three methods in terms of learning performance and visual tracking experimental results. Experimental results indicate that Deep Deterministic Policy Gradient has the best performance among the three.
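
As context for the three methods named in the abstract, the sketch below contrasts, in plain NumPy, the actor objectives that each method optimizes: the KL-penalized and clipped surrogate objectives of Proximal Policy Optimization, and the deterministic policy-gradient actor objective of Deep Deterministic Policy Gradient. This is an illustrative sketch rather than the thesis's implementation; the batch size, the coefficients beta and eps, and the toy inputs are assumptions made here for demonstration.

import numpy as np

# Minimal sketch (not the thesis's code) of the three actor objectives compared
# in the abstract. All inputs are toy per-sample values; in practice they come
# from rollouts of the manipulator's policy and from learned value networks.

def ppo_kl_penalty_objective(ratio, advantage, kl, beta=1.0):
    # PPO, KL-penalty variant: maximize E[ratio * A] - beta * E[KL(pi_old || pi_new)]
    return np.mean(ratio * advantage) - beta * np.mean(kl)

def ppo_clip_objective(ratio, advantage, eps=0.2):
    # PPO, CLIP variant: maximize E[min(ratio * A, clip(ratio, 1 - eps, 1 + eps) * A)]
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantage
    return np.mean(np.minimum(ratio * advantage, clipped))

def ddpg_actor_objective(q_values):
    # DDPG: maximize E[Q(s, mu(s))]; in practice this mean is ascended by
    # backpropagating through the critic with respect to the actor's parameters.
    return np.mean(q_values)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    ratio = np.exp(rng.normal(0.0, 0.1, size=64))   # pi_new(a|s) / pi_old(a|s), toy values
    advantage = rng.normal(0.0, 1.0, size=64)       # advantage estimates, toy values
    kl = np.abs(rng.normal(0.0, 0.01, size=64))     # per-sample KL estimates, toy values
    q_values = rng.normal(0.0, 1.0, size=64)        # critic Q(s, mu(s)), toy values
    print("PPO with KL-Penalty objective:", ppo_kl_penalty_objective(ratio, advantage, kl))
    print("PPO with CLIP objective:", ppo_clip_objective(ratio, advantage))
    print("DDPG actor objective:", ddpg_actor_objective(q_values))

Both PPO variants limit how far the updated policy may drift from the policy that collected the data (one with a KL penalty, one with clipping), whereas DDPG trains a deterministic policy by ascending the critic's estimate of Q(s, mu(s)).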

Chinese Abstract I
Extended Abstract II
Acknowledgements XV
Table of Contents XVI
List of Tables XIX
List of Figures XX
Chapter 1 Introduction 1
  1.1 Research Motivation and Objectives 1
  1.2 Literature Review 2
  1.3 Thesis Organization 5
Chapter 2 Camera Model and Manipulator Kinematics 6
  2.1 Camera Model 6
    2.1.1 Intrinsic Parameters 7
    2.1.2 Extrinsic Parameters 9
  2.2 Coordinate Frame Transformation 10
    2.2.1 Indirect Calibration Method 11
    2.2.2 Direct Calibration Method 13
    2.2.3 Online Calibration Method 15
  2.3 Kinematics of the 2-DOF Manipulator 17
    2.3.1 Forward Kinematics 17
    2.3.2 Inverse Kinematics 20
    2.3.3 Robot Jacobian Matrix 23
Chapter 3 Introduction to Reinforcement Learning 24
  3.1 Machine Learning 24
  3.2 Reinforcement Learning 26
  3.3 Markov Decision Processes 28
  3.4 Tabular Solution Methods 34
    3.4.1 Dynamic Programming 34
    3.4.2 Monte Carlo Methods 35
    3.4.3 Temporal-Difference Learning 36
      3.4.3.1 Q-learning 37
  3.5 Approximate Solution Methods 39
    3.5.1 Value-Based Reinforcement Learning 40
      3.5.1.1 Deep Q-Network 40
    3.5.2 Policy-Based Reinforcement Learning 45
      3.5.2.1 Policy Gradient 46
    3.5.3 Actor-Critic Reinforcement Learning 47
      3.5.3.1 Proximal Policy Optimization Algorithms 48
      3.5.3.2 Deep Deterministic Policy Gradient 56
Chapter 4 Experimental Setup and Results 62
  4.1 Experimental Equipment and Scene 62
    4.1.1 2-DOF Manipulator 62
    4.1.2 Camera and Lens 63
    4.1.3 Experimental Scene 64
  4.2 Experimental Methods and Results 65
    4.2.1 Introduction to the Visual Tracking Experiment 65
    4.2.2 Control Architecture Design of the Deep Reinforcement Learning-Based Visual Tracking System 69
    4.2.3 Experimental Results and Discussion 77
Chapter 5 Conclusions and Suggestions 93
  5.1 Conclusions 93
  5.2 Future Prospects and Suggestions 94
References 95
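
As a companion to the Chapter 2.3 entries listed above (forward kinematics, inverse kinematics, and the robot Jacobian), the sketch below gives the standard planar two-link formulas for a 2-DOF arm. It is not taken from the thesis; the link lengths l1 and l2 and the elbow-up convention are illustrative assumptions.

import numpy as np

# Minimal sketch (not the thesis's code) of standard planar 2-DOF kinematics.
# Link lengths default to 0.25 m purely for illustration.

def forward_kinematics(theta1, theta2, l1=0.25, l2=0.25):
    # End-effector position (x, y) from the joint angles (radians).
    x = l1 * np.cos(theta1) + l2 * np.cos(theta1 + theta2)
    y = l1 * np.sin(theta1) + l2 * np.sin(theta1 + theta2)
    return x, y

def inverse_kinematics(x, y, l1=0.25, l2=0.25, elbow_up=True):
    # Joint angles (radians) that reach (x, y); raises if the target is unreachable.
    c2 = (x**2 + y**2 - l1**2 - l2**2) / (2.0 * l1 * l2)
    if not -1.0 <= c2 <= 1.0:
        raise ValueError("target lies outside the reachable workspace")
    theta2 = np.arccos(c2) if elbow_up else -np.arccos(c2)
    theta1 = np.arctan2(y, x) - np.arctan2(l2 * np.sin(theta2), l1 + l2 * np.cos(theta2))
    return theta1, theta2

def jacobian(theta1, theta2, l1=0.25, l2=0.25):
    # 2x2 robot Jacobian mapping joint velocities to end-effector velocities.
    s1, c1 = np.sin(theta1), np.cos(theta1)
    s12, c12 = np.sin(theta1 + theta2), np.cos(theta1 + theta2)
    return np.array([[-l1 * s1 - l2 * s12, -l2 * s12],
                     [ l1 * c1 + l2 * c12,  l2 * c12]])

A quick consistency check is to run forward_kinematics on the angles returned by inverse_kinematics and confirm that the original target position is recovered.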

Full-text availability: on campus from 2024-06-18; not available off campus.
The electronic thesis has not been authorized for public release; please consult the library catalog for the print copy.