
Author: WANG, YUN-KAI (王雲楷)
Thesis Title: An Improved DQN Algorithm for Identifying Complex Reward Functions Using Multi-stage Neural Network (適用於辨識複雜獎勵函數之改良DQN演算法 - 使用多階層類神經網絡)
Advisor: Hou, Ting-Wei (侯廷偉)
Degree: Master
Department: Department of Engineering Science, College of Engineering
Year of Publication: 2018
Academic Year of Graduation: 106
Language: English
Number of Pages: 64
Keywords (Chinese): 深度強化學習、多階層類神經網絡、策略變化
Keywords (English): Deep reinforcement learning, Multi-stage neural network, Strategy changes
Several studies on deep reinforcement learning have been published, but none of them can handle situations in which the value of a target object changes. This study therefore proposes a new learning method based on the Deep Q-Networks (DQN) algorithm. Its key feature is expanding the neural network architecture from a single stage to multiple stages. In addition, two special mechanisms are added to the DQN algorithm. The first, called "Profound Memory", allows the machine to repeatedly learn certain special situations during training. The second, called "Enhanced Choice", keeps the machine's decisions from being overly random and erratic when the neural network has not yet formed a clear strategy. In the experiments, a catch game was used as the platform, and some of its rules were modified so that catching the ball no longer always earns points; in certain situations, catching the ball is instead counterproductive. The results show that, when facing the same target with different values, the machine can adapt to changes in the environment by switching between strategies.

    There are several research papers on deep reinforcement learning, but the problem of a target object whose value changes in specific situations remains unsolved. This research proposes a new learning method based on the Deep Q-Networks (DQN) algorithm. Its key feature is expanding the original neural network architecture from a single stage to multiple stages. Two new mechanisms are also proposed. The first, called "Profound Memory", allows the machine to repeatedly learn certain special conditions during training. The second, called "Enhanced Choice", keeps the machine's judgments from becoming too random and uncertain when the neural network has no clear strategy. In the experiment, a catch game was implemented as the experimental platform, and some of its rules were changed: after the player's score passed a threshold value, catching a ball lost points instead of earning points as usual. Finally, the experiment showed that the machine can adapt to the change of situation by switching between different strategies.
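Since only the abstract is available in this record, the sketch below is not the author's implementation; it is a minimal Python illustration, under stated assumptions, of the three ideas the abstract names. The names (`catch_reward`, `ProfoundMemory`, `enhanced_choice`), the score threshold, the sampling ratio, and the tie-breaking margin are all hypothetical choices made for illustration.

```python
import random
from collections import deque

# Hypothetical reward rule for the modified catch game described in the
# abstract: catching the ball normally earns a point, but once the score
# passes a threshold, catching costs a point instead. The threshold and
# exact scoring used in the thesis may differ.
def catch_reward(caught, score, threshold=10):
    if not caught:
        return 0
    return 1 if score < threshold else -1

# "Profound Memory" (illustrative guess): a replay buffer that keeps
# special transitions in a separate pool so they are sampled repeatedly
# during training, letting the agent relearn rare situations.
class ProfoundMemory:
    def __init__(self, capacity=10_000, special_ratio=0.25):
        self.normal = deque(maxlen=capacity)
        self.special = deque(maxlen=capacity)
        self.special_ratio = special_ratio

    def store(self, transition, special=False):
        (self.special if special else self.normal).append(transition)

    def sample(self, batch_size):
        n_special = min(len(self.special), int(batch_size * self.special_ratio))
        batch = random.sample(self.special, n_special) if n_special else []
        n_normal = min(len(self.normal), batch_size - n_special)
        return batch + random.sample(self.normal, n_normal)

# "Enhanced Choice" (illustrative guess): when the Q-values are nearly
# tied, i.e. the network has no clear strategy yet, fall back to a fixed
# default action instead of a uniformly random one, so behaviour stays
# consistent rather than erratic.
def enhanced_choice(q_values, default_action=0, margin=1e-3):
    spread = max(q_values) - min(q_values)
    if spread < margin:
        return default_action
    return max(range(len(q_values)), key=q_values.__getitem__)

if __name__ == "__main__":
    memory = ProfoundMemory()
    # A transition recorded after the score threshold is stored as
    # "special" so the strategy change is replayed more often.
    memory.store(("s", 1, catch_reward(True, score=3), "s'"), special=False)
    memory.store(("s", 1, catch_reward(True, score=12), "s'"), special=True)
    print(memory.sample(2))
    print(enhanced_choice([0.0101, 0.0105, 0.0100]))  # near-tie -> default action
```

In a full agent, the multi-stage neural network described in the title would presumably supply the Q-values passed to `enhanced_choice`, with a stage chosen according to the current situation, but those details are not specified in the abstract.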

Table of Contents:
摘要 (Chinese Abstract)
Abstract
誌謝 (Acknowledgements)
Content
LIST OF TABLES
LIST OF FIGURES
LIST OF Algorithm
Chapter 1. Introduction
  1.1 Background
  1.2 Motivation
  1.3 Assumption
  1.4 Outline
Chapter 2. Related Works
  2.1 Convolution Neural Network
  2.2 Deep Q-Networks (DQN)
  2.3 Why Existing DQN Can't Overcome Complex Reward Functions
  2.4 Potential Problems in Deep Reinforcement Learning
Chapter 3. System Architecture
  3.1 Multi-stage Architecture of Neural Networks
  3.2 Improved DQN Algorithm
  3.3 Improved DQN Algorithm with Multi-stage Neural Networks
Chapter 4. Experimental Results and Discussion
  4.1 Experimental Method
  4.2 Compare the Training Results and Discussion
Chapter 5. Conclusions and Future Works
  5.1 Conclusions
  5.2 Future Works
Reference


Full-text availability: On campus: available from 2023-07-30. Off campus: not available.
The electronic thesis has not been authorized for public release; please consult the library catalog for the print copy.