
Author: Chang, Heng-Chia (張恆嘉)
Title: Fuzzy Function Approximation Using TD-Lambda in Reinforcement Learning (Chinese title: 互動式學習模糊控制器, "Interactive learning fuzzy controller")
Advisor: Tarn, J.H. (譚俊豪)
Degree: Master
Department: College of Engineering, Department of Aeronautics & Astronautics
Year of publication: 2009
Academic year of graduation: 97 (2008–2009)
Language: English
Number of pages: 41
Keywords: reinforcement learning, fuzzy control, temporal difference
  • A modification of the actor-critic method in reinforcement learning is proposed in this thesis. We use a single fuzzy inference system to replace both the actor and the critic elements, and it learns by interacting with the environment.
    Function approximation is essential in reinforcement learning for storing the value function. Large-scale applications of reinforcement learning (RL) require generalizing function approximators such as neural networks, decision trees, or instance-based methods, in which all function-approximation effort goes into estimating a value function and the action-selection policy is defined with respect to the estimated values. In this work we explore an alternative, actor-critic approach to function approximation in reinforcement learning based on a fuzzy inference system: rather than approximating a value function and using it to compute a policy, we approximate the policy directly and the value function at the same time with a single MIMO fuzzy inference system.
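    The abstract describes one MIMO fuzzy inference system whose two outputs act simultaneously as actor (the action) and critic (the state value), trained by TD(λ) with eligibility traces. The Python sketch below is only a minimal illustration of that idea under assumptions of our own (Gaussian rule memberships over a 1-D state, a toy environment, and a simple exploration-noise update for the actor output); the class name and all parameters are hypothetical and this is not the thesis's actual implementation.

```python
# Minimal sketch (not the author's implementation): one fuzzy rule base with two
# outputs, an actor (action) and a critic (state value), trained by TD(lambda).
# Rule count, membership widths, and the toy 1-D environment are assumptions.
import numpy as np

class FuzzyActorCritic:
    def __init__(self, n_rules=11, lo=-1.0, hi=1.0, gamma=0.95, lam=0.8, alpha=0.1):
        # Gaussian membership functions evenly spaced over the state range.
        self.centers = np.linspace(lo, hi, n_rules)
        self.sigma = (hi - lo) / (n_rules - 1)
        # Consequent weights: one set per output (action, value) per rule.
        self.w_action = np.zeros(n_rules)
        self.w_value = np.zeros(n_rules)
        self.gamma, self.lam, self.alpha = gamma, lam, alpha
        self.trace = np.zeros(n_rules)          # eligibility trace over rules

    def firing(self, s):
        """Normalized rule firing strengths (fuzzification + inference)."""
        phi = np.exp(-0.5 * ((s - self.centers) / self.sigma) ** 2)
        return phi / phi.sum()

    def act_and_value(self, s):
        """Defuzzified outputs: weighted averages of the rule consequents."""
        phi = self.firing(s)
        return phi @ self.w_action, phi @ self.w_value, phi

    def update(self, phi, reward, v, v_next, action_error):
        """TD(lambda) update of both outputs through the shared rule base."""
        delta = reward + self.gamma * v_next - v          # TD error
        self.trace = self.gamma * self.lam * self.trace + phi
        self.w_value += self.alpha * delta * self.trace
        # The critic's TD error also drives the actor consequents (actor-critic).
        self.w_action += self.alpha * delta * action_error * self.trace

# Usage on a toy task: learn an action that keeps the state near zero.
agent = FuzzyActorCritic()
s = 0.8
for step in range(200):
    a, v, phi = agent.act_and_value(s)
    a_explore = a + np.random.normal(scale=0.1)           # exploration noise
    s_next = np.clip(s + 0.1 * a_explore, -1.0, 1.0)      # toy dynamics
    r = -s_next ** 2                                      # reward: stay near 0
    _, v_next, _ = agent.act_and_value(s_next)
    agent.update(phi, r, v, v_next, a_explore - a)
    s = s_next
```

    In this sketch the critic's TD error drives both sets of rule consequents through the same normalized firing strengths, which is one way a shared fuzzy rule base can play both roles; the thesis's exact update rule may differ.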

    ABSTRACT
    CONTENTS
    LIST OF FIGURES
    CHAPTER I    INTRODUCTION
    CHAPTER II   REINFORCEMENT LEARNING
        2.1 Elements of reinforcement learning
        2.2 Update of the value function
        2.3 Incremental form of the Monte Carlo method
        2.4 Temporal difference (TD)
        2.5 TD(λ)
        2.6 Eligibility traces
    CHAPTER III  FUZZY INFERENCE SYSTEM
        3.1 General description
        3.2 Fuzzification
        3.3 Fuzzy inference
        3.4 Defuzzification
    CHAPTER IV   FUNCTION APPROXIMATION
        4.1 Architecture for function approximation
        4.2 Incremental gradient-descent method
        4.3 Step size
        4.4 Fuzzy function approximation
        4.5 Fuzzy function approximation using TD(λ) for training
    CHAPTER V    SIMULATION
        5.1 Actual surface
        5.2 Training data (phase plot)
        5.3 Approximated surface using 21×21 rules
        5.4 Approximated surface using 51×51 rules
        5.5 Approximation error comparison
    CHAPTER VI   CONCLUSION & DISCUSSION
    REFERENCES

    Full-text availability: on campus, immediately available; off campus, available from 2009-02-11.