| Graduate student: | 張恆嘉 Chang, Heng-Chia |
|---|---|
| Thesis title: | 互動式學習模糊控制器 FUZZY FUNCTION APPROXIMATION USING TD-LAMBDA IN REINFORCEMENT LEARNING |
| Advisor: | 譚俊豪 Tarn, J.H. |
| Degree: | Master |
| Department: | College of Engineering - Department of Aeronautics & Astronautics |
| Year of publication: | 2009 |
| Academic year of graduation: | 97 |
| Language: | English |
| Number of pages: | 41 |
| Keywords: | reinforcement learning, fuzzy control, temporal difference |
A modification of the actor-critic method in reinforcement learning is proposed in this thesis. A single fuzzy inference system replaces both the actor element and the critic element and learns by interacting with the environment.
Function approximation is essential to reinforcement learning for storing the value function. Large applications of reinforcement learning (RL) require generalizing function approximators such as neural networks, decision trees, or instance-based methods. In the usual approach, all function approximation effort goes into estimating a value function, and the action-selection policy is derived from the estimated values. In this work we explore an alternative, an actor-critic approach to function approximation in reinforcement learning that uses a fuzzy inference system. Rather than approximating a value function and using it to compute a policy, we approximate the policy directly, and the value function at the same time, using a multiple-input multiple-output (MIMO) fuzzy inference system.
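To make the idea of one fuzzy system producing both a policy and a value estimate more concrete, here is a minimal Python sketch. It is not taken from the thesis: the class name `FuzzyActorCritic`, the Gaussian membership functions, the zero-order Takagi-Sugeno form, the step sizes, and the actor update rule (TD error times the exploration offset) are all illustrative assumptions. Only the general scheme, a shared rule base with two outputs trained by TD(λ), follows the abstract.

```python
import math


class FuzzyActorCritic:
    """Hypothetical zero-order Takagi-Sugeno system with two outputs per rule:
    a value estimate (critic) and an action (actor). All names are assumptions."""

    def __init__(self, centers, width, alpha=0.1, beta=0.05, gamma=0.95, lam=0.8):
        self.centers = list(centers)          # centers of Gaussian membership functions
        self.width = width                    # shared membership width
        self.alpha, self.beta = alpha, beta   # critic / actor step sizes
        self.gamma, self.lam = gamma, lam     # discount factor and trace decay
        n = len(self.centers)
        self.w_value = [0.0] * n              # value consequent of each rule
        self.w_action = [0.0] * n             # action consequent of each rule
        self.trace = [0.0] * n                # eligibility trace for the value output

    def firing(self, s):
        """Normalized rule firing strengths (they sum to 1).
        Assumes s lies near the centers so the total does not underflow to 0."""
        mu = [math.exp(-((s - c) / self.width) ** 2) for c in self.centers]
        total = sum(mu)
        return [m / total for m in mu]

    def outputs(self, s):
        """Return (action, value) as firing-strength-weighted averages."""
        phi = self.firing(s)
        action = sum(p * w for p, w in zip(phi, self.w_action))
        value = sum(p * w for p, w in zip(phi, self.w_value))
        return action, value

    def reset_trace(self):
        """Clear the eligibility trace at the start of an episode."""
        self.trace = [0.0] * len(self.centers)

    def update(self, s, explored_action, reward, s_next, terminal=False):
        """One TD(lambda) step on both outputs; returns the TD error."""
        phi = self.firing(s)
        mean_action, value = self.outputs(s)
        next_value = 0.0 if terminal else self.outputs(s_next)[1]
        delta = reward + self.gamma * next_value - value
        for i, p in enumerate(phi):
            # Critic: accumulating trace, then TD(lambda) weight update.
            self.trace[i] = self.gamma * self.lam * self.trace[i] + p
            self.w_value[i] += self.alpha * delta * self.trace[i]
            # Actor (assumed rule): move toward explored actions that did better
            # than expected, away from those that did worse.
            self.w_action[i] += self.beta * delta * (explored_action - mean_action) * p
        return delta


# Usage: a single transition on a three-rule system.
agent = FuzzyActorCritic(centers=[-1.0, 0.0, 1.0], width=0.5)
delta = agent.update(s=0.0, explored_action=1.0, reward=1.0, s_next=0.5)
print(delta, agent.outputs(0.0))
```

Sharing the firing strengths between the two outputs is the point of the sketch: the rule antecedents are evaluated once per state, and only the consequent vectors differ between the actor and the critic.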