National Cheng Kung University Electronic Theses and Dissertations System

Graduate student:	張恆嘉 Chang, Heng-Chia
Thesis title:	互動式學習模糊控制器 FUZZY FUNCTION APPROXIMATION USING TD-LAMBDA IN REINFORCEMENT LEARNING
Advisor:	譚俊豪 Tarn, J.H.
Degree:	Master
Department:	College of Engineering - Department of Aeronautics & Astronautics
Year of publication:	2009
Academic year of graduation:	97 (ROC calendar)
Language:	English
Number of pages:	41
Keywords:	reinforcement learning, fuzzy control, temporal difference

This thesis proposes a modification of the actor-critic method in reinforcement learning. A single fuzzy inference system replaces both the actor element and the critic element and learns by interacting with the environment.
Function approximation is essential to reinforcement learning for storing the value function. Large applications of reinforcement learning (RL) require generalizing function approximators such as neural networks, decision trees, or instance-based methods. In these approaches, all function approximation effort goes into estimating a value function, and the action-selection policy is defined with respect to the estimated values. This work explores an alternative, actor-critic approach to function approximation in reinforcement learning that uses a fuzzy inference system. Rather than approximating a value function and using it to compute a policy, we approximate the policy directly and the value function at the same time using a MIMO (multiple-input multiple-output) fuzzy inference system.
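
The value-function side of this idea can be illustrated with a minimal sketch. It is not the thesis's implementation: the triangular membership functions, the one-dimensional state, the toy four-state chain, and the names `firing_strengths`, `td_lambda_step`, and `run_demo` are all assumptions made for illustration. The sketch treats the normalized rule firing strengths of a zero-order fuzzy system as features and trains the rule consequents with TD(lambda) and accumulating eligibility traces. The policy (actor) output of the MIMO system is omitted.

```python
import numpy as np

def firing_strengths(x, centers):
    """Normalized triangular membership degrees of scalar x, one per rule."""
    width = centers[1] - centers[0]          # assumes evenly spaced centers
    x = np.clip(x, centers[0], centers[-1])  # keep at least one rule active
    mu = np.maximum(0.0, 1.0 - np.abs(x - centers) / width)
    return mu / mu.sum()

def td_lambda_step(w, z, phi, phi_next, reward, alpha, gamma, lam):
    """One TD(lambda) update of the rule consequents w."""
    delta = reward + gamma * (w @ phi_next) - (w @ phi)  # TD error
    z = gamma * lam * z + phi  # accumulating trace; d(w @ phi)/dw = phi
    w = w + alpha * delta * z
    return w, z

def run_demo(episodes=300, alpha=0.1, gamma=0.9, lam=0.8):
    """Toy chain (hypothetical): 0 -> 0.25 -> 0.5 -> 0.75 -> end, reward 1 at the end."""
    centers = np.linspace(0.0, 0.75, 4)
    states = [0.0, 0.25, 0.5, 0.75]
    w = np.zeros(len(centers))
    for _ in range(episodes):
        z = np.zeros_like(w)  # traces are reset at the start of each episode
        for i, x in enumerate(states):
            terminal = i == len(states) - 1
            phi = firing_strengths(x, centers)
            # The terminal state has value zero, so its feature vector is zero.
            phi_next = (np.zeros_like(w) if terminal
                        else firing_strengths(states[i + 1], centers))
            reward = 1.0 if terminal else 0.0
            w, z = td_lambda_step(w, z, phi, phi_next, reward, alpha, gamma, lam)
    return centers, w
```

Calling `run_demo()` returns the rule centers and the learned consequents. In this toy chain the consequents should approach the discounted returns of the four states, and `firing_strengths(x, centers) @ w` interpolates the value estimate between rule centers.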

ABSTRACT
CONTENTS
LIST OF FIGURES

CHAPTER
I	INTRODUCTION
II	REINFORCEMENT LEARNING
2.1 	Elements of reinforcement learning
2.2 	Update of value function
2.3 	Incremental form of Monte Carlo method
2.4 	Temporal difference TD
2.5 	TD
2.6 	Eligibility trace
III 	FUZZY INFERENCE SYSTEM
3.1 	General description
3.2 	Fuzzification
3.3 	Fuzzy inference
3.4 	Defuzzification
IV 	FUNCTION APPROXIMATION
4.1 	Architecture for function approximation
4.2 	Incremental Gradient-Descent method
4.3 	Step-Size
4.4 	Fuzzy function approximation
4.5 	Fuzzy function approximation using TD for training
V 	SIMULATION
5.1 	Actual surface
5.2 	Training data (phase plot)
5.3 	Approximated surface using 21X21 rules
5.4 	Approximated surface using 51X51 rules
5.5 	Approximated error comparison
VI 	CONCLUSION & DISCUSSION

REFERENCES

On campus: available immediately
Off campus: available from 2009-02-11
