
Graduate student: Ye, Hao-Jie (葉皓傑)
Thesis title: A hierarchical reinforcement learning method to intelligent systems (智慧型系統的階層式互動式學習方法之研究)
Advisor: Tarn, Jyun-Hao (譚俊豪)
Degree: Master
Department: College of Engineering, Department of Aeronautics & Astronautics
Year of publication: 2006
Academic year of graduation: 94 (ROC calendar, i.e. 2005-2006)
Language: Chinese
Number of pages: 71
Keywords: reinforcement learning, hierarchical systems
  • Advances in electrical and information engineering have substantially changed traditional mechanical systems. Integrating electrical and information systems with mechanical systems gave rise to mechatronic systems. Compared with traditional mechanical systems, mechatronic systems can exploit more information, but they are harder to analyze and design. As a result, control engineering faces a growing number of challenges, and control systems increasingly need the ability to make logical judgments and decisions. This thesis studies how machine learning methods can give control systems that ability. The machine learning methods used are fuzzy logic and reinforcement learning. A cleaning robot serves as the case study: the robot is trained with reinforcement learning and, to strengthen its learning ability, is treated as a hierarchical system. After training, the robot does exhibit good logical judgment and decision-making ability.

    Table of Contents
    Acknowledgements
    Chinese Abstract
    English Abstract
    Table of Contents
    List of Figures
    Nomenclature
    Chapter 1 Introduction
      1-1 Preface and Motivation
      1-2 Concepts and Advantages of Hierarchical Systems
      1-3 Difficulties of Hierarchical Systems
      1-4 Literature Review
      1-5 Outline
    Chapter 2 Introduction to Fuzzy Systems
      2-1 Fuzzy Sets
      2-2 Membership Functions
      2-3 Fuzzy Rules
      2-4 Fuzzy Inference and Defuzzification
      2-5 Fuzzy Systems
    Chapter 3 Introduction to Reinforcement Learning
      3-1 Reinforcement Learning Framework
      3-2 Monte-Carlo Algorithm
      3-3 Temporal Difference Algorithm
      3-4 Action Selection Policies
    Chapter 4 The Cleaning Robot Example
      4-1 Case 1: Problem Statement and Simulation Method
      4-2 State Definition
      4-3 Case 1: Non-hierarchical Learning Simulation
      4-4 Case 1: Hierarchical Learning Simulation
        4-4-1 Hierarchical Learning Architecture
        4-4-2 Learning of the Search Decision-Maker
        4-4-3 Learning of the Clean Decision-Maker
        4-4-4 Learning of the First-Level Decision-Maker
        4-4-5 Case 1: Hierarchical Learning Results
      4-5 Case 1: Hierarchical Learning Simulation with Concentrated Dust
      4-6 Case 2: Hierarchical Learning Simulation
      4-7 Case 3: Hierarchical Learning Simulation
      4-8 Characteristics of Different Maps and the Fuzzy Hierarchical Architecture
      4-9 On-line Learning Compensation for the First-Level and Second-Level Decision-Makers
      4-10 Case 4 and Case 5 with On-line Learning Compensation
      4-11 Summary
    Chapter 5 Conclusions and Future Work
    References
    Autobiography


    Full-text availability: on campus, available immediately; off campus, available from 2006-08-18