
Author: CHIANG, CHIA-LING (姜佳伶)
Thesis Title: Convergence Improvement of Q-learning Based on Personalized Recommendation System (基於個體推薦系統之Q-learning性能改良研究)
Advisor: Cheng, Ming-Yang (鄭銘揚)
Degree: Master
Department: Department of Electrical Engineering, College of Electrical Engineering and Computer Science
Year of Publication: 2018
Academic Year of Graduation: 106
Language: Chinese
Number of Pages: 91
Chinese Keywords: 增強式學習、Q-learning、個體化學習、推薦系統
Foreign Keywords: Reinforcement Learning, Q-learning, Personal Learning, Recommendation System
  • Thanks to the rapid development of computer technology, artificial intelligence has made it possible for humans and computers to cooperate in order to improve work efficiency and convenience. Among the many fields of artificial intelligence, reinforcement learning adopts the reward/penalty mechanism found in human learning, using feedback signals from the real environment so that a machine can learn autonomously and robustly even in complex environments. The elements of reinforcement learning include the learning agent, the set of states of the current environment, the set of actions, the immediate reward mechanism, and the environment itself. Although reinforcement learning has a wide range of applications, its implementation still faces several difficulties. First, during the learning phase the agent must choose among actions it has already explored while also maintaining a tendency to explore further; the tradeoff between the two is difficult, an appropriate balance is hard to find, and a poor choice may cause learning failure or raise the learning cost. Second, because the agent must interact with the environment to obtain immediate rewards, this interaction process can make the learning time excessively long. To overcome these difficulties, this thesis proposes a new approach that introduces a personalized recommendation system to provide feedforward candidate actions and a feedback reward mechanism for Q-learning, realizing self-adaptive learning through mutual teaching. A cliff-walking simulation and a visual tracking experiment are used to verify the effectiveness of the proposed method; the simulation and experimental results show that the proposed method is indeed feasible.

    Benefiting from vast improvements in computer technology, cooperation between humans and computers, enabled by Artificial Intelligence, has made the goal of enhancing workplace efficiency and convenience a reality. Among the many sub-fields of Artificial Intelligence, Reinforcement Learning (RL) exploits the reward/penalty concept of human learning so that feedback signals from the environment can be used for self-learning without prior knowledge. The basic elements of RL include the state, action, environment, learning agent, and reward/penalty mechanism. Although RL has been applied in many research fields, several difficulties arise when implementing it in real-world applications. One difficulty is the tradeoff between exploration and exploitation when the RL agent chooses an action: an improper choice may lead to learning failure or an increase in learning cost. Another difficulty is that the learning agent must interact with the environment to obtain real-time rewards/penalties, and the learning time consumed by this interaction process may be excessively long. To overcome these difficulties, this thesis proposes an approach that employs a personalized recommendation system to provide feedforward candidate actions for RL, implementing self-adaptive learning through mutual teaching. A cliff-walking computer simulation and a visual tracking experiment using a pan/tilt camera are both conducted to assess the performance of the proposed approach. Experimental results show that the personalized recommendation system-based RL improves the effectiveness and practicality of RL.
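    The cliff-walking benchmark and the exploration-exploitation tradeoff described in the abstract can be made concrete with a short sketch. The tabular Q-learning code below is illustrative only, not the thesis's implementation: the `recommend` hook is a hypothetical stand-in for the recommendation system's feedforward candidate actions (the actual design appears in Chapter 3), and the grid size, rewards, and learning parameters follow the standard Sutton-Barto cliff-walking setup rather than anything specified in the thesis.

```python
import random

# Classic 4x12 cliff-walking grid (Sutton & Barto): start at bottom-left,
# goal at bottom-right, the cells between them on the bottom row are the cliff.
ROWS, COLS = 4, 12
START, GOAL = (3, 0), (3, 11)
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right

def step(state, a):
    """Apply action a; return (next_state, reward, done)."""
    r, c = state
    dr, dc = ACTIONS[a]
    r = min(max(r + dr, 0), ROWS - 1)
    c = min(max(c + dc, 0), COLS - 1)
    if r == 3 and 0 < c < COLS - 1:      # stepped off the cliff:
        return START, -100.0, False      # heavy penalty, back to start
    return (r, c), -1.0, (r, c) == GOAL

def train(episodes=500, alpha=0.5, gamma=1.0, eps=0.1, recommend=None, seed=0):
    """Tabular Q-learning with epsilon-greedy action selection.

    `recommend`, if given, supplies a candidate action on exploration steps;
    this is a toy stand-in for the thesis's recommendation system.
    """
    rng = random.Random(seed)
    q = {}
    def get(s, a):
        return q.get((s, a), 0.0)
    for _ in range(episodes):
        s = START
        for _ in range(10_000):          # safety cap per episode
            if rng.random() < eps:       # explore ...
                a = recommend(s) if recommend else rng.randrange(4)
            else:                        # ... or exploit the current Q-table
                a = max(range(4), key=lambda b: get(s, b))
            s2, rew, done = step(s, a)
            target = rew if done else rew + gamma * max(get(s2, b) for b in range(4))
            q[(s, a)] = get(s, a) + alpha * (target - get(s, a))
            s = s2
            if done:
                break
    return q

def greedy_return(q, limit=200):
    """Roll out the greedy policy once; return the undiscounted return."""
    s, total = START, 0.0
    for _ in range(limit):
        a = max(range(4), key=lambda b: q.get((s, b), 0.0))
        s, rew, done = step(s, a)
        total += rew
        if done:
            break
    return total
```

    With these default settings the greedy policy typically recovers the optimal 13-step path along the cliff edge (a return of -13); per the thesis's argument, a recommender that supplies sensible candidate actions during exploration would reduce the number of episodes needed to get there.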

    Table of Contents:
    Chinese Abstract I
    Extended Abstract II
    Acknowledgments XVII
    Table of Contents XVIII
    List of Tables XX
    List of Figures XXI
    Chapter 1 Introduction 1
      1.1 Research Motivation and Objectives 1
      1.2 Literature Review 2
        1.2.1 Reinforcement Learning 2
        1.2.2 Personalized Recommendation Systems 4
      1.3 Thesis Organization 5
    Chapter 2 Introduction to Reinforcement Learning 6
      2.1 Reinforcement Learning 6
      2.2 Markov Decision Processes 9
        2.2.1 Dynamic Programming 13
        2.2.2 Monte Carlo Methods 16
        2.2.3 Temporal Difference Methods 19
      2.3 Difficulties of Reinforcement Learning 22
    Chapter 3 Q-Learning Based on a Personalized Recommendation System 24
      3.1 Introduction 24
      3.2 Personalized Learning 24
        3.2.1 Traditional Personalized Learning 26
        3.2.2 Personalized Learning Based on Social Domains 27
      3.3 Introduction to Recommendation Systems 29
        3.3.1 Technology-Enhanced Learning 30
        3.3.2 The TEL Recommendation System Model 30
        3.3.3 Hybrid Trust-Based Recommendation Systems 31
      3.4 Design of the Personalized Hybrid Recommendation System 34
      3.5 Q-learning with the Proposed Recommendation System 38
        3.5.1 Cliff-Walking Simulation Architecture 43
        3.5.2 Visual Tracking System Architecture 47
    Chapter 4 Experimental Setup and Results 52
      4.1 Experimental Equipment and Scenarios 52
        4.1.1 Equipment 52
        4.1.2 Scenarios 57
      4.2 Experimental Results and Discussion 59
        4.2.1 Cliff-Walking Simulation 61
        4.2.2 Visual Tracking System 70
        4.2.3 Comparison of Results 80
    Chapter 5 Conclusions and Suggestions 82
      5.1 Conclusions 82
      5.2 Future Work and Suggestions 82
    References 84

    [1] L. P. Kaelbling, M. L. Littman, and A. W. Moore, “Reinforcement Learning: A Survey,” Journal of Artificial Intelligence Research, vol. 4, pp. 237-285, May 1996.
    [2] R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction. Cambridge, MA, USA: The MIT Press, 1998.
    [3] D. P. Bertsekas and J. N. Tsitsiklis, Neuro-Dynamic Programming. Belmont, USA: Athena Scientific, 1996.
    [4] C. J. C. H. Watkins and P. Dayan, “Q-Learning,” Machine Learning, vol. 8, no. 3-4, pp. 279-292, May 1992.
    [5] D. Y. Dong, C. L. Chen, H. X. Li, and T. J. Tarn, “Quantum Reinforcement Learning,” IEEE Transactions on Systems, Man, and Cybernetics Part B: Cybernetics, vol. 38, no. 5, pp. 1207-1220, Jul. 2008.
    [6] J. S. Campbell, S. N. Givigi, and H. M. Schwartz, “Multiple Model Q-Learning for Stochastic Asynchronous Rewards,” Journal of Intelligent and Robotic Systems, vol. 81, no. 3-4, pp. 407-422, Mar. 2016.
    [7] C. Chen, H. Li, and D. Dong, “Hybrid Control for Autonomous Mobile Robot Navigation Using Hierarchical Q-Learning,” IEEE Robotics and Automation Magazine, vol. 15, no. 2, pp. 37-47, Jun. 2008.
    [8] W. D. Smart and L. P. Kaelbling, “Effective Reinforcement Learning for Mobile Robots,” in Proceedings of the IEEE International Conference on Robotics and Automation, Washington, DC, USA, May 2002, pp. 3404-3410.
    [9] T. Kondo and K. Ito, “A Reinforcement Learning with Evolutionary State Recruitment Strategy for Autonomous Mobile Robots Control,” Robotics and Autonomous Systems, vol. 46, no. 2, pp. 111-124, Feb. 2004.
    [10] C. I. Connolly, “Harmonic Functions and Collision Probabilities,” International Journal of Robotics Research, vol. 16, no. 4, pp. 497-507, Aug. 1997.
    [11] S. G. Tzafestas and G. G. Rigatos, “Fuzzy Reinforcement Learning Control for Compliance Tasks of Robotic Manipulators,” IEEE Transactions on Systems, Man, and Cybernetics Part B: Cybernetics, vol. 32, no. 1, pp. 107-113, Feb. 2002.
    [12] M. J. Er and C. Deng, “Online Tuning of Fuzzy Inference Systems Using Dynamic Fuzzy Q-Learning,” IEEE Transactions on Systems Man and Cybernetics Part B: Cybernetics, vol. 34, no. 3, pp. 1478-1489, Jun. 2004.
    [13] K. Mehlhorn, B. R. Newell, P. M. Todd, M. D. Lee, K. Morgan, V. A. Braithwaite, D. Hausmann, K. Fiedler, and C. Gonzalez, “Unpacking the Exploration-Exploitation Tradeoff: A Synthesis of Human and Animal Literatures,” Decision, vol. 2, no. 3, pp. 191-215, Jul. 2015.
    [14] J. G. March, “Exploration and Exploitation in Organizational Learning,” Organization Science, vol. 2, no. 1, pp. 71-87, Feb. 1991.
    [15] S. Auh, “Balancing Exploration and Exploitation: The Moderating Role of Competitive Intensity,” Journal of Business Research, vol. 58, no. 12, pp. 1652-1661, Dec. 2005.
    [16] R. S. Sutton, “Learning to Predict by the Methods of Temporal Differences,” Machine Learning, vol. 3, no. 1, pp. 9-44, Aug. 1988.
    [17] M. Guo, Y. Liu, and J. Malec, “A New Q-Learning Algorithm Based on the Metropolis Criterion,” IEEE Transactions on Systems, Man, and Cybernetics Part B: Cybernetics, vol. 34, no. 5, pp. 2140-2143, Sep. 2004.
    [18] P. Y. Glorennec and L. Jouffe, “Fuzzy Q-Learning,” in Proceedings of the 6th IEEE International Conference on Fuzzy Systems, Barcelona, Spain, 1997, pp. 659-662.
    [19] V. Derhami, V. J. Majd, and M. N. Ahmadabadi, “Exploration and Exploitation Balance Management in Fuzzy Reinforcement Learning,” Fuzzy Sets and Systems, vol. 161, no. 4, pp. 578-595, Feb. 2010.
    [20] S. M. Weiss and C. A. Kulikowski, Computer Systems That Learn. San Francisco, USA: Morgan Kaufmann Publishers, 1991.
    [21] S. B. Kotsiantis, “Supervised Machine Learning: A Review of Classification Techniques”, in Proceedings of the 2007 Conference on Emerging Artificial Intelligence Applications in Computer Engineering: Real Word AI Systems with Applications in eHealth, HCI, Information Retrieval and Pervasive Technologies, Amsterdam, The Netherlands, Jul. 2007, pp. 3-24.
    [22] S. Haykin, Neural Networks and Learning Machines. London, U.K.: Pearson, 2008.
    [23] G. Hinton and T. J. Sejnowski, Unsupervised Learning and Map Formation: Foundations of Neural Computation. Cambridge, MA, USA: The MIT Press, 1999.
    [24] 蔡孟璇, “Applying Semi-Supervised Learning to Small-Sample Classification,” Ph.D. dissertation, Institute of Industrial and Information Management, National Cheng Kung University, Taiwan, 2012.
    [25] M. L. Littman, “Reinforcement Learning Improves Behaviour from Evaluative Feedback,” Nature, vol. 521, pp. 445-451, May 2015.
    [26] A. Gosavi, Simulation-based Optimization: Parametric Optimization Techniques and Reinforcement. New York, USA: Springer, 2003.
    [27] 林敬斌, “Using Reinforcement Learning to Improve a Day-Trading System for Taiwan Stock Index Futures,” Master's thesis, Graduate Institute of Computer Science and Information Engineering, National Taiwan University, Taiwan, 2009.
    [28] R. J. Williams, “Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning,” Machine Learning, vol. 8, no. 3-4, pp. 229-256, May 1992.
    [29] S. F. Desouky and H. M. Schwartz, “A Novel Technique to Design a Fuzzy Logic Controller Using Q-learning and Genetic Algorithms in the Pursuit-Evasion Game,” in Proceedings of the IEEE International Conference on Systems, Man, and Cybernetics, San Antonio, USA, Oct. 2009, pp. 2609-2615.
    [30] G. Tesauro, “Temporal Difference Learning and TD-Gammon,” Communications of the ACM, vol. 38, no. 3, pp. 58-68, Mar. 1995.
    [31] V. Gullapalli, J. A. Franklin, and H. Benbrahim, “Acquiring Robot Skills via Reinforcement Learning,” IEEE Control Systems Magazine, vol. 14, no. 1, pp. 13-24, Feb. 1994.
    [32] C. Chunlin and C. Zonghai, “Reinforcement Learning for Mobile Robot: From Reaction to Deliberation,” Journal of Systems Engineering and Electronics, vol. 16, no. 3, pp. 611-617, Sep. 2005.
    [33] N. Mohajerin, M.B. Menhaj, and A. Doustmohammadi, “A Reinforcement Learning Fuzzy Controller for the Ball and Plate System,” in Proceedings of the IEEE Conference on Fuzzy System, Barcelona, Spain, Jul. 2010, pp. 1-8.
    [34] A. Konar, I. G. Chakraborty, S. J. Singh, L. C. Jain, and A. K. Nagar, “A Deterministic Improved Q-Learning for Path Planning of a Mobile Robot,” IEEE Transactions on Systems, Man, and Cybernetics: Systems, vol. 43, no. 5, pp. 1141-1153, Sep. 2013.
    [35] N. Manouselis, H. Drachsler, R. Vuorikari, H. Hummel, and R. Koper, Recommender Systems in Technology Enhanced Learning. New York, USA: Springer, 2011.
    [36] R. Vuorikari, N. Manouselis, and E. Duval, “Special Issue on Social Information Retrieval for Technology Enhanced Learning,” Journal of Digital Information, vol. 10, no. 2, Jan. 2007.
    [37] M. J. Pazzani and D. Billsus, “Content-Based Recommendation Systems,” The Adaptive Web, P. Brusilovsky, A. Kobsa, and W. Nejdl, eds., vol. 4321, pp. 325-341, New York, USA: Springer, 2007.
    [38] J. L. Herlocker, J. A. Konstan, L. G. Terveen, and J. T. Riedl, “Evaluating Collaborative Filtering Recommender Systems,” ACM Transactions on Information Systems, vol. 22, no. 1, pp. 5-53, Jan. 2004.
    [39] R. Burke, “Knowledge-Based Recommender Systems,” Encyclopedia of Library and Information Science, vol. 69, no. 32, pp. 180-200, 2000.
    [40] R. Burke, “Hybrid Web Recommender Systems,” The Adaptive Web, P. Brusilovsky, A. Kobsa, and W. Nejdl, eds., vol. 4321, pp. 377-408, New York, USA: Springer, 2007.
    [41] H. Drachsler, H. Hummel, and R. Koper, “Personal Recommender Systems for Learners in Lifelong Learning Networks: The Requirements, Techniques and Model,” International Journal of Learning Technology, vol. 3, no. 4, pp. 404-423, Jul. 2008.
    [42] K. Verbert, N. Manouselis, X. Ochoa, M. Wolpers, H. Drachsler, I. Bosnic, and E. Duval, “Context-Aware Recommender Systems for Learning: A Survey and Future Challenges,” IEEE Transactions on Learning Technologies, vol. 5, no. 4, pp. 318-335, Oct.-Dec. 2012.
    [43] S. Epstein and B. W. Epstein, The First Book of Teaching Machines. New York, USA: Franklin Watts, 1961.
    [44] S. H. D. Fiedler and T. Väljataga, “Personal Learning Environments: Concept or Technology?,” International Journal of Virtual and Personal Learning Environments, vol. 2, no. 4, pp. 1-11, Oct. 2011.
    [45] D. Sleeman and J. S. Brown, Intelligent Tutoring Systems. Cambridge, U.K.: Academic, 1982.
    [46] M. C. Polson and J. J. Richardson, Foundations of Intelligent Tutoring Systems. East Sussex, U.K.: Psychology Press, 2013.
    [47] M. Yudelson, P. Brusilovsky, and V. Zadorozhny, “A User Modeling Server for Contemporary Adaptive Hypermedia: An Evaluation of the Push Approach to Evidence Propagation,” in Proceedings of the 11th International Conference on User Modeling, Corfu, Greece, Jul. 2007, pp. 27-36.
    [48] V. Zadorozhny, M. Yudelson, and P. Brusilovsky, “A Framework for Performance Evaluation of User Modeling Servers for Web Applications,” Web Intelligence and Agent Systems: An International Journal, vol. 6, no. 2, pp. 175-191, 2008.
    [49] H. Drachsler, Navigation Support for Learners in Informal Learning Networks. Heerlen, The Netherlands: Open Universiteit Nederland, 2009.
    [50] L.H. Wong and C. K. Looi, “Adaptable Learning Pathway Generation with Ant Colony Optimization,” Journal of Educational Technology & Society, vol. 12, no. 3, pp. 309-326, Jul. 2009.
    [51] E. Karataev and V. Zadorozhny, “Adaptive Social Learning Based on Crowdsourcing,” IEEE Transactions on Learning Technologies, vol. 10, no. 2, pp. 128-139, Jan. 2016.
    [52] M. Erdt, A. Fernández, and C. Rensing, “Evaluating Recommender Systems for Technology Enhanced Learning: A Quantitative Survey,” IEEE Transactions on Learning Technologies, vol. 8, no. 4, pp. 326-344, Jun. 2015.
    [53] G. Adomavicius, B. Mobasher, F. Ricci, and A. Tuzhilin, “Context-Aware Recommender Systems,” AI Magazine, vol. 32, no. 3, pp. 67-80, 2011.
    [54] J. Vassileva, “Toward Social Learning Environments,” IEEE Transactions on Learning Technologies, vol. 1, no. 4, pp. 199-214, Oct.-Dec. 2008.
    [55] R. Beale and P. Lonsdale, “Mobile Context Aware Systems: The Intelligence to Support Tasks and Effectively Utilise Resources,” Mobile Human-Computer Interaction, S. Brewster and M. Dunlop, eds., vol. 3160, pp. 573-576, New York, USA: Springer, 2004.
    [56] T. Winograd, “Architectures for Context,” Human-Computer Interaction, vol. 19, no. 2, pp. 401-419, Dec. 2001.
    [57] A. Schmidt, M. Beigl, and H.-W. Gellersen, “There Is More to Context than Location,” Computers and Graphics, vol. 23, no. 6, pp. 893-901, Dec. 1999.
    [58] R. O. Duda, P. E. Hart, and D. G. Stork, Pattern Classification, 2nd ed. New York, USA: Wiley, 2001.
    [59] E. Alpaydin, Introduction to Machine Learning. Cambridge, MA, USA: The MIT Press, 2010.
    [60] W. B. Powell, Approximate Dynamic Programming: Solving the Curses of Dimensionality. New York, USA: Wiley, 2011.
    [61] K. S. Hwang, S. W. Tan, and C. C. Chen, “Cooperative Strategy Based on Adaptive Q-Learning for Robot Soccer Systems,” IEEE Transactions on Fuzzy Systems, vol. 12, no. 4, pp. 569-576, Aug. 2004.
    [62] J. Abdi, B. Moshiri, B. Abdulhai, and A. K. Sedigh, “Forecasting of Short-Term Traffic-Flow Based on Improved Neurofuzzy Models via Emotional Temporal Difference Learning Algorithm,” Engineering Applications of Artificial Intelligence, vol. 25, no. 5, pp. 1022-1042, Aug. 2012.
    [63] E. J. Taft and Z. Nashed, Dynamic Programming: Foundations and Principles. U.K.: CRC Press, 2010.
    [64] D. P. Kroese, T. Brereton, T. Taimre, and Z. I. Botev, “Why the Monte Carlo Method Is So Important Today,” WIREs Computational Statistics, vol. 6, no. 6, pp. 386-392, Nov. 2014.
    [65] S. A. C. McDowell, “A Simple Derivation of the Boltzmann Distribution,” Journal of Chemical Education, vol. 76, no. 10, pp. 1393-1394, Oct. 1999.
    [66] M. A. Wiering and H. V. Hasselt, “Ensemble Algorithms in Reinforcement Learning,” IEEE Transactions on Systems, Man, and Cybernetics Part B: Cybernetics, vol. 38, no. 4, pp. 930-936, Aug. 2008.
    [67] D. Buckley and L. Wilson, The Personalisation by Pieces Framework: A Framework for the Incremental Transformation of Pedagogy Towards Greater Learner Empowerment in Schools. Cambridge, U.K.: CEA Publishing, 2006.
    [68] P. Brusilovsky, “Adaptive and Intelligent Technologies for Web-Based Education,” International Journal of Artificial Intelligence in Education, vol. 13, no. 2-4, pp. 159-172, Apr. 2003.
    [69] R. S. Baker and P. S. Inventado, Learning Analytics. New York, USA: Springer, 2014.
    [70] Y. Akbulut and C. S. Cardak, “Adaptive Educational Hypermedia Accommodating Learning Styles: A Content Analysis of Publications from 2000 to 2011,” Computers & Education, vol. 58, no. 2, pp. 835-842, Feb. 2012.
    [71] P. D. Bra, G. J. Houben, and H. Wu, “AHAM: A Dexter-Based Reference Model for Adaptive Hypermedia,” in Proceedings of the 10th ACM Conference on Hypertext and Hypermedia, Darmstadt, Germany, Feb. 1999, pp. 147-156.
    [72] M. Cannataro and A. Pugliese, Hypermedia: Openness, Structural Awareness, Adaptivity. Berlin, Germany: Springer, 2002.
    [73] E. Knutov, “GAF: Genetic Adaptation Framework,” in Proceedings of the 5th International Conference on Adaptive Hypermedia and Adaptive Web-Based Systems, Hannover, Germany, Jul. 2008, pp. 400-404.
    [74] A. I. Cristea and F. Ghali, “Towards Adaptation in E-Learning 2.0,” The New Review of Hypermedia and Multimedia, vol. 17, no. 2, pp. 199-238, Apr. 2011.
    [75] M. Šimko, M. Barla, and M. Bieliková, “ALEF: A Framework for Adaptive Web-Based Learning 2.0,” in Proceedings of the IFIP International Conference on Key Competencies in the Knowledge Society, Brisbane, Australia, Sep. 2010, pp. 367-378.
    [76] J. B. Schafer, J. A. Konstan, and J. Riedl, “E-Commerce Recommendation Applications,” Data Mining and Knowledge Discovery, vol. 5, no. 1-2, pp. 115-153, Jan. 2001.
    [77] F. O. Isinkaye, Y. O. Folajimi, and B. A. Ojokoh, “Recommendation Systems: Principles, Methods and Evaluation,” Egyptian Informatics Journal, vol. 16, no. 3, pp. 261-273, Nov. 2015.
    [78] E. C. Wenger and W. M. Snyder, “Communities of Practice: The Organizational Frontier,” Harvard Business Review, vol. 78, no. 1, pp. 139-146, Jan. 2000.
    [79] N. Luhmann, Trust and Power: Two Works. Chichester, U.K.: Wiley, 1979.
    [80] X. L. Zheng, C. C. Chen, J. L. Hung, W. He, F. X. Hong, and Z. Lin, “A Hybrid Trust-Based Recommender System for Online Communities of Practice,” IEEE Transactions on Learning Technologies, vol. 8, no. 4, pp. 345-356, Oct.-Dec. 2015.
    [81] B. N. Araabi, S. Mastoureshgh, and M. N. Ahmadabadi, “A Study on Expertise of Agents and Its Effects on Cooperative Q-Learning,” IEEE Transactions on Systems, Man, and Cybernetics Part B: Cybernetics, vol. 37, no. 2, pp. 398-409, Apr. 2007.
    [82] N. Guenard, T. Hamel, and R. Mahony, “A Practical Visual Servo Control for an Unmanned Aerial Vehicle,” IEEE Transactions on Robotics, vol. 24, no. 2, pp. 331-340, Apr. 2008.
    [83] 賴彥均, “Applying Q-learning to Image-Based Visual Servoing of a Mobile Robot Equipped with a Robotic Manipulator,” Master's thesis, Institute of Electrical Engineering, National Cheng Kung University, Taiwan, 2017.
    [84] H. Shi, X. Li, K. S. Hwang, and G. Xu, “Decoupled Visual Servoing with Fuzzy Q-Learning,” IEEE Transactions on Industrial Informatics, vol. 14, no. 1, pp. 241-252, Jan. 2018.
    [85] T. Martinez-Marin and T. Duckett, “Fast Reinforcement Learning for Vision-Guided Mobile Robots,” in Proceedings of the 2005 IEEE International Conference on Robotics and Automation, Barcelona, Spain, Apr. 2005, pp. 4170-4175.
    [86] Industrial Technology Research Institute (工研院), IMP-2 Hardware User Manual, 2010.

    Full text availability: on campus from 2023-06-08; off campus from 2023-06-08.