
Graduate Student: Chang, Sung-Yu (張頌宇)
Thesis Title: Application of Explainable AI to A Multi-Model RL Using OpenAI Gym and Unity 3D (將可解釋人工智慧應用於使用 OpenAI Gym 和 Unity 3D 的多模型強化學習)
Advisor: Su, Wen-Yu (蘇文鈺)
Degree: Master
Department: Miin Wu School of Computing - MS Degree Program on Intelligent Technology Systems
Year of Publication: 2024
Academic Year of Graduation: 112 (ROC calendar)
Language: English
Number of Pages: 66
Keywords: Reinforcement Learning, Explainable AI, Collaborative Multi-Model Learning
Hits: 139; Downloads: 26
    This thesis presents Gymize, a framework that seamlessly integrates Unity 3D environments with OpenAI Gym, enabling reinforcement learning in complex 3D digital-twin environments. We use a kart racing game as a case study, training multiple models with different observation-space configurations and interpreting the models' decision-making processes with visualization tools such as saliency maps and GradCAM++. Finally, to address the challenges of collaborative multi-model learning, we propose a politics-based framework and an energy-based election mechanism that effectively simulate how citizens make decisions in a society. The results of this study demonstrate that the Gymize framework facilitates the integration of reinforcement learning environments, enhances interpretability through visualization tools, and provides a fair and stable solution for collaborative multi-model learning.
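
    As a rough illustration of the two ideas summarized above, the sketch below drives an environment through the standard Gymnasium reset/step loop (the kind of Gym-compatible interface a wrapper such as Gymize would expose for a Unity scene) and lets several policies each propose an action per step, electing one via a Boltzmann-style, energy-based rule. CartPole-v1, the RandomPolicy stand-in, and the energy update are illustrative assumptions only; the thesis's actual Gymize environments, trained SB3 policies, and election details differ.

    ```python
    import numpy as np
    import gymnasium as gym


    class RandomPolicy:
        """Stand-in for a trained policy exposing an SB3-like predict() method."""

        def __init__(self, action_space):
            self.action_space = action_space

        def predict(self, obs, deterministic=True):
            # A real policy would map the observation to an action here.
            return self.action_space.sample(), None


    def election_probabilities(energies, temperature=1.0):
        """Boltzmann-style probabilities: lower energy -> more likely to be elected."""
        logits = -np.asarray(energies, dtype=np.float64) / temperature
        logits -= logits.max()                       # numerical stability
        probs = np.exp(logits)
        return probs / probs.sum()


    env = gym.make("CartPole-v1")                    # stand-in for a Gym-wrapped Unity kart env
    env.action_space.seed(0)
    models = [RandomPolicy(env.action_space) for _ in range(3)]
    energies = np.zeros(len(models))                 # per-model energy bookkeeping (assumed)

    rng = np.random.default_rng(0)
    obs, info = env.reset(seed=0)
    terminated = truncated = False
    while not (terminated or truncated):
        proposals = [m.predict(obs)[0] for m in models]   # each model "votes" with an action
        probs = election_probabilities(energies, temperature=0.5)
        winner = rng.choice(len(models), p=probs)         # elect one model for this step
        obs, reward, terminated, truncated, info = env.step(proposals[winner])
        energies[winner] -= reward                        # assumed update: reward lowers energy
    env.close()
    ```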

    Chinese Abstract i
    Abstract ii
    Contents iii
    List of Tables v
    List of Figures vi
    1 Introduction 1
    2 Related Works 5
    2.1 OpenAI Gym 5
    2.2 Stable Baselines3 (SB3) 6
    2.3 SIMA 7
    2.4 Explainable Artificial Intelligence (XAI) 8
    2.4.1 Saliency Map 8
    2.4.2 Class Activation Mapping (CAM) 9
    2.5 Ensemble Reinforcement Learning 11
    2.6 Energy-based Model 11
    3 This Work 13
    3.1 Gymize 13
    3.1.1 Overall Architecture of Gymize 14
    3.1.2 Gymize Peer Connection 16
    3.1.3 Python Side Architecture 18
    3.1.4 Unity Side Architecture 21
    3.1.5 Internal Details 23
    3.2 Explainable Visualization Tools 24
    3.2.1 Streaming Mechanism Based on WebSocket and MQTT 24
    3.2.2 Explainable Artificial Intelligence (XAI) Support 25
    3.2.3 Front-End Dashboard Visualization Tools 28
    3.3 Collaborative Multi-Model Learning 30
    3.3.1 Politics Framework 31
    3.3.2 Energy-Based Election Mechanism 33
    4 Result 36
    4.1 Kart Racing Game 36
    4.2 Collaborative Multi-Model Learning 40
    4.3 Visualization Explanation 42
    5 Conclusion and Future Works 45
    5.1 Conclusion 45
    5.2 Future Works 46
    References 48

    Full-text availability: on campus, immediate open access; off campus, immediate open access.