Graduate student: | 蔡岩霖 Cai, Yan-Lin |
---|---|
Thesis title: | 基於強化學習之空戰模擬 Air Combat Simulation Based on Reinforcement Learning |
Advisor: | 楊憲東 Yang, Ciann-Dong |
Degree: | Master |
Department: | College of Engineering - Department of Aeronautics & Astronautics |
Year of publication: | 2024 |
Academic year of graduation: | 112 |
Language: | Chinese |
Number of pages: | 122 |
Chinese keywords: | 空戰 (air combat), 強化學習 (reinforcement learning), PPG, APIC |
English keywords: | Air Combat, Reinforcement Learning, Pure Pursuit Guidance Law (PPG), Approximate Posture Increment Control (APIC) |
As machine learning has matured, it has increasingly been applied to air combat. Its advantages are reaction times faster than a human pilot's and freedom from psychological factors and the physical limits of the human body, which allows more aggressive maneuvers. Machine learning is commonly divided into three categories; this thesis uses one of them, reinforcement learning, as its primary method, and trains separately against fixed targets and moving targets. In addition, the thesis implements two guidance laws, Pure Pursuit Guidance (PPG) and Approximate Posture Increment Control (APIC), and compares their performance.
For fixed targets, the thesis uses the Soft Actor-Critic (SAC) reinforcement learning algorithm and shows that it outperforms the traditional PPG guidance law. For moving targets, Section 6.1 combines SAC and prioritized experience replay with APIC to improve the guidance law's performance in hitting moving targets, showing that reinforcement learning combined with APIC performs better. Section 6.2 uses SAC to command the aircraft's angular velocity directly in order to shoot down moving targets.
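To make the PPG baseline mentioned above more concrete, the following is a minimal sketch of pure pursuit against a fixed target. It is a 2D constant-speed kinematic toy, not the thesis's formulation: the function names, the proportional gain, the turn-rate limit, the hit radius, and all numeric values are assumptions chosen for illustration only.

```python
import math


def wrap_angle(a):
    """Wrap an angle in radians to the interval [-pi, pi)."""
    return (a + math.pi) % (2.0 * math.pi) - math.pi


def pure_pursuit_turn_rate(px, py, heading, tx, ty, gain=2.0, max_rate=0.5):
    """Turn-rate command (rad/s) that steers the velocity vector onto the line of sight.

    gain and max_rate are illustrative values, not taken from the thesis.
    """
    los = math.atan2(ty - py, tx - px)   # line-of-sight angle to the target
    err = wrap_angle(los - heading)      # heading error relative to the line of sight
    return max(-max_rate, min(max_rate, gain * err))


def simulate(target=(1000.0, 500.0), speed=100.0, dt=0.05,
             hit_radius=20.0, max_steps=4000):
    """Euler-integrate a constant-speed pursuer starting at the origin, heading along +x.

    Returns (hit, elapsed_time_s, final_distance_m).
    """
    px, py, heading = 0.0, 0.0, 0.0
    dist = math.hypot(target[0] - px, target[1] - py)
    for step in range(max_steps):
        dist = math.hypot(target[0] - px, target[1] - py)
        if dist <= hit_radius:
            return True, step * dt, dist
        rate = pure_pursuit_turn_rate(px, py, heading, target[0], target[1])
        heading = wrap_angle(heading + rate * dt)
        px += speed * math.cos(heading) * dt
        py += speed * math.sin(heading) * dt
    return False, max_steps * dt, dist


if __name__ == "__main__":
    hit, t, d = simulate()
    print(f"hit={hit}, time={t:.2f} s, final distance={d:.1f} m")
```

In a reinforcement learning setup such as the one the abstract describes, a learned policy would take the place of `pure_pursuit_turn_rate` and output the command itself; how the thesis defines its observations, actions, and rewards is not stated in this record.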