
Author: Tang, Chi (唐其)
Thesis title: Automatic Landing of High Maneuverability Aircraft using Deep Reinforcement Learning (深度強化學習於高動態飛行器之自動降落)
Advisor: Lai, Ying-Chi (賴盈誌)
Degree: Master
Department: College of Engineering - Institute of Civil Aviation
Year of publication: 2020
Academic year of graduation: 108 (ROC calendar)
Language: English
Number of pages: 76
Keywords: Deep Reinforcement Learning (深度強化學習), Deep Deterministic Policy Gradient (深度確定性策略梯度法), Automatic Landing (自動降落), High Maneuverability Aircraft (高動態飛行器), Glide Slope Following (下滑道追尋)
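  • (Translated from the Chinese abstract) Landing is the most critical phase in the flight of a fixed-wing aircraft. However, existing automatic landing systems still have many unresolved problems in aircraft control. This research applies the Deep Deterministic Policy Gradient (DDPG) method, a deep reinforcement learning algorithm, to the automatic landing of a high maneuverability aircraft and validates its landing capability in a simulation environment. The reward used in reinforcement learning is defined from the design requirements of the landing process, and a DDPG agent is trained to learn the control method for landing the aircraft, also called the policy. The landing control capability of DDPG is first validated on a commercial airliner, which has relatively low dynamics, and compared with the related literature; it is then applied to the automatic landing of the F-16 high maneuverability aircraft. In addition, agents are trained with different reward design methods, and the effects of different hyperparameters on the training of automatic landing control are considered. DDPG achieved notable results in the automatic landing of the commercial aircraft and showed better landing control than the literature, indicating that the DDPG method can learn fixed-wing automatic landing to a considerable degree. Its application to the F-16 further showed glide slope tracking capability comparable to, or better than, that of the baseline controller. All DDPG agents in this research were trained and validated in randomly generated wind disturbance environments and are therefore correspondingly robust. Furthermore, besides training neural networks to achieve a required goal, the self-learning nature of DDPG makes it possible to observe how the agent reaches the goal with different control methods, which can help in exploring more complex flight control methods.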

    The landing phase remains one of the most crucial and difficult tasks in the flight of an aircraft, especially for a high maneuverability aircraft. The proof-of-concept controller in this research uses DDPG (Deep Deterministic Policy Gradient), a DRL (Deep Reinforcement Learning) approach, to find the control method, or policy, that meets the design requirements, which are expressed as rewards. This research validates the capability of the DDPG agent in controlling commercial aircraft landing through comparisons with baseline controllers. The agent is then applied to the glide slope tracking function in the landing phase of an F-16 high maneuverability aircraft. This study also proposes new methods of reward shaping, or reward engineering, and investigates the effects of the hyperparameters used in training for aircraft landing control. The results of using DDPG to control commercial aircraft landing, compared against numerous baseline neural network approaches, prove the ability and potential of this DRL method. Applied to the glide slope tracking function of the F-16 aircraft, the agent achieved results comparable to, or even better than, those of the F-16 baseline controller. Both implementations were validated under numerous wind disturbance conditions, which showcased the robustness of the DDPG agents. Furthermore, besides developing control policies for aircraft landing, the DDPG agents provide insight into the controls and states of the aircraft during landing, which can inform guidelines on landing flight characteristics for pilots or for controller design.
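
    For readers unfamiliar with DDPG, the two update rules at its core can be sketched as follows. This is a minimal illustration of the standard algorithm (Lillicrap et al., 2015), not code from the thesis: the function names `td_target` and `soft_update`, the toy batch, and the values of `gamma` and `tau` are all assumptions chosen for illustration, and the record does not state the rewards or hyperparameters the thesis actually used.

```python
import numpy as np

def td_target(reward, next_q, done, gamma=0.99):
    """Critic regression target: y = r + gamma * (1 - done) * Q'(s', mu'(s')).

    next_q is the target critic's value at the next state, evaluated at the
    action chosen by the target actor. gamma is an illustrative default.
    """
    return reward + gamma * (1.0 - done) * next_q

def soft_update(target_params, online_params, tau=0.001):
    """Polyak averaging of target weights: theta' <- tau*theta + (1-tau)*theta'."""
    return [tau * w + (1.0 - tau) * wt
            for w, wt in zip(online_params, target_params)]

# Hypothetical toy batch of two transitions; the second one ends the episode.
rewards = np.array([1.0, -0.5])
next_q = np.array([10.0, 4.0])
dones = np.array([0.0, 1.0])
targets = td_target(rewards, next_q, dones, gamma=0.9)

# Hypothetical weights: the target network moves a fraction tau toward the online one.
online = [np.ones(3)]
target = [np.zeros(3)]
target = soft_update(target, online, tau=0.1)
```

    In this sketch the terminal transition contributes only its immediate reward to the critic target, and the target weights move a fraction `tau` toward the online weights at each update, which is the mechanism DDPG uses to keep its learning targets slowly varying.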

    Abstract
    Acknowledgements
    Contents
    List of Tables
    List of Figures
    Nomenclature
    Chapter 1 Introduction
      1.1 Research Background
      1.2 Motivation and Objectives
      1.3 Literature Review
      1.4 Thesis Overview
    Chapter 2 Aircraft and Wind Dynamic Models
      2.1 Commercial Aircraft Model
      2.2 Nonlinear F-16 Model
      2.3 Linear F-16 Model
      2.4 Atmospheric and Wind Turbulence Model
    Chapter 3 Methodology
      3.1 Automatic Landing Design
        3.1.1 Commercial Aircraft Landing
        3.1.2 F-16 Fighter Jet Landing
      3.2 Baseline Controllers and Stability Augmentation Systems
      3.3 Deep Reinforcement Learning
    Chapter 4 Simulations and Results
      4.1 Trim State for Simulation
      4.2 Verification of the Simulation Environments
      4.3 DDPG Agent Trained for Commercial Aircraft
        4.3.1 DDPG_20: Landing of Commercial Aircraft under 20ft/s Wind
        4.3.2 DDPG_75: Landing of Commercial Aircraft under 75ft/s Wind
        4.3.3 DDPG_Outerloop: Application of DDPG to the Outer Loop Control
        4.3.4 Validation of DDPG_20, DDPG_75 and DDPG_Outerloop
      4.4 DDPG Agent Trained for the F-16 Aircraft
        4.4.1 DDPG_F16Linear: DDPG on Glide Slope Tracking of F-16
        4.4.2 Validation of DDPG_F16Linear with Trial Flights
        4.4.3 Comparison of DDPG_F16Linear Agent with Baseline Controllers
        4.4.4 Validation of DDPG_F16Linear using 6DoF Nonlinear F-16 Model
      4.5 DDPG Agent Trainings and Hyperparameters Tuning
    Chapter 5 Summary
      5.1 Conclusion
      5.2 Future Works
    References

    Full text to be made public on 2026-05-03