
Graduate Student: Ho, Chia-Yu (何佳諭)
Thesis Title: Drone Control in Diverse Environments Based on Reinforcement Learning (基於強化學習之無人機控制在多樣環境)
Advisor: Lai, Chin-Feng (賴槿峰)
Degree: Master
Department: College of Engineering - Department of Engineering Science
Year of Publication: 2019
Academic Year of Graduation: 107
Language: Chinese
Pages: 39
Chinese Keywords: 四軸飛行器, 強化學習, 深度學習 (quadcopter, reinforcement learning, deep learning)
English Keywords: Drone, Reinforcement Learning, Deep Learning
Access Count: Views: 145, Downloads: 2
In recent years, quadcopters and the computers that drive them have matured to the point where a drone can take the place of a human in dangerous missions or missions that require aerial imagery. Training pilots to operate drones, however, demands considerable human resources, so many automatic control methods have recently been proposed to lower the cost of operating them. This thesis proposes controlling the autonomous landing of a quadcopter with reinforcement learning. Because reinforcement learning must learn from failure, training is carried out in Microsoft's open-source drone simulator; training in a virtual environment reduces both the wear on the quadcopter and the time cost. Reinforcement learning comprises value-based and policy-based approaches. This thesis implements value-based Q-learning and policy-based REINFORCE and evaluates the advantages and disadvantages of the two algorithms in diverse environments.
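To make the value-based half of that comparison concrete, below is a minimal tabular Q-learning sketch for a landing-style task. It is only an illustration of the update rule the thesis relies on: the one-dimensional "altitude" environment, the action set, and the reward values are invented stand-ins for the AirSim setup, not the thesis's actual configuration.

```python
import random
from collections import defaultdict

# Invented toy stand-in for the simulator: the state is the drone's
# discretized altitude (0..10) and the episode ends when it reaches 0.
# A soft landing (descending slowly from altitude 1) earns +10, a hard
# landing earns -10, and every other step costs -1.
ACTIONS = ["descend_slow", "descend_fast", "hover"]

def step(altitude, action):
    """One toy transition: (next_altitude, reward, done)."""
    if action == "hover":
        return altitude, -1.0, False
    drop = 1 if action == "descend_slow" else 3
    next_alt = max(altitude - drop, 0)
    if next_alt == 0:
        soft = (action == "descend_slow" and altitude == 1)
        return 0, (10.0 if soft else -10.0), True
    return next_alt, -1.0, False

def train_q_learning(episodes=2000, alpha=0.1, gamma=0.95, epsilon=0.1):
    q = defaultdict(float)                         # Q-table keyed by (state, action)
    for _ in range(episodes):
        altitude = random.randint(3, 10)           # random starting point
        done = False
        while not done:
            # Exploration vs. exploitation: epsilon-greedy action selection.
            if random.random() < epsilon:
                action = random.choice(ACTIONS)
            else:
                action = max(ACTIONS, key=lambda a: q[(altitude, a)])
            next_alt, reward, done = step(altitude, action)
            # Q-learning update: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a)).
            best_next = 0.0 if done else max(q[(next_alt, a)] for a in ACTIONS)
            q[(altitude, action)] += alpha * (reward + gamma * best_next - q[(altitude, action)])
            altitude = next_alt
    return q

if __name__ == "__main__":
    q_table = train_q_learning()
    for alt in range(1, 11):
        print(alt, max(ACTIONS, key=lambda a: q_table[(alt, a)]))
    # Expected greedy policy: descend_fast while high up, descend_slow near the ground.
```

The epsilon-greedy choice corresponds to the exploration-versus-exploitation step, and the in-place table update is the standard Q-learning rule Q(s,a) ← Q(s,a) + α(r + γ max_a' Q(s',a') − Q(s,a)).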

Abstract
Extended Abstract
Acknowledgements
Table of Contents
List of Figures
Chapter 1  Introduction
  1.1  Research Motivation
  1.2  Research Direction
  1.3  Chapter Overview
Chapter 2  Background and Literature Review
  2.1  Reinforcement Learning
    2.1.1  Introduction to the Reinforcement Learning Model
    2.1.2  Applications of Reinforcement Learning
  2.2  UAV (Unmanned Aerial Vehicle) Applications and Control
    2.2.1  Applications of UAVs
  2.3  Controlling UAVs with Reinforcement Learning
Chapter 3  Methodology
  3.1  Quadcopter Control
    3.1.1  Quadcopter Actions and States
  3.2  Reinforcement Learning
    3.2.1  Defining the Task and Its Required Parameters
  3.3  Training Quadcopter Landing with Q-learning
    3.3.1  Q-learning
  3.4  Detailed Q-learning Training Procedure
    3.4.1  Initializing a Random Starting Point
    3.4.2  Exploration vs. Exploitation
    3.4.3  Updating the Q-table
    3.4.4  Iterative Updates
  3.5  Training Quadcopter Landing with Policy Gradient
    3.5.1  Policy Gradient
  3.6  Detailed REINFORCE Training Procedure (a policy-gradient sketch follows this outline)
    3.6.1  Initializing a Random Starting Point
    3.6.2  Selecting an Action
    3.6.3  Recording Flight Information
    3.6.4  Updating the Model
Chapter 4  Results
  4.1  Experimental Design
    4.1.1  Experimental Environment
    4.1.2  Experimental Procedure
  4.2  Experimental Results
    4.2.1  Experiment 1
    4.2.2  Experiment 2
Chapter 5  Conclusion and Future Work
  5.1  Conclusion
  5.2  Future Work
References
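Sections 3.5-3.6 of the outline cover the policy-gradient counterpart: roll out an episode from a random starting point, record the trajectory, and update the policy with the REINFORCE rule θ ← θ + α G_t ∇θ log π(a_t|s_t). The sketch below mirrors those steps on the same invented toy environment, with a simple tabular softmax policy in NumPy standing in for the TensorFlow model the thesis uses; it is an assumed illustration, not the thesis's implementation.

```python
import numpy as np

# Same invented toy landing environment as in the Q-learning sketch:
# the state is a discretized altitude 0..10 and the episode ends at altitude 0.
N_STATES = 11
ACTIONS = ["descend_slow", "descend_fast", "hover"]

def step(altitude, action):
    """One toy transition: (next_altitude, reward, done)."""
    if action == "hover":
        return altitude, -1.0, False
    drop = 1 if action == "descend_slow" else 3
    next_alt = max(altitude - drop, 0)
    if next_alt == 0:
        soft = (action == "descend_slow" and altitude == 1)
        return 0, (10.0 if soft else -10.0), True
    return next_alt, -1.0, False

def softmax(logits):
    z = np.exp(logits - logits.max())
    return z / z.sum()

def train_reinforce(episodes=3000, lr=0.05, gamma=0.95, seed=0):
    rng = np.random.default_rng(seed)
    theta = np.zeros((N_STATES, len(ACTIONS)))    # per-state action logits
    for _ in range(episodes):
        # 1) Initialize a random starting point and roll out one episode,
        #    recording (state, action, reward) for every step.
        altitude = int(rng.integers(3, N_STATES))
        trajectory = []
        for _ in range(200):                      # cap episode length
            probs = softmax(theta[altitude])
            a = int(rng.choice(len(ACTIONS), p=probs))
            next_alt, reward, done = step(altitude, ACTIONS[a])
            trajectory.append((altitude, a, reward))
            altitude = next_alt
            if done:
                break
        # 2) Compute the discounted return G_t for every recorded step.
        returns, G = [], 0.0
        for _, _, r in reversed(trajectory):
            G = r + gamma * G
            returns.append(G)
        returns.reverse()
        # 3) REINFORCE update: theta += lr * G_t * grad log pi(a_t | s_t),
        #    where grad log pi is one_hot(a_t) - pi(. | s_t) for a softmax policy.
        for (s, a, _), G_t in zip(trajectory, returns):
            grad_log_pi = -softmax(theta[s])
            grad_log_pi[a] += 1.0
            theta[s] += lr * G_t * grad_log_pi
    return theta

if __name__ == "__main__":
    theta = train_reinforce()
    for alt in range(1, N_STATES):
        print(alt, ACTIONS[int(np.argmax(theta[alt]))])
```

Unlike the Q-learning sketch, this Monte-Carlo update waits until the episode ends before adjusting the policy, which is why the thesis's REINFORCE procedure includes an explicit step for recording flight information.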


On campus: available from 2024-07-14
Off campus: not available
The electronic thesis has not yet been authorized for public release; please consult the library catalog for the print copy.