
Graduate Student: Ho, Chia-Yu (何佳諭)
Thesis Title: Drone Control in Diverse Environments Based on Reinforcement Learning (基於強化學習之無人機控制在多樣環境)
Advisor: Lai, Chin-Feng (賴槿峰)
Degree: Master
Department: College of Engineering - Department of Engineering Science
Year of Publication: 2019
Academic Year of Graduation: 107
Language: Chinese
Pages: 39
Chinese Keywords: 四軸飛行器, 強化學習, 深度學習 (quadcopter, reinforcement learning, deep learning)
English Keywords: Drone, Reinforcement Learning, Deep Learning
Access Count: Views: 145, Downloads: 2
In recent years, quadcopters and the computers that drive them have matured to the point where a drone can take the place of a human in dangerous missions or missions that require aerial imagery. Training pilots to operate drones, however, demands considerable human resources, so many automatic control methods have recently been proposed to lower the cost of operating them. This thesis proposes controlling the autonomous landing of a quadcopter with reinforcement learning. Because reinforcement learning must learn from failure, training is carried out in Microsoft's open-source drone simulator; training in a virtual environment reduces both the wear on the quadcopter and the time cost. Reinforcement learning comprises value-based and policy-based approaches. This thesis implements value-based Q-learning and policy-based REINFORCE and evaluates the advantages and disadvantages of the two algorithms in diverse environments.
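To make the value-based half of that comparison concrete, below is a minimal tabular Q-learning sketch for a landing-style task. It is only an illustration of the update rule the thesis relies on: the one-dimensional "altitude" environment, the action set, and the reward values are invented stand-ins for the AirSim setup, not the thesis's actual configuration.

```python
import random
from collections import defaultdict

# Invented toy stand-in for the simulator: the state is the drone's
# discretized altitude (0..10) and the episode ends when it reaches 0.
# A soft landing (descending slowly from altitude 1) earns +10, a hard
# landing earns -10, and every other step costs -1.
ACTIONS = ["descend_slow", "descend_fast", "hover"]

def step(altitude, action):
    """One toy transition: (next_altitude, reward, done)."""
    if action == "hover":
        return altitude, -1.0, False
    drop = 1 if action == "descend_slow" else 3
    next_alt = max(altitude - drop, 0)
    if next_alt == 0:
        soft = (action == "descend_slow" and altitude == 1)
        return 0, (10.0 if soft else -10.0), True
    return next_alt, -1.0, False

def train_q_learning(episodes=2000, alpha=0.1, gamma=0.95, epsilon=0.1):
    q = defaultdict(float)                         # Q-table keyed by (state, action)
    for _ in range(episodes):
        altitude = random.randint(3, 10)           # random starting point
        done = False
        while not done:
            # Exploration vs. exploitation: epsilon-greedy action selection.
            if random.random() < epsilon:
                action = random.choice(ACTIONS)
            else:
                action = max(ACTIONS, key=lambda a: q[(altitude, a)])
            next_alt, reward, done = step(altitude, action)
            # Q-learning update: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a)).
            best_next = 0.0 if done else max(q[(next_alt, a)] for a in ACTIONS)
            q[(altitude, action)] += alpha * (reward + gamma * best_next - q[(altitude, action)])
            altitude = next_alt
    return q

if __name__ == "__main__":
    q_table = train_q_learning()
    for alt in range(1, 11):
        print(alt, max(ACTIONS, key=lambda a: q_table[(alt, a)]))
    # Expected greedy policy: descend_fast while high up, descend_slow near the ground.
```

The epsilon-greedy choice corresponds to the exploration-versus-exploitation step, and the in-place table update is the standard Q-learning rule Q(s,a) ← Q(s,a) + α(r + γ max_a' Q(s',a') − Q(s,a)).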

Abstract
Extended Abstract
Acknowledgements
Table of Contents
List of Figures
Chapter 1  Introduction
  1.1  Research Motivation
  1.2  Research Direction
  1.3  Chapter Overview
Chapter 2  Background and Literature Review
  2.1  Reinforcement Learning
    2.1.1  Introduction to the Reinforcement Learning Model
    2.1.2  Applications of Reinforcement Learning
  2.2  UAV (Unmanned Aerial Vehicle) Applications and Control
    2.2.1  Applications of UAVs
  2.3  Controlling UAVs with Reinforcement Learning
Chapter 3  Methodology
  3.1  Quadcopter Control
    3.1.1  Quadcopter Actions and States
  3.2  Reinforcement Learning
    3.2.1  Defining the Task and Its Required Parameters
  3.3  Training Quadcopter Landing with Q-learning
    3.3.1  Q-learning
  3.4  Detailed Q-learning Training Procedure
    3.4.1  Initializing a Random Starting Point
    3.4.2  Exploration vs. Exploitation
    3.4.3  Updating the Q-table
    3.4.4  Iterative Updates
  3.5  Training Quadcopter Landing with Policy Gradient
    3.5.1  Policy Gradient
  3.6  Detailed REINFORCE Training Procedure (a policy-gradient sketch follows this outline)
    3.6.1  Initializing a Random Starting Point
    3.6.2  Selecting an Action
    3.6.3  Recording Flight Information
    3.6.4  Updating the Model
Chapter 4  Results
  4.1  Experimental Design
    4.1.1  Experimental Environment
    4.1.2  Experimental Procedure
  4.2  Experimental Results
    4.2.1  Experiment 1
    4.2.2  Experiment 2
Chapter 5  Conclusion and Future Work
  5.1  Conclusion
  5.2  Future Work
References
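Sections 3.5-3.6 of the outline cover the policy-gradient counterpart: roll out an episode from a random starting point, record the trajectory, and update the policy with the REINFORCE rule θ ← θ + α G_t ∇θ log π(a_t|s_t). The sketch below mirrors those steps on the same invented toy environment, with a simple tabular softmax policy in NumPy standing in for the TensorFlow model the thesis uses; it is an assumed illustration, not the thesis's implementation.

```python
import numpy as np

# Same invented toy landing environment as in the Q-learning sketch:
# the state is a discretized altitude 0..10 and the episode ends at altitude 0.
N_STATES = 11
ACTIONS = ["descend_slow", "descend_fast", "hover"]

def step(altitude, action):
    """One toy transition: (next_altitude, reward, done)."""
    if action == "hover":
        return altitude, -1.0, False
    drop = 1 if action == "descend_slow" else 3
    next_alt = max(altitude - drop, 0)
    if next_alt == 0:
        soft = (action == "descend_slow" and altitude == 1)
        return 0, (10.0 if soft else -10.0), True
    return next_alt, -1.0, False

def softmax(logits):
    z = np.exp(logits - logits.max())
    return z / z.sum()

def train_reinforce(episodes=3000, lr=0.05, gamma=0.95, seed=0):
    rng = np.random.default_rng(seed)
    theta = np.zeros((N_STATES, len(ACTIONS)))    # per-state action logits
    for _ in range(episodes):
        # 1) Initialize a random starting point and roll out one episode,
        #    recording (state, action, reward) for every step.
        altitude = int(rng.integers(3, N_STATES))
        trajectory = []
        for _ in range(200):                      # cap episode length
            probs = softmax(theta[altitude])
            a = int(rng.choice(len(ACTIONS), p=probs))
            next_alt, reward, done = step(altitude, ACTIONS[a])
            trajectory.append((altitude, a, reward))
            altitude = next_alt
            if done:
                break
        # 2) Compute the discounted return G_t for every recorded step.
        returns, G = [], 0.0
        for _, _, r in reversed(trajectory):
            G = r + gamma * G
            returns.append(G)
        returns.reverse()
        # 3) REINFORCE update: theta += lr * G_t * grad log pi(a_t | s_t),
        #    where grad log pi is one_hot(a_t) - pi(. | s_t) for a softmax policy.
        for (s, a, _), G_t in zip(trajectory, returns):
            grad_log_pi = -softmax(theta[s])
            grad_log_pi[a] += 1.0
            theta[s] += lr * G_t * grad_log_pi
    return theta

if __name__ == "__main__":
    theta = train_reinforce()
    for alt in range(1, N_STATES):
        print(alt, ACTIONS[int(np.argmax(theta[alt]))])
```

Unlike the Q-learning sketch, this Monte-Carlo update waits until the episode ends before adjusting the policy, which is why the thesis's REINFORCE procedure includes an explicit step for recording flight information.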


On campus: available from 2024-07-14
Off campus: not available
The electronic thesis has not yet been authorized for public release; please consult the library catalog for the print copy.