Graduate student: 黃雅鈺
Huang, Ya-Yu
Thesis title: 應用增強式學習DQN於植保機之三維路徑規劃
Automatic Three-dimensional Path Planning for Agricultural Spraying Drone based on Deep Q-Learning
Advisor: 黃悅民
Huang, Yueh-Min
Degree: Master
Department: College of Engineering - Department of Engineering Science
Year of publication: 2021
Academic year of graduation: 109 (ROC calendar)
Language: Chinese
Pages: 71
Keywords: Deep Q-Learning (DQN), Reinforcement Learning, Path Algorithm, Intelligent Agriculture, Unmanned Aerial Vehicles (UAV)
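  • As people pursue more convenient and intelligent lifestyles and technology keeps advancing, most young people have moved into the technology sector, and the shrinking agricultural population has left farms short of labor. This has driven the rise of agricultural mechanization: with drones, pesticide can be sprayed from the air without workers carrying spray tanks on their backs. However, Taiwan's terrain is narrow, many orchards lie in the mountains where the ground is uneven, and crops are often interplanted to save land, all of which makes piloting a drone more difficult. This study therefore proposes a three-dimensional path planning system for UAVs that uses reinforcement learning so that the system can determine the best spraying path from complex environmental information on its own, and compares its strengths and weaknesses with those of earlier path algorithms.
    Algorithms previously applied to path planning, such as Ant Colony Optimization (ACO), Particle Swarm Optimization (PSO), Simulated Annealing (SA), and Genetic Algorithms (GA), mostly build on the Traveling Salesman Problem (TSP): they compute the shortest path that visits every waypoint, and their computational cost grows considerably as the environment becomes more complex. To let the drone avoid obstacles and reach its destination, this study combines reinforcement learning with a deep neural network and trains it in complex environments. Beyond the obstacle avoidance in three-dimensional space achieved in earlier work, the system also takes environmental factors into account (locations where pests are likely to occur, tree density, and slope height) to decide on a spraying path that reduces pesticide cost and drone battery consumption.
    The study has three parts. The first compares the results of path algorithms commonly applied to the Traveling Salesman Problem and analyzes their strengths and weaknesses. The second covers the analysis and processing of environmental data and the correction of three-dimensional slope terrain data. The third presents the proposed Deep Q-Learning three-dimensional path planning system, analyzes and compares its training results, and examines how the learning environment relates to the model's learning ability, with the aim of making drone path planning more intelligent.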

    This study proposes a three-dimensional path planning system for UAVs. It uses reinforcement learning (Deep Q-Learning) so that the system can determine the best agricultural spraying path from complex environmental information, and it compares the advantages and disadvantages of this approach with those of earlier path algorithms.
    Algorithms previously applied to path planning, such as the ant colony algorithm, particle swarm algorithm, simulated annealing algorithm, and genetic algorithm, are mainly based on the Traveling Salesman Problem (TSP): they compute the shortest path that passes through every point, and the computational complexity rises as the environment becomes more complicated. This study therefore proposes combining reinforcement learning with deep neural networks to learn how to plan a three-dimensional path automatically. The best agricultural spraying path is planned from environmental reference factors (the location of pests, the density of trees, and the height of slopes), so that drones can carry out autonomous spraying operations on slopes.
    The study is divided into three parts. The first compares the training results of the path algorithms and analyzes their advantages and disadvantages. The second is the analysis of environmental data. The third is the Deep Q-Learning 3D path planning system proposed in this study; it analyzes and compares the training results and discusses the relationship between the learning environment and the model's learning ability.
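
    The idea of learning a spraying path from a reward built on environmental factors can be illustrated with a minimal sketch. It is not the thesis's implementation: it swaps the deep Q-network for plain tabular Q-learning on a small 2D grid with a height value per cell, and the grid size, obstacle and pest cells, heights, reward weights, and hyperparameters are all hypothetical values chosen only for illustration.

```python
import random

# Hypothetical 4x4 field. Every value below is an illustrative assumption,
# not data or a parameter taken from the thesis.
SIZE = 4
START, GOAL = (0, 0), (3, 3)
OBSTACLES = {(1, 1), (2, 1)}          # cells the drone must not enter
PEST_CELLS = {(0, 2), (2, 3)}         # cells where pests are assumed likely
HEIGHT = [                            # assumed slope height of each cell
    [0, 0, 1, 1],
    [0, 1, 2, 2],
    [0, 1, 2, 3],
    [0, 0, 1, 1],
]
ACTIONS = [(0, 1), (0, -1), (1, 0), (-1, 0)]  # four flight directions


def step(state, action):
    """Apply one move and return (next_state, reward, done)."""
    r, c = state
    nr, nc = r + action[0], c + action[1]
    if not (0 <= nr < SIZE and 0 <= nc < SIZE) or (nr, nc) in OBSTACLES:
        return state, -5.0, False     # blocked: stay in place with a penalty
    if (nr, nc) == GOAL:
        return (nr, nc), 10.0, True
    climb = abs(HEIGHT[nr][nc] - HEIGHT[r][c])   # proxy for battery cost
    bonus = 0.5 if (nr, nc) in PEST_CELLS else 0.0
    return (nr, nc), -1.0 - 0.5 * climb + bonus, False


def train(episodes=2000, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning with an epsilon-greedy behavior policy."""
    rng = random.Random(seed)
    q = {}  # maps (state, action_index) to the estimated return
    n = len(ACTIONS)
    for _ in range(episodes):
        state = START
        for _ in range(50):           # cap the episode length
            if rng.random() < epsilon:
                a = rng.randrange(n)
            else:
                a = max(range(n), key=lambda i: q.get((state, i), 0.0))
            nxt, reward, done = step(state, ACTIONS[a])
            best_next = 0.0 if done else max(q.get((nxt, i), 0.0) for i in range(n))
            old = q.get((state, a), 0.0)
            # Q-learning update: move the estimate toward the bootstrapped target.
            q[(state, a)] = old + alpha * (reward + gamma * best_next - old)
            state = nxt
            if done:
                break
    return q


def greedy_path(q, max_steps=20):
    """Follow the highest-valued action from START until GOAL or the step cap."""
    state, path = START, [START]
    for _ in range(max_steps):
        a = max(range(len(ACTIONS)), key=lambda i: q.get((state, i), 0.0))
        state, _, done = step(state, ACTIONS[a])
        path.append(state)
        if done:
            break
    return path


if __name__ == "__main__":
    print(greedy_path(train()))
```

    In this sketch every non-terminal move has a negative reward, so lingering on pest cells never pays off and the learned policy heads for the goal. A DQN would replace the lookup table `q` with a neural network that maps a state to one value per action.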

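    Abstract
    Extended Abstract
    Acknowledgements
    Table of Contents
    List of Tables
    List of Figures
    Chapter 1. Introduction
      1-1 Research Motivation and Background
      1-2 Research Objectives
      1-3 Chapter Organization
    Chapter 2. Literature Review
      2-1 Traveling Salesman Problem (TSP)
      2-2 Path Algorithms
        2-2-1 Ant Colony Optimization
        2-2-2 Particle Swarm Optimization
        2-2-3 Simulated Annealing
        2-2-4 Genetic Algorithm
      2-3 Reinforcement Learning
        2-3-1 Markov Decision Process (MDP)
        2-3-2 Q-Learning
        2-3-3 Deep Q Network (DQN)
      2-4 Applications of UAV Obstacle Avoidance
      2-5 Taiwan Coordinate System Conversion
    Chapter 3. Development and Test Platform and Environmental Data
      3-1 TensorFlow Development Environment
      3-2 Experimental Equipment
      3-3 UAV Image Collection Equipment
      3-4 Terrain Modeling Software and Interface Development Platform
        3-4-1 Terrain Modeling Software
        3-4-2 Interface Development Platform
    Chapter 4. System Design and Implementation
      4-1 System Architecture
        4-1-1 Architecture of the UAV Automatic Path Planning System
        4-1-2 Training Steps for Reinforcement Learning Combined with a Deep Network Model
      4-2 Environmental Data Samples
        4-2-1 Coordinate Conversion
        4-2-2 Environment Classification
        4-2-3 Generation of Environmental Data
      4-3 Data Processing and Model Implementation
        4-3-1 Reward Table Generation
        4-3-2 Gradient Descent
        4-3-3 Experience Replay
        4-3-4 Saving the Model and Weights
        4-3-5 Height Matching and Coordinate Conversion
    Chapter 5. Experiments and Result Analysis
      5-1 Comparison of Algorithm Training Results
        5-1-1 Ant Colony Optimization Results
        5-1-2 Particle Swarm Optimization Results
        5-1-3 Simulated Annealing Results
        5-1-4 Genetic Algorithm Results
      5-2 DQN Training Results under Different Reward Settings
      5-3 DQN Training Results with Different Numbers of Network Layers
      5-4 DQN Training Results with 4 and 8 Flight Directions
      5-5 Comparison of Training Results in Different Environments
    Chapter 6. Conclusion and Future Work
      6-1 Conclusion
      6-2 Future Work
    References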

    Full text available on campus from 2023-08-31
    Full text available off campus from 2023-08-31