
Author: Wang, Jing-Jie (王敬傑)
Title: The Path Planning Study for an Unmanned Vehicle System Based on Deep Reinforcement Learning (以深度強化學習建構無人搬運車系統路徑規劃)
Advisor: Wang, Tai-Yue (王泰裕)
Degree: Master
Department: Department of Industrial and Information Management, College of Management
Year of publication: 2023
Graduation academic year: 111 (ROC calendar)
Language: Chinese
Number of pages: 55
Chinese keywords: 自動化物流運籌系統 (automated material handling system), 強化學習 (reinforcement learning), 路徑規劃 (path planning)
English keywords: automatic material handling system, reinforcement learning, path planning
    In the semiconductor industry, wafer manufacturers have largely adopted automated material handling systems (AMHS) to improve production efficiency. In such a system, unmanned vehicles carry wafers along overhead rails to the machines where they are processed. Because more than a thousand vehicles handle wafer delivery tasks every day, traffic conditions on the rails change constantly; if vehicle routes cannot be planned with dynamic awareness of the rail environment, congestion easily arises when many vehicles follow the same path. In 2020, researchers applied Q-learning to vehicle path planning in an AMHS: by interacting with the system environment, the agent learns which direction to move in the current state so as to avoid congestion as much as possible. Experiments showed that this method outperforms conventional routing algorithms in both path performance and computational complexity, but it has two drawbacks: the system states it considers are limited, and action randomness remains after training, so it cannot make optimal routing decisions. DQN is a deep reinforcement learning model that combines Q-learning with a neural network to remedy Q-learning's difficulty in converging under complex state spaces, and Double DQN is an improved version of DQN that mitigates DQN's tendency to overestimate Q-values. This study pre-trains a model built on the Double DQN architecture, referred to as pre-trained DDQN; a deep reinforcement learning model with pre-trained weights converges faster and obtains a better action policy. The purpose of this study is therefore to use pre-trained DDQN to take more AMHS information into account, eliminate action randomness, and exploit pre-trained weights to improve model performance, so that the optimal decision in the current state can be obtained to plan wafer delivery routes for the vehicles; a simulation system is built to compare its performance with Q-learning. Dijkstra's algorithm is first used to generate pre-training samples with which the Double DQN model is pre-trained, and the model then interacts with the simulation system to collect samples and continue training. The results show that pre-trained DDQN, as well as DDQN and DQN with randomly initialized weights, all outperform Q-learning, achieving lower average delivery time and higher throughput in most simulated scenarios, which validates the effectiveness of DQN-based methods. Among them, the proposed pre-trained DDQN performs best: compared with Q-learning, it shortens average delivery time by 10.25% and raises throughput by 11.82% on average.
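
    For reference, the difference between the DQN target that tends to overestimate and the Double DQN target can be written in the standard form (this is the general formulation from the deep reinforcement learning literature, e.g. Mnih et al. (2015) for DQN, not an equation reproduced from the thesis):

        y_t^{DQN}  = r_t + \gamma \max_a Q(s_{t+1}, a; \theta^-)
        y_t^{DDQN} = r_t + \gamma \, Q\big(s_{t+1}, \arg\max_a Q(s_{t+1}, a; \theta); \theta^-\big)

    where \theta denotes the online network weights and \theta^- the target network weights; choosing the greedy action with the online network but evaluating it with the target network is what reduces the Q-value overestimation mentioned above.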

    In the semiconductor industry, most wafer foundries improve production efficiency by deploying an Automated Material Handling System (AMHS). In this system, wafers in process are transported by Overhead Hoist Transport (OHT) vehicles along guided rails to multiple processing machines. More than a thousand OHTs move through the system every day. If the system fails to plan OHT delivery paths properly based on the traffic conditions of the tracks, congestion occurs when multiple OHTs follow the same route. Hwang and Jang (2020) used Q-learning to capture and resolve this congestion phenomenon: their method uses system information to guide each OHT so that it can avoid congestion and find an efficient path. However, the method still has shortcomings. One is that the system information it considers is incomplete; the other is that, even after training, it may not select the path with the shortest expected travel time. In this study, we propose a pre-trained DDQN model to overcome these shortcomings. DQN combines Q-learning with a neural network and thereby handles the large amount of state information that Q-learning cannot deal with, while Double DQN (DDQN) further mitigates DQN's tendency to overestimate Q-values. In addition, pre-trained weights speed up convergence when training the DDQN. We use simulation to compare the pre-trained DDQN with Q-learning. The results show that, compared with Q-learning, the pre-trained DDQN reduces average delivery time by 10.25% and improves throughput by 11.82%. In other words, the system obtains a better routing policy with the pre-trained DDQN.
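
    As one concrete illustration of how Dijkstra-generated samples could be used to pre-train the DDQN, the sketch below builds a toy rail graph, derives the shortest-path first hop for each origin as a supervised target, and fits a small Q-network to it. The thesis does not publish code, so the graph, state encoding, target construction, and network size here are illustrative assumptions rather than the author's implementation; the later interaction phase with the simulated AMHS is omitted.

        # Hypothetical sketch: pre-training a DDQN Q-network on Dijkstra
        # shortest-path decisions for a toy rail graph. Graph layout, state
        # encoding, targets, and network size are illustrative assumptions,
        # not the thesis author's implementation.
        import heapq
        import random
        import torch
        import torch.nn as nn

        # Toy rail network: node -> {neighbor: travel time}
        RAIL = {0: {1: 2, 2: 5}, 1: {2: 1, 3: 4}, 2: {3: 1}, 3: {}}

        def dijkstra_first_hop(graph, src, dst):
            """Return the first hop on a shortest src->dst path (None if unreachable)."""
            dist, first = {src: 0}, {src: None}
            heap = [(0, src)]
            while heap:
                d, u = heapq.heappop(heap)
                if u == dst:
                    return first[u]
                if d > dist[u]:
                    continue
                for v, w in graph[u].items():
                    if d + w < dist.get(v, float("inf")):
                        dist[v] = d + w
                        first[v] = v if u == src else first[u]
                        heapq.heappush(heap, (d + w, v))
            return None

        # State = one-hot (current node, destination); one Q output per neighbor slot.
        n_nodes = len(RAIL)
        n_actions = max(len(nbrs) for nbrs in RAIL.values())
        q_net = nn.Sequential(nn.Linear(2 * n_nodes, 32), nn.ReLU(), nn.Linear(32, n_actions))
        optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)

        def encode(node, dst):
            x = torch.zeros(2 * n_nodes)
            x[node], x[n_nodes + dst] = 1.0, 1.0
            return x

        # Imitation-style pre-training: push the Q-value of the Dijkstra first hop
        # high and the other action slots low. Destination fixed to the last node
        # for brevity.
        for _ in range(500):
            src, dst = random.randrange(n_nodes), n_nodes - 1
            hop = dijkstra_first_hop(RAIL, src, dst)
            if hop is None:
                continue
            target = torch.full((n_actions,), -1.0)
            target[sorted(RAIL[src]).index(hop)] = 1.0  # action slot = sorted-neighbor index
            loss = nn.functional.mse_loss(q_net(encode(src, dst)), target)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()

    After such a supervised warm-up, the resulting weights would be copied into both the online and target networks and refined with the usual Double DQN interaction loop against the simulated AMHS, which is the part omitted here.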

    Table of Contents x
    List of Tables xii
    List of Figures xiii
    Chapter 1: Introduction 1
        1.1 Research Background and Motivation 1
        1.2 Research Objectives 2
        1.3 Research Scope and Assumptions 3
        1.4 Research Process 4
        1.5 Thesis Organization 4
    Chapter 2: Literature Review 6
        2.1 Automated Material Handling Systems 6
        2.2 AMHS Fleet Management 11
        2.3 Artificial Intelligence in Path Planning 16
        2.4 Reinforcement Learning 17
        2.5 Dijkstra's Algorithm 22
        2.6 Summary 23
    Chapter 3: Building the AMHS Path Planning System 24
        3.1 Problem Description 24
        3.2 Model Construction Procedure 25
        3.3 Simulation System 26
        3.4 Training Samples 29
        3.5 DDQN Pre-training Procedure 30
        3.6 Pre-trained DDQN Training Procedure 31
        3.7 System Evaluation Metrics 33
        3.8 Summary 34
    Chapter 4: Model Analysis and Validation 35
        4.1 Scenario Description 35
        4.2 Action Policy Selection 36
        4.3 Model Parameter Settings 37
        4.4 Model Performance and Comparison 39
        4.5 Summary 47
    Chapter 5: Conclusions and Suggestions 49
        5.1 Conclusions 49
        5.2 Research Contributions 50
        5.3 Suggestions and Directions for Future Research 51
    References 52

    顏豪君, & 巫木誠. (2005). 快速評估12吋晶圓廠AMHS的模擬方法 [A simulation approach for rapid evaluation of AMHS in 12-inch wafer fabs]. 陽明交通大學工業工程與管理系所 [Department of Industrial Engineering and Management, National Yang Ming Chiao Tung University].
    Ahn, K., Lee, K., Yeon, J., & Park, J. (2022). Congestion-aware dynamic routing for an overhead hoist transporter system using a graph convolutional gated recurrent unit. IISE Transactions, 54(8), 803-816.
    Bartholdi III, J. J., & Platzman, L. K. (1989). Decentralized control of automated guided vehicles on a simple loop. IIE Transactions, 21(1), 76-81.
    Bartlett, K., Lee, J., Ahmed, S., Nemhauser, G., Sokol, J., & Na, B. (2014). Congestion-aware dynamic routing in automated material handling systems. Computers & Industrial Engineering, 70, 176-182.
    Binder, H., & Honold, A. (1999). Automation and fab concepts for 300 mm wafer manufacturing. Microelectronic Engineering, 45(2-3), 91-100.
    Boyan, J., & Littman, M. (1993). Packet routing in dynamically changing networks: A reinforcement learning approach. Advances in Neural Information Processing Systems, 6.
    Choi, S., & Yeung, D.-Y. (1995). Predictive Q-routing: A memory-based reinforcement learning approach to adaptive traffic control. Advances in Neural Information Processing Systems, 8.
    Egbelu, P. J., & Tanchoco, J. M. (1984). Characterization of automatic guided vehicle dispatching rules. The International Journal of Production Research, 22(3), 359-374.
    Hu, H., Yang, X., Xiao, S., & Wang, F. (2021). Anti-conflict AGV path planning in automated container terminals based on multi-agent reinforcement learning. International Journal of Production Research, 1-16.
    Huang, H.-W., Lu, C.-H., & Fu, L.-C. (2007). Lot dispatching and scheduling integrating OHT traffic information in the 300mm wafer fab. 2007 IEEE International Conference on Automation Science and Engineering.
    Hwang, I., & Jang, Y. J. (2020). Q(λ) learning-based dynamic route guidance algorithm for overhead hoist transport systems in semiconductor fabs. International Journal of Production Research, 58(4), 1199-1221.
    Koo, P.-H., Jang, J., & Suh, J. (2005). Vehicle dispatching for highly loaded semiconductor production considering bottleneck machines first. International Journal of Flexible Manufacturing Systems, 17(1), 23-38.
    Kumar, S., & Miikkulainen, R. (1999). Confidence based dual reinforcement Q-routing: An adaptive online network routing algorithm. IJCAI.
    Kuo, C.-H., & Huang, C.-S. (2006). Dispatching of overhead hoist vehicles in a fab intrabay using a multimission-oriented controller. The International Journal of Advanced Manufacturing Technology, 27(7), 824-832.
    Kurosaki, R., Nagao, N., Komada, H., Watanabe, Y., & Yano, H. (1997). AMHS for 300 mm wafer. 1997 IEEE International Symposium on Semiconductor Manufacturing Conference Proceedings (Cat. No. 97CH36023).
    Lee, S., Lee, J., & Na, B. (2018). Practical routing algorithm using a congestion monitoring system in semiconductor manufacturing. IEEE Transactions on Semiconductor Manufacturing, 31(4), 475-485.
    Liao, D.-Y., Jeng, M.-D., & Zhou, M. (2007). Application of Petri nets and Lagrangian relaxation to scheduling automatic material-handling vehicles in 300-mm semiconductor manufacturing. IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews), 37(4), 504-516.
    Lin, J., Wang, F.-K., & Yen, P.-Y. (2001). Simulation analysis of dispatching rules for an automated interbay material handling system in wafer fab. International Journal of Production Research, 39(6), 1221-1238.
    Lin, J. T., & Huang, C.-J. (2014). A simulation-based optimization approach for a semiconductor photobay with automated material handling system. Simulation Modelling Practice and Theory, 46, 76-100.
    Lin, J. T., & Huang, C.-W. (2011). A novel vehicle assignment method for automated material handling system in semiconductor manufacturing. 2011 IEEE 18th International Conference on Industrial Engineering and Engineering Management.
    Lin, J. T., Wang, F. K., & Wu, C. K. (2003). Simulation analysis of the connecting transport AMHS in a wafer fab. IEEE Transactions on Semiconductor Manufacturing, 16(3), 555-564.
    Lin, J. T., Wu, C.-H., & Huang, C.-W. (2013). Dynamic vehicle allocation control for automated material handling system in semiconductor manufacturing. Computers & Operations Research, 40(10), 2329-2339.
    Mnih, V., Kavukcuoglu, K., Silver, D., Rusu, A. A., Veness, J., Bellemare, M. G., Graves, A., Riedmiller, M., Fidjeland, A. K., & Ostrovski, G. (2015). Human-level control through deep reinforcement learning. Nature, 518(7540), 529-533.
    Nadoli, G., & Pillai, D. (1994). Simulation in automated material handling systems design for semiconductor manufacturing. Proceedings of Winter Simulation Conference.
    Nakamura, R., Sawada, K., Shin, S., Kumagai, K., & Yoneda, H. (2015). Model reformulation for conflict-free routing problems using Petri Net and Deterministic Finite Automaton. Artificial Life and Robotics, 20(3), 262-269.
    Ndiaye, M. A., Dauzère-Pérès, S., Yugma, C., Rullière, L., & Lamiable, G. (2016). Automated transportation of auxiliary resources in a semiconductor manufacturing facility. 2016 Winter Simulation Conference (WSC).
    Peshkin, L., & Savova, V. (2002). Reinforcement learning for adaptive routing. Proceedings of the 2002 International Joint Conference on Neural Networks. IJCNN'02 (Cat. No. 02CH37290).
    Sutton, R. S., & Barto, A. G. (2018). Reinforcement learning: An introduction. MIT Press.
    Welch, P. (1983). The statistical analysis of simulation results. In S. Lavenberg (Ed.), Computer Performance Modeling Handbook (pp. 268-328). New York: Academic Press.
    Yang, J.-W., Cheng, H.-C., Chiang, T.-C., & Fu, L.-C. (2008). Multiobjective lot scheduling and dynamic OHT routing in a 300-mm wafer fab. 2008 IEEE International Conference on Systems, Man and Cybernetics.
    Yang, Y., Juntao, L., & Lingling, P. (2020). Multi‐robot path planning based on a deep reinforcement learning DQN algorithm. CAAI Transactions on Intelligence Technology, 5(3), 177-183.
    Yu, S., Zhou, J., Li, B., Mabu, S., & Hirasawa, K. (2012). Q value-based Dynamic Programming with SARSA Learning for real time route guidance in large scale road networks. The 2012 International Joint Conference on Neural Networks (IJCNN).
    Zolfpour-Arokhlo, M., Selamat, A., Hashim, S. Z. M., & Afkhami, H. (2014). Modeling of route planning system based on Q value-based dynamic programming with multi-agent reinforcement learning algorithms. Engineering Applications of Artificial Intelligence, 29, 163-177.

    Full-text access: on campus: available immediately; off campus: available immediately.