| Graduate Student: | 王敬傑 Wang, Jing-Jie |
|---|---|
| Thesis Title: | 以深度強化學習建構無人搬運車系統路徑規劃 (The Path Planning Study for an Unmanned Vehicle System Based on Deep Reinforcement Learning) |
| Advisor: | 王泰裕 Wang, Tai-Yue |
| Degree: | Master |
| Department: | College of Management, Department of Industrial and Information Management |
| Year of Publication: | 2023 |
| Academic Year of Graduation: | 111 (2022-2023) |
| Language: | Chinese |
| Number of Pages: | 55 |
| Keywords (Chinese): | 自動化物流運籌系統、強化學習、路徑規劃 |
| Keywords (English): | automatic material handling system, reinforcement learning, path planning |
In the semiconductor industry, most wafer manufacturers have adopted an automated material handling system (AMHS) to improve production efficiency. In this system, unmanned vehicles transport wafers along overhead rails to the various machines for processing. Because thousands of vehicles handle wafer delivery tasks every day, traffic conditions on the rails change constantly; if vehicle routes cannot be planned with dynamic awareness of the rail environment, congestion easily arises when many vehicles follow the same path. In 2020, researchers applied Q-learning to vehicle path planning in the AMHS: by interacting with the system environment, the agent learns which direction to take in the current state so as to avoid congestion as far as possible. Experiments showed that this method outperforms conventional routing algorithms in both path performance and computational complexity, but it has two drawbacks: the system states it considers are limited, and its actions remain partly random after training, so it cannot make optimal routing decisions. DQN is a deep reinforcement learning model that combines Q-learning with neural networks to overcome Q-learning's difficulty in converging under complex state spaces, and Double DQN is an improved version of DQN that mitigates DQN's tendency to overestimate Q-values. This study pre-trains a Double DQN architecture, referred to as pre-trained DDQN; a deep reinforcement learning model with pre-trained weights converges faster and yields a better action policy. The purpose of this study is therefore to use pre-trained DDQN to take more AMHS information into account, eliminate action randomness, and exploit pre-trained weights to improve model performance, so as to obtain the best decision in the current state when planning wafer delivery routes for the vehicles, and to compare its performance with Q-learning in a purpose-built simulation system. The Dijkstra algorithm is first used to generate pre-training samples for the Double DQN model; the model then interacts with the simulation system to collect further samples and continue training. The results show that pre-trained DDQN, as well as DQN and DDQN with randomly initialized weights, all outperform Q-learning, achieving lower average delivery time and higher throughput in most simulated scenarios, which verifies the effectiveness of DQN-based methods. Among them, the proposed pre-trained DDQN performs best, shortening the average delivery time by 10.25% and increasing throughput by 11.82% on average compared with Q-learning.
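For reference, since the record above only names the algorithms, the difference between DQN (Mnih et al., 2015) and the Double DQN backbone can be written through their learning targets: the only change is how the next action is chosen, which is what reduces the Q-value overestimation mentioned in the abstract. This is the generic textbook formulation, not necessarily the exact notation used in the thesis.

```latex
% DQN target: the target network (parameters \theta^-) both selects and
% evaluates the next action, which tends to overestimate Q-values.
y^{\mathrm{DQN}} = r + \gamma \max_{a'} Q\bigl(s', a';\, \theta^{-}\bigr)

% Double DQN target: the online network (\theta) selects the action and the
% target network (\theta^-) evaluates it, reducing the overestimation bias.
y^{\mathrm{DDQN}} = r + \gamma\, Q\bigl(s',\ \arg\max_{a'} Q(s', a';\, \theta);\ \theta^{-}\bigr)
```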
In the semiconductor industry, most wafer foundries improve production efficiency by adopting an automated material handling system (AMHS), in which wafers in process are transported by overhead hoist transport (OHT) vehicles along guided rails to the processing machines. With thousands of OHT vehicles moving through the system every day, track traffic conditions change constantly; if the system fails to plan OHT delivery paths according to these conditions, congestion occurs when many OHTs follow the same route. Hwang and Jang (2020) applied Q-learning to alleviate this congestion: their method uses system information to guide each OHT so that it can avoid congestion and find an efficient path. The method still has shortcomings, however. First, the system information it considers is incomplete; second, its actions remain partly random after training, so it may not select the path with the shortest expected travel time. In this study, we propose a pre-trained DDQN model to overcome these shortcomings. DQN combines Q-learning with neural networks and can handle the large state spaces that tabular Q-learning cannot, Double DQN (DDQN) further mitigates DQN's tendency to overestimate Q-values, and pre-trained weights speed up convergence when training the DDQN. We use simulation to compare pre-trained DDQN with Q-learning. The results show that pre-trained DDQN reduces the average delivery time by 10.25% and increases throughput by 11.82% relative to Q-learning; that is, the system obtains a better routing policy with pre-trained DDQN.
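The implementation itself is not reproduced in this record, so the sketch below only illustrates the pre-training idea described above, under stated assumptions: the Dijkstra-generated samples are treated as supervised (state, best-action) labels for the Q-network before standard Double DQN fine-tuning begins, and the `QNet` architecture, state dimension, and action set are placeholders rather than the thesis's actual design.

```python
# Illustrative sketch only (not the author's code): pre-train a small
# Q-network on Dijkstra-derived (state, action) pairs, then refine it with
# standard Double DQN updates. The AMHS rail-network simulator, state
# encoding, and reward design from the thesis are NOT reproduced here.
import torch
import torch.nn as nn
import torch.nn.functional as F


class QNet(nn.Module):
    """Small fully connected Q-network: state features -> one Q-value per direction."""

    def __init__(self, state_dim: int, n_actions: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 64), nn.ReLU(),
            nn.Linear(64, 64), nn.ReLU(),
            nn.Linear(64, n_actions),
        )

    def forward(self, x):
        return self.net(x)


def pretrain(q_net, states, dijkstra_actions, epochs=20, lr=1e-3):
    """Supervised pre-training: push the argmax of Q toward the Dijkstra
    shortest-path action for each sampled state (one plausible reading of
    'pre-training samples generated by the Dijkstra algorithm')."""
    opt = torch.optim.Adam(q_net.parameters(), lr=lr)
    for _ in range(epochs):
        logits = q_net(states)                      # Q-values used as logits
        loss = F.cross_entropy(logits, dijkstra_actions)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return q_net


def double_dqn_update(q_net, target_net, batch, opt, gamma=0.99):
    """One Double DQN step: the online net selects a', the target net evaluates it."""
    s, a, r, s_next, done = batch
    with torch.no_grad():
        a_next = q_net(s_next).argmax(dim=1, keepdim=True)        # action selection
        q_next = target_net(s_next).gather(1, a_next).squeeze(1)  # action evaluation
        target = r + gamma * (1.0 - done) * q_next
    q_sa = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)
    loss = F.smooth_l1_loss(q_sa, target)
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()


# Toy usage (real states, actions, and rewards would come from the AMHS simulator):
if __name__ == "__main__":
    torch.manual_seed(0)
    q_net, target_net = QNet(8, 4), QNet(8, 4)
    states = torch.randn(256, 8)
    labels = torch.randint(0, 4, (256,))            # stand-in for Dijkstra actions
    pretrain(q_net, states, labels)
    target_net.load_state_dict(q_net.state_dict())  # start target from pre-trained weights
    opt = torch.optim.Adam(q_net.parameters(), lr=1e-3)
    batch = (torch.randn(32, 8), torch.randint(0, 4, (32,)),
             torch.randn(32), torch.randn(32, 8), torch.zeros(32))
    double_dqn_update(q_net, target_net, batch, opt)
```

Initializing the target network from the pre-trained weights is one simple way to reflect the reported benefit of pre-training: Double DQN fine-tuning then starts from a policy that already imitates Dijkstra shortest paths rather than from random weights.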
顏豪君, & 巫木誠. (2005). 快速評估12吋晶圓廠AMHS的模擬方法 [A simulation approach for rapid evaluation of the AMHS in a 12-inch wafer fab]. 陽明交通大學工業工程與管理系所.
Ahn, K., Lee, K., Yeon, J., & Park, J. (2022). Congestion-aware dynamic routing for an overhead hoist transporter system using a graph convolutional gated recurrent unit. IISE Transactions, 54(8), 803-816.
Bartholdi III, J. J., & Platzman, L. K. (1989). Decentralized control of automated guided vehicles on a simple loop. IIE Transactions, 21(1), 76-81.
Bartlett, K., Lee, J., Ahmed, S., Nemhauser, G., Sokol, J., & Na, B. (2014). Congestion-aware dynamic routing in automated material handling systems. Computers & Industrial Engineering, 70, 176-182.
Binder, H., & Honold, A. (1999). Automation and fab concepts for 300 mm wafer manufacturing. Microelectronic Engineering, 45(2-3), 91-100.
Boyan, J., & Littman, M. (1993). Packet routing in dynamically changing networks: A reinforcement learning approach. Advances in Neural Information Processing Systems, 6.
Choi, S., & Yeung, D.-Y. (1995). Predictive Q-routing: A memory-based reinforcement learning approach to adaptive traffic control. Advances in Neural Information Processing Systems, 8.
Egbelu, P. J., & Tanchoco, J. M. (1984). Characterization of automatic guided vehicle dispatching rules. The International Journal of Production Research, 22(3), 359-374.
Hu, H., Yang, X., Xiao, S., & Wang, F. (2021). Anti-conflict AGV path planning in automated container terminals based on multi-agent reinforcement learning. International Journal of Production Research, 1-16.
Huang, H.-W., Lu, C.-H., & Fu, L.-C. (2007). Lot dispatching and scheduling integrating OHT traffic information in the 300mm wafer fab. 2007 IEEE International Conference on Automation Science and Engineering.
Hwang, I., & Jang, Y. J. (2020). Q(λ) learning-based dynamic route guidance algorithm for overhead hoist transport systems in semiconductor fabs. International Journal of Production Research, 58(4), 1199-1221.
Koo, P.-H., Jang, J., & Suh, J. (2005). Vehicle dispatching for highly loaded semiconductor production considering bottleneck machines first. International Journal of Flexible Manufacturing Systems, 17(1), 23-38.
Kumar, S., & Miikkulainen, R. (1999). Confidence-based dual reinforcement Q-routing: An adaptive online network routing algorithm. IJCAI.
Kuo, C.-H., & Huang, C.-S. (2006). Dispatching of overhead hoist vehicles in a fab intrabay using a multimission-oriented controller. The International Journal of Advanced Manufacturing Technology, 27(7), 824-832.
Kurosaki, R., Nagao, N., Komada, H., Watanabe, Y., & Yano, H. (1997). AMHS for 300 mm wafer. 1997 IEEE International Symposium on Semiconductor Manufacturing Conference Proceedings (Cat. No. 97CH36023).
Lee, S., Lee, J., & Na, B. (2018). Practical routing algorithm using a congestion monitoring system in semiconductor manufacturing. IEEE Transactions on Semiconductor Manufacturing, 31(4), 475-485.
Liao, D.-Y., Jeng, M.-D., & Zhou, M. (2007). Application of Petri nets and Lagrangian relaxation to scheduling automatic material-handling vehicles in 300-mm semiconductor manufacturing. IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews), 37(4), 504-516.
Lin, J., Wang, F.-K., & Yen, P.-Y. (2001). Simulation analysis of dispatching rules for an automated interbay material handling system in wafer fab. International Journal of Production Research, 39(6), 1221-1238.
Lin, J. T., & Huang, C.-J. (2014). A simulation-based optimization approach for a semiconductor photobay with automated material handling system. Simulation Modelling Practice and Theory, 46, 76-100.
Lin, J. T., & Huang, C.-W. (2011). A novel vehicle assignment method for automated material handling system in semiconductor manufacturing. 2011 IEEE 18th International Conference on Industrial Engineering and Engineering Management.
Lin, J. T., Wang, F. K., & Wu, C. K. (2003). Simulation analysis of the connecting transport AMHS in a wafer fab. IEEE Transactions on Semiconductor Manufacturing, 16(3), 555-564.
Lin, J. T., Wu, C.-H., & Huang, C.-W. (2013). Dynamic vehicle allocation control for automated material handling system in semiconductor manufacturing. Computers & Operations Research, 40(10), 2329-2339.
Mnih, V., Kavukcuoglu, K., Silver, D., Rusu, A. A., Veness, J., Bellemare, M. G., Graves, A., Riedmiller, M., Fidjeland, A. K., & Ostrovski, G. (2015). Human-level control through deep reinforcement learning. Nature, 518(7540), 529-533.
Nadoli, G., & Pillai, D. (1994). Simulation in automated material handling systems design for semiconductor manufacturing. Proceedings of Winter Simulation Conference.
Nakamura, R., Sawada, K., Shin, S., Kumagai, K., & Yoneda, H. (2015). Model reformulation for conflict-free routing problems using Petri Net and Deterministic Finite Automaton. Artificial Life and Robotics, 20(3), 262-269.
Ndiaye, M. A., Dauzère-Pérès, S., Yugma, C., Rullière, L., & Lamiable, G. (2016). Automated transportation of auxiliary resources in a semiconductor manufacturing facility. 2016 Winter Simulation Conference (WSC).
Peshkin, L., & Savova, V. (2002). Reinforcement learning for adaptive routing. Proceedings of the 2002 International Joint Conference on Neural Networks. IJCNN'02 (Cat. No. 02CH37290).
Sutton, R. S., & Barto, A. G. (2018). Reinforcement learning: An introduction. MIT Press.
Welch, P. (1983). The statistical analysis of simulation results. In S. Lavenberg (Ed.), The computer modeling handbook (pp. 268-328). Academic Press.
Yang, J.-W., Cheng, H.-C., Chiang, T.-C., & Fu, L.-C. (2008). Multiobjective lot scheduling and dynamic OHT routing in a 300-mm wafer fab. 2008 IEEE International Conference on Systems, Man and Cybernetics.
Yang, Y., Juntao, L., & Lingling, P. (2020). Multi‐robot path planning based on a deep reinforcement learning DQN algorithm. CAAI Transactions on Intelligence Technology, 5(3), 177-183.
Yu, S., Zhou, J., Li, B., Mabu, S., & Hirasawa, K. (2012). Q value-based Dynamic Programming with SARSA Learning for real time route guidance in large scale road networks. The 2012 International Joint Conference on Neural Networks (IJCNN).
Zolfpour-Arokhlo, M., Selamat, A., Hashim, S. Z. M., & Afkhami, H. (2014). Modeling of route planning system based on Q value-based dynamic programming with multi-agent reinforcement learning algorithms. Engineering Applications of Artificial Intelligence, 29, 163-177.