
Graduate Student: Liu, Zhi-Wei (劉志瑋)
Thesis Title: PaT: Transformer-based Deep Reinforcement Learning for Large-sized Online 2D Bin Packing
Advisor: Sue, Chuan-Ching (蘇銓清)
Degree: Master
Department: Department of Computer Science and Information Engineering, College of Electrical Engineering and Computer Science
Year of Publication: 2025
Graduation Academic Year: 113 (ROC calendar)
Language: Chinese
Number of Pages: 64
Keywords: 2D bin packing problem, Large-sized Bin, Deep Reinforcement Learning, Transformer
In recent years, deep reinforcement learning (DRL) has shown potential for solving online bin packing problems, but most studies have focused on scenarios with small action spaces of only tens to hundreds of actions, which are ill-suited to the decision challenges posed by large-sized bins. This study therefore targets a large-sized 2D bin packing environment, experimenting with bin sizes up to 40×40 (an action space of up to 1600) to validate the scalability and generality of the proposed method under large action spaces.
    To strengthen the model's perception of positional structure, this study proposes a new framework, PaT (Pa for packing, T for Transformer), which introduces a Transformer architecture as a structural feature-extraction module. The Transformer captures the structural relationships between environment states and ground-truth action masks, allowing the model to learn more efficiently and make more accurate item-placement decisions based on these structural features.
    Experimental results in bin environments of three different sizes show that, compared with traditional heuristics and other DRL methods, PaT achieves good overall space utilization, reaching 87.2% and 83.1% in the medium and large (20×20 and 40×40) environments, respectively. In a comparison against a CNN architecture, PaT's Transformer brings performance gains of 2.8% and 1.6% in the 20×20 and 40×40 environments, but drops slightly by 0.3% in the small 10×10 environment. A likely reason is that the 10×10 environment is simple enough that it does not require complex modeling; the Transformer, with its more complex structure and larger parameter count, is more prone to overfitting on small problems and therefore loses to the structurally simpler CNN.
    In addition, this study proposes a model generalization method based on state padding and action mapping, which allows a model trained in a large-sized environment to be transferred to smaller bin environments without retraining.

Over the past few years, Deep Reinforcement Learning (DRL) has shown promise in addressing online bin packing problems. Nevertheless, most studies have focused primarily on scenarios with relatively small action spaces, typically ranging from tens to hundreds of actions, which poses challenges when scaling to larger bin sizes. This study presents PaT (Pa is for packing, and T is for Transformer), a Transformer-based DRL framework designed specifically for the large-sized online two-dimensional bin packing problem (2D-BPP), with action spaces reaching up to 1600 (40×40).
    PaT incorporates a Transformer architecture to effectively capture structural relationships between environmental states and ground-truth action masks, enhancing model learning efficiency and decision-making accuracy.
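The paragraph above describes PaT's architecture only at a high level. As a rough, hypothetical PyTorch sketch of the core idea (not the thesis implementation; all class, method, and parameter names here are invented for illustration), a Transformer encoder can treat each bin cell as a token, emit one placement logit per cell, and mask invalid placements before sampling:

```python
import torch
import torch.nn as nn

class MaskedTransformerPolicy(nn.Module):
    """Illustrative sketch only, not the thesis implementation: a Transformer
    encoder over bin-grid cells, with invalid placements masked out of the
    policy logits."""

    def __init__(self, grid_size=40, d_model=64, nhead=4, num_layers=2):
        super().__init__()
        self.num_cells = grid_size * grid_size        # 40x40 -> 1600 actions
        self.cell_embed = nn.Linear(1, d_model)       # embed each cell's occupancy
        self.pos_embed = nn.Parameter(torch.zeros(1, self.num_cells, d_model))
        layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers)
        self.logit_head = nn.Linear(d_model, 1)       # one placement logit per cell

    def forward(self, grid, action_mask):
        # grid: (B, H, W) occupancy in {0, 1}; action_mask: (B, H*W), 1 = legal placement
        x = grid.flatten(1).unsqueeze(-1).float()     # (B, H*W, 1)
        h = self.encoder(self.cell_embed(x) + self.pos_embed)
        logits = self.logit_head(h).squeeze(-1)       # (B, H*W)
        # Set illegal placements to -inf so softmax assigns them zero probability.
        logits = logits.masked_fill(action_mask == 0, float("-inf"))
        return torch.distributions.Categorical(logits=logits)
```

Masking at the logit level gives illegal actions exactly zero probability, so they are never sampled and contribute no gradient, which is what makes learning tractable in a 1600-way action space.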
Experimental results across three environments (10×10, 20×20, and 40×40) demonstrate that PaT achieves high space utilization compared to both traditional heuristic methods and other DRL methods, especially in medium and large environments, reaching 87.2% and 83.1% in the 20×20 and 40×40 settings, respectively. Moreover, compared to a CNN-based architecture, PaT's Transformer model yields performance improvements of 2.8% and 1.6% in the 20×20 and 40×40 environments, while showing a slight decrease of 0.3% in the 10×10 environment, possibly because the simpler problem structure does not require complex modeling: the Transformer architecture is more complex and contains more parameters, so it is more prone to overfitting on small problems, leading to worse performance than the simpler CNN architecture.
    Additionally, a generalization mechanism based on state padding and action mapping is proposed, allowing models trained on larger bins to perform effectively on smaller-sized bins without retraining.
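The mechanism is not spelled out in the summary; under the assumptions above (occupancy-grid states, one action per cell), a minimal sketch of state padding and action mapping might look as follows. The function names are illustrative, not taken from the thesis:

```python
import numpy as np

TRAIN_SIZE = 40  # bin size the model was trained on

def pad_state(small_grid):
    """Embed a smaller bin (e.g., 10x10) into the trained 40x40 grid.
    Padded cells are marked occupied (1) so the action mask rules them out."""
    padded = np.ones((TRAIN_SIZE, TRAIN_SIZE), dtype=small_grid.dtype)
    h, w = small_grid.shape
    padded[:h, :w] = small_grid  # the real bin occupies the top-left corner
    return padded

def map_action(flat_action, small_shape):
    """Map a flat action index from the 40x40 action space back to
    (row, col) coordinates inside the smaller bin."""
    row, col = divmod(flat_action, TRAIN_SIZE)
    h, w = small_shape
    assert row < h and col < w, "a masked policy should never pick a padded cell"
    return row, col
```

Because padded cells are marked occupied, the same action mask that forbids overlapping placements also confines the policy to the real bin, so no retraining is needed.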

Abstract (Chinese) I
SUMMARY III
Acknowledgements VIII
List of Tables XI
List of Figures XII
1 Introduction 1
2 Background and Related Work 4
  2.1 Bin Packing Problem 4
    2.1.1 Offline and Online 4
    2.1.2 Multidimensional Bin Packing Problems 5
      2.1.2.1 One-Dimensional Bin Packing Problem (1D-BPP) 5
      2.1.2.2 Two-Dimensional Bin Packing Problem (2D-BPP) 5
      2.1.2.3 Three-Dimensional Bin Packing Problem (3D-BPP) 6
  2.2 Action Mask 7
3 Proposed Method 9
  3.1 State Space 11
  3.2 Action Space 12
  3.3 Reward Function 13
  3.4 Network Design 14
    3.4.1 Transformer Network 15
    3.4.2 Actor, Critic and Mask Network 17
    3.4.3 Hybrid Network (CNN with Transformer) 19
  3.5 Action Mask 20
  3.6 Loss Function 21
  3.7 Algorithm 23
  3.8 Generalization Mechanism 25
4 Experiment Evaluations 27
  4.1 Data Generation 27
  4.2 Environment Setup 27
  4.3 Parameter Setting 28
  4.4 Baseline 30
  4.5 Simulation Result 31
    4.5.1 Space Utilization Comparison Across Different Algorithms 32
    4.5.2 Replacing Transformer with CNN 34
    4.5.3 Space Utilization Comparison Across Different Reward Functions 35
    4.5.4 Result for Generalization Mechanism 37
5 Conclusion and Future Work 40
Reference 42
Appendix 44
  A.1 Reproducibility Problem with Transformer in PyTorch 44
  A.2 Hybrid Network Design and Experiment Evaluations 44
  A.3 Grid Search of Constant g in Replacement Method 47
  A.4 Bin Packing Visualization Interface 49


Full-text access: on campus from 2030-08-19; off campus from 2030-08-19.
The electronic thesis has not yet been authorized for public release; for the print copy, please consult the library catalog.