| Graduate student: | 林倇伃 Lin, Wan-Yu |
|---|---|
| Thesis title: | 應用多目標強化學習於易腐性商品之生產存貨路徑規劃問題 Application of Multi-Objective Reinforcement Learning to Production Inventory Routing Problem for Perishable Goods |
| Advisor: | 王泰裕 Wang, Tai-Yue |
| Degree: | 碩士 Master |
| Department: | 管理學院 - 工業與資訊管理學系 Department of Industrial and Information Management |
| Year of publication: | 2024 |
| Academic year of graduation: | 112 (ROC calendar) |
| Language: | Chinese |
| Number of pages: | 79 |
| Keywords (Chinese): | 易腐性商品, 生產存貨路徑規劃, 多目標強化學習 |
| Keywords (English): | Multi-objective reinforcement learning, Production inventory routing problem, Perishable goods |
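This study addresses the production and distribution optimization problem of a two-echelon supply chain with a single supplier and multiple homogeneous retailers. The supplier provides perishable goods such as fresh produce and meat. Raw material supply, late delivery, and stockouts are not considered, and customer demand at each downstream retailer occurs at a constant rate. The upstream supplier decides the production lot size, the delivery quantities to the downstream retailers, and the delivery routes. The objectives are to minimize the total cost of production, inventory, and distribution, the carbon emissions, and the delivery time, and to maximize the average product quality index. The study first formulates a mixed-integer programming model of the production inventory routing problem for perishable goods and then adopts a multi-objective reinforcement learning method based on the Deep Q-Network (DQN). The problem is recast as a reinforcement learning environment, and the ability of DQN to handle complex state and action spaces is used to obtain the most suitable production quantity, transportation quantity, and delivery routes for the current state. Decision values and user-defined priority weights are added to improve how well the DQN evaluates the reward of each individual objective. After training, the method is compared with a method from the literature and with the optimization model in numerical experiments that evaluate the objective values, loss values, and average rewards. The results show that the proposed method is more sensitive to small improvements in objectives such as average food quality and delivery time, and that it converges well and remains robust in medium- and large-scale scenarios, so it has practical value. The sensitivity analysis shows that the number of retailers is positively related to all objectives, that the number of planning periods has a particularly strong positive effect on delivery time while the number of vehicles has the opposite effect, and that some factors have significant positive interactions. Based on the numerical results, suppliers of perishable goods are advised to weigh these trade-offs against their actual operating conditions in advance, in order to obtain the best parameter combination within a reasonable budget.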
This thesis investigates the integrated production lot-sizing and routing problem in a two-echelon supply chain with a single supplier and several retailers. A perishable item is considered, and delivery delays and raw material shortages are assumed negligible. The demand of each retailer varies from period to period but occurs at a constant rate. The decisions to be determined are the optimal production quantity at the supplier, the delivery quantity to each retailer, and the delivery routes. Four objectives are considered: the supplier's total cost, average food quality, carbon emissions, and delivery time. Using a multi-objective deep Q-network (DQN) approach, a mixed-integer programming (MIP) model is transformed into a dynamic system within the reinforcement learning framework. The proposed approach searches for the optimal production and transportation quantities and the delivery route for each customer. The decision maker can assign priority weights and value functions so that different objectives are prioritized in the DQNs. Numerical experiments show that the new approach improves food quality and delivery time compared with Global Local Near-Neighbor Particle Swarm Optimization (GLNPSO) and the MIP model. For large instances, the approach appears to converge well, and its applicability is promising. Sensitivity analysis indicates positive correlations between the number of retailers and the four objectives, a significant effect of the planning horizon on delivery time, and a negative correlation between the number of vehicles and delivery time, which suggests how suppliers of perishable goods might adjust these parameters in practice.
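The way priority weights and decision values can steer a multi-objective DQN may be easier to see in a small sketch. The code below is not the thesis's implementation. It assumes one vector of Q-values per objective (for example cost, quality, emissions, delivery time), min-max normalizes each vector, and picks the action with the highest sum of priority × decision value × normalized Q-value. The function names `normalize` and `select_action` and all numbers are hypothetical.

```python
from typing import List, Sequence, Tuple


def normalize(q: Sequence[float]) -> List[float]:
    """Min-max scale one objective's Q-values to [0, 1] (assumed scaling)."""
    lo, hi = min(q), max(q)
    if hi == lo:
        # All actions look equally good for this objective.
        return [0.0 for _ in q]
    return [(v - lo) / (hi - lo) for v in q]


def select_action(
    q_values: Sequence[Sequence[float]],
    priorities: Sequence[float],
    decision_values: Sequence[float],
) -> Tuple[int, List[float]]:
    """Combine per-objective Q-values into one score per action.

    q_values[i][a] is the Q-value of action a under objective i.
    Objectives to be minimized (e.g. cost) are assumed to be encoded
    as negated rewards, so a larger Q-value is always better.
    """
    n_actions = len(q_values[0])
    scores = [0.0] * n_actions
    for q, weight, decision in zip(q_values, priorities, decision_values):
        for action, value in enumerate(normalize(q)):
            scores[action] += weight * decision * value
    best = max(range(n_actions), key=lambda a: scores[a])
    return best, scores


# Hypothetical Q-values for two objectives over three candidate actions.
q_quality = [1.0, 3.0, 2.0]
q_cost = [5.0, 1.0, 3.0]
action, scores = select_action([q_quality, q_cost], [1.0, 2.0], [1.0, 1.0])
print(action, scores)
```

Swapping the priorities to `[2.0, 1.0]` in this toy example shifts the choice from the action preferred by the cost objective to the one preferred by the quality objective, which is the kind of trade-off the decision maker controls through the weights.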