| Graduate Student: | 吳昀哲 Wu, Yun-Zhe |
|---|---|
| Thesis Title: | Optimizing Cash Replenishment Decisions in FinTech Using Proximal Policy Optimization Based Deep Reinforcement Learning |
| Advisor: | 李昇暾 Li, Sheng-Tun |
| Degree: | Master |
| Department: | College of Management - Department of Industrial and Information Management |
| Publication Year: | 2025 |
| Graduation Academic Year: | 113 |
| Language: | Chinese |
| Pages: | 57 |
| Keywords: | FinTech, Cash Management, Vehicle Routing Problem, Proximal Policy Optimization, Deep Reinforcement Learning |
In modern financial systems, cash management plays a key role in ensuring that banks operate smoothly and in maintaining customer satisfaction. Cash demand at each branch is highly uncertain, shaped by regional economic activity, local consumption habits, seasonal fluctuations, and unexpected events. Banks must therefore meet customers' cash demand while preventing individual branches from running short of cash or accumulating excess stock, so as to avoid degraded service quality and unnecessary management costs. As branch networks and geographic coverage expand, these cash management needs become increasingly diverse and complex.
In the past, banks' replenishment decisions were mostly made by manual judgment based on staff experience. This approach not only strains human decision-making capacity but also incurs excessive costs. This study therefore proposes a deep reinforcement learning solution based on Proximal Policy Optimization (PPO) to optimize cash replenishment strategies. Using a bank's real historical cash inventory data, a reinforcement learning agent is trained for each branch; by accounting for factors such as the daily cash balance, replenishment amounts, and the cash inventory cap, each agent learns to refine its replenishment policy and improve replenishment decisions. A regional dispatch center is then introduced: a regional agent is trained to receive replenishment information from the branches and optimize the overall decision. The study also incorporates the Vehicle Routing Problem (VRP) so that transportation costs can be minimized through the most efficient cash delivery routes.
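The per-branch setup described above (daily balance as state, a replenishment amount as the action, an inventory cap, and a cost-based reward) can be sketched as a minimal environment. This is an illustrative sketch only: the class name, cost weights, capacity, and demand values are assumptions, not the thesis's actual parameters or cost model.

```python
class BranchCashEnv:
    """Minimal single-branch cash-replenishment environment (illustrative)."""

    def __init__(self, demands, cap=1000, hold_cost=0.01,
                 short_cost=1.0, trans_cost=50.0):
        self.demands = demands        # daily net cash demand per step
        self.cap = cap                # branch vault capacity (inventory cap)
        self.hold_cost = hold_cost    # cost per unit of idle cash held overnight
        self.short_cost = short_cost  # penalty per unit of unmet demand
        self.trans_cost = trans_cost  # fixed cost whenever a truck is dispatched
        self.reset()

    def reset(self):
        self.t = 0
        self.balance = self.cap // 2  # start with a half-full vault
        return self.balance

    def step(self, replenish):
        # Clip the replenishment so the vault never exceeds its capacity.
        replenish = max(0, min(replenish, self.cap - self.balance))
        cost = self.trans_cost if replenish > 0 else 0.0
        self.balance += replenish
        demand = self.demands[self.t]
        served = min(demand, self.balance)
        cost += self.short_cost * (demand - served)  # shortage penalty
        self.balance -= served
        cost += self.hold_cost * self.balance        # holding cost on the remainder
        self.t += 1
        done = self.t >= len(self.demands)
        # Reward is the negative cost, so the agent minimizes total cost.
        return self.balance, -cost, done
```

An agent would interact with this via the usual `reset`/`step` loop, trading off transport cost (replenishing rarely) against shortage and holding penalties.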
Experimental results show that the reinforcement learning decisions outperform manual decisions at most branches, with a significant reduction in overall cost. Incorporating the regional dispatch center effectively eliminates unnecessary replenishment actions, further lowering transportation costs. The results are expected to help the banking industry achieve automated and more efficient cash management strategies in a dynamic and uncertain financial environment.
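The routing component mentioned in the abstract (VRP-based delivery routes) can be illustrated with a nearest-neighbor construction, a common baseline heuristic for vehicle routing; this is not necessarily the solver used in the thesis, and the depot/branch labels and distance matrix below are hypothetical.

```python
def nearest_neighbor_route(depot, branches, dist):
    # Greedy nearest-neighbor construction for a single-vehicle route:
    # start at the depot, always drive to the closest unvisited branch,
    # then return to the depot at the end.
    route, current = [depot], depot
    unvisited = set(branches)
    while unvisited:
        nxt = min(unvisited, key=lambda b: dist[current][b])
        route.append(nxt)
        unvisited.remove(nxt)
        current = nxt
    route.append(depot)
    return route
```

For example, with a depot `"D"` and branches `"A"`, `"B"`, the route follows whichever branch is currently closest; real VRP solvers (exact or learned, as in Nazari et al., 2018) improve substantially on this greedy baseline.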
In modern banking, efficient cash management is essential for operational stability and customer satisfaction. Due to uncertainties from regional economies, consumer habits, and seasonal events, traditional cash replenishment decisions relying on manual judgment often lead to excessive costs and inefficiencies. This study proposes a deep reinforcement learning framework using Proximal Policy Optimization (PPO) to optimize cash replenishment strategies. Each bank branch is assigned an individual PPO agent trained on historical cash data, considering daily balances, replenishment actions, and inventory limits. A regional dispatch center further enhances decision-making by coordinating replenishment across branches using a centralized agent. To minimize transportation costs, the Vehicle Routing Problem (VRP) is incorporated to determine optimal cash delivery routes. Experimental results show that PPO-based agents outperform human decisions, significantly reducing overall costs and unnecessary actions. The proposed method offers a scalable and automated solution for intelligent cash management under uncertain financial environments.
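The PPO objective underlying the proposed agents can be sketched in a few lines, following the clipped surrogate of Schulman et al. (2017); the function and parameter names here are illustrative, and a real implementation would compute this over batches of trajectories with a learned value baseline.

```python
import math

def ppo_clip_objective(logp_new, logp_old, advantage, eps=0.2):
    # Probability ratio r = pi_new(a|s) / pi_old(a|s), from log-probabilities.
    ratio = math.exp(logp_new - logp_old)
    # PPO's clipped surrogate: take the pessimistic minimum of the unclipped
    # and clipped terms, so large policy updates are not rewarded.
    clipped = max(min(ratio, 1 + eps), 1 - eps)
    return min(ratio * advantage, clipped * advantage)
```

The clipping keeps each policy update close to the previous policy, which is what makes PPO stable enough to train one agent per branch on noisy cash-demand data.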
Allen, D. S. (1998). How closely do banks manage vault cash? Federal Reserve Bank of St. Louis Review, 80(July/August).
Armenise, R., Birtolo, C., Sangianantoni, E., & Troiano, L. (2010). A generative solution for ATM cash management. 2010 International Conference of Soft Computing and Pattern Recognition.
Baumol, W. J. (1952). The transactions demand for cash: An inventory theoretic approach. The Quarterly Journal of Economics, 66(4), 545-556.
Box, G. E., Jenkins, G. M., Reinsel, G. C., & Ljung, G. M. (2015). Time series analysis: forecasting and control. John Wiley & Sons.
Brownlee, J. (2018). Deep learning for time series forecasting: predict the future with MLPs, CNNs and LSTMs in Python. Machine Learning Mastery.
Dantzig, G. B., & Ramser, J. H. (1959). The truck dispatching problem. Management Science, 6(1), 80-91.
Deng, Y., Bao, F., Kong, Y., Ren, Z., & Dai, Q. (2016). Deep direct reinforcement learning for financial signal representation and trading. IEEE Transactions on Neural Networks and Learning Systems, 28(3), 653-664.
Hubert, T., Schrittwieser, J., Antonoglou, I., Barekatain, M., Schmitt, S., & Silver, D. (2021). Learning and planning in complex action spaces. International Conference on Machine Learning.
Hyndman, R. J., & Athanasopoulos, G. (2018). Forecasting: principles and practice. OTexts.
Kim, J., Kang, M.-J., Lee, K., Moon, H., & Jeon, B.-K. (2023). Deep Reinforcement Learning for Asset Allocation: Reward Clipping. arXiv preprint arXiv:2301.05300.
Laporte, G. (2009). Fifty years of vehicle routing. Transportation Science, 43(4), 408-416.
Liu, Q., Jiang, Z., Yang, H.-J., Khosravi, M., Waite, J. R., & Sarkar, S. (2025). HP3O: Hybrid-Policy Proximal Policy Optimization with Best Trajectory.
Liu, X., Han, L., Kang, L., Liu, J., & Miao, H. (2025). Preference learning based deep reinforcement learning for flexible job shop scheduling problem. Complex & Intelligent Systems, 11(2), 144.
Miller, M. H., & Orr, D. (1966). A model of the demand for money by firms. The Quarterly Journal of Economics, 80(3), 413-435.
Mnih, V., Kavukcuoglu, K., Silver, D., Rusu, A. A., Veness, J., Bellemare, M. G., Graves, A., Riedmiller, M., Fidjeland, A. K., Ostrovski, G., Petersen, S., Beattie, C., Sadik, A., Antonoglou, I., King, H., Kumaran, D., Wierstra, D., Legg, S., & Hassabis, D. (2015). Human-level control through deep reinforcement learning. Nature, 518(7540), 529-533.
Moody, J., Wu, L., Liao, Y., & Saffell, M. (1998). Performance functions and reinforcement learning for trading systems and portfolios. Journal of Forecasting, 17(5-6), 441-470.
Moubariki, Z., Beljadid, L., Tirari, M. E. H., Kaicer, M., & Thami, R. O. H. (2019). Enhancing cash management using machine learning. 2019 1st International Conference on Smart Systems and Data Science (ICSSD).
Nascimento, J., & Powell, W. (2010). Dynamic programming models and algorithms for the mutual fund cash balance problem. Management Science, 56(5), 801-815.
Nazari, M., Oroojlooy, A., Snyder, L., & Takác, M. (2018). Reinforcement learning for solving the vehicle routing problem. Advances in Neural Information Processing Systems, 31.
Orji, M., Omale, S., Kate, C., & Solomon, J. (2016). The Role of Liquidity and Profitability as a Tool for Effective Cash Management in Nigerian Commercial Banks. American Journal of Theoretical and Applied Business, 2(4), 38-45.
Peng, B., Wang, J., & Zhang, Z. (2020). A deep reinforcement learning algorithm using dynamic attention model for vehicle routing problems. Artificial Intelligence Algorithms and Applications: 11th International Symposium, ISICA 2019, Guangzhou, China, November 16–17, 2019, Revised Selected Papers 11.
Plaat, A. (2022). Model-Based Reinforcement Learning. In A. Plaat (Ed.), Deep Reinforcement Learning (pp. 135-167). Springer Nature Singapore.
Puterman, M. L. (1990). Chapter 8 Markov decision processes. In Handbooks in Operations Research and Management Science (Vol. 2, pp. 331-434). Elsevier.
Salas-Molina, F., & Rodríguez-Aguilar, J. A. (2018). Data-driven multiobjective decision-making in cash management. EURO Journal on Decision Processes, 6(1), 77-91.
Salas-Molina, F., Rodríguez-Aguilar, J. A., & Pla-Santamaria, D. (2018). Boundless multiobjective models for cash management. The Engineering Economist, 63(4), 363-381.
Schulman, J., Levine, S., Abbeel, P., Jordan, M., & Moritz, P. (2015). Trust region policy optimization. International Conference on Machine Learning.
Schulman, J., Wolski, F., Dhariwal, P., Radford, A., & Klimov, O. (2017). Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347.
Sutton, R. S., & Barto, A. G. (1998). Reinforcement learning: An introduction (Vol. 1). MIT Press.
Tan, C. B., Toledo, E., Ellis, B., Foerster, J. N., & Huszár, F. (2024). Beyond the Boundaries of Proximal Policy Optimization. arXiv preprint arXiv:2411.00666.
Thanh, B. T., Van Tuan, D., Chi, T. A., Van Dai, N., Dinh, N. T. Q., Thuy, N. T., & Hoa, N. T. X. (2023). Multiobjective Logistics Optimization for Automated ATM Cash Replenishment Process. International Conference on Intelligence of Things.
Wang, H.-n., Liu, N., Zhang, Y.-y., Feng, D.-w., Huang, F., Li, D.-s., & Zhang, Y.-m. (2020). Deep reinforcement learning: a survey. Frontiers of Information Technology & Electronic Engineering, 21(12), 1726-1744.
Zhang, J., Hu, R., Wang, Y.-J., Yang, Y.-Y., & Qian, B. (2023). Deep Reinforcement Learning for Solving Multi-objective Vehicle Routing Problem. International Conference on Intelligent Computing.
Zheng, Y., & Tu, K. (2023). A Robust Forecasting Framework for Multi-Series Cash Flow Prediction. 2023 6th International Conference on Information Communication and Signal Processing (ICICSP).
Campus access: publicly available 2030-06-11.