National Cheng Kung University Electronic Theses and Dissertations System


Graduate student:	Wu, Yun-Zhe (吳昀哲)
Thesis title:	Optimizing Cash Replenishment Decisions in FinTech Using Proximal Policy Optimization Based Deep Reinforcement Learning (近端策略優化之深度強化學習於金融科技現金運補決策最佳化之研究)
Advisor:	Li, Sheng-Tun (李昇暾)
Degree:	Master
Department:	College of Management - Department of Industrial and Information Management
Publication year:	2025
Graduation academic year:	113 (ROC calendar)
Language:	Chinese
Number of pages:	57
Keywords (Chinese):	金融科技、現金管理、車輛路徑問題、深度強化學習、近端策略優化
Keywords (English):	FinTech, Cash Management, Vehicle Routing Problem, Proximal Policy Optimization, Deep Reinforcement Learning

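In the modern financial system, cash management plays a key role in keeping banks operating normally and in raising customer satisfaction. Cash demand at each branch is highly uncertain, shaped by regional economic activity, local spending habits, seasonal fluctuations, and unexpected events. Banks must therefore meet customers' cash demand while preventing both cash shortages and excess accumulation at each branch, so as to avoid degraded service quality and unnecessary management costs. As branch networks and their geographic coverage expand, cash management requirements become more diverse and complex.
In the past, banks made replenishment decisions mostly by manual, experience-based judgment. This approach not only tests the decision-making ability of staff but also incurs excessive costs. This study therefore proposes a deep reinforcement learning solution based on Proximal Policy Optimization to optimize cash replenishment strategies. Using a bank's real historical cash inventory data, one reinforcement learning agent is trained for each branch, taking into account factors such as the daily cash balance, the replenishment amount, and the cash inventory upper limit, so that the agent can optimize its replenishment strategy and improve replenishment decisions. A regional dispatch center is then added: a regional center agent is trained to receive replenishment information from the branches in order to optimize the overall decision. The study also incorporates the vehicle routing problem, with the aim of minimizing transportation cost through the most efficient cash transport routes.
Experimental results show that the reinforcement learning decisions outperform manual decisions at most branches and that overall cost is significantly reduced. Once the regional dispatch center is taken into account, unnecessary replenishment actions are effectively reduced, which lowers transportation cost. The findings are expected to help the banking industry achieve automated and more efficient cash management strategies in a dynamic and uncertain financial environment.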
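As a rough illustration of how a per-branch agent could be scored on the factors the abstract names (daily cash balance, replenishment amount, and the inventory upper limit), the following is a minimal sketch of a one-day environment step. It is not the thesis's actual state, action, or reward design: the class name `BranchCashEnv`, all cost coefficients, and the ordering of replenishment before demand are assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass
class BranchCashEnv:
    """Hypothetical single-branch cash environment (illustrative assumptions only)."""

    balance: float                 # current vault cash at the branch
    cap: float                     # cash inventory upper limit
    holding_rate: float = 0.0001   # assumed daily cost per unit of cash held
    shortage_penalty: float = 1.0  # assumed cost per unit of unmet demand
    excess_penalty: float = 0.5    # assumed cost per unit held above the cap
    trip_cost: float = 100.0       # assumed fixed cost of one replenishment trip

    def step(self, replenish: float, net_withdrawal: float) -> float:
        """Advance one day and return the reward (the negative of total cost)."""
        # A replenishment action incurs a fixed trip cost.
        cost = self.trip_cost if replenish > 0 else 0.0
        self.balance += replenish

        # Penalize any cash held above the vault limit after replenishment.
        excess = max(0.0, self.balance - self.cap)
        cost += self.excess_penalty * excess

        # Serve the day's net customer withdrawals; unmet demand is a shortage.
        self.balance -= net_withdrawal
        shortage = max(0.0, -self.balance)
        cost += self.shortage_penalty * shortage
        self.balance = max(0.0, self.balance)

        # Idle cash carries a holding cost.
        cost += self.holding_rate * self.balance
        return -cost
```

An agent would choose `replenish` each day from the observed balance, and a policy-gradient method such as PPO would then adjust the policy toward actions with higher cumulative reward. In practice the coefficients would have to be calibrated to the bank's real costs.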

In modern banking, efficient cash management is essential for operational stability and customer satisfaction. Because cash demand is uncertain, driven by regional economies, consumer habits, and seasonal events, traditional cash replenishment decisions that rely on manual judgment often lead to excessive costs and inefficiencies. This study proposes a deep reinforcement learning framework using Proximal Policy Optimization (PPO) to optimize cash replenishment strategies. Each bank branch is assigned an individual PPO agent trained on historical cash data, considering daily balances, replenishment actions, and inventory limits. A regional dispatch center further enhances decision-making by coordinating replenishment across branches using a centralized agent. To minimize transportation costs, the Vehicle Routing Problem (VRP) is incorporated to determine optimal cash delivery routes. Experimental results show that PPO-based agents outperform manual decisions at most branches, significantly reducing overall costs and unnecessary replenishment actions. The proposed method offers a scalable and automated solution for intelligent cash management under uncertain financial environments.
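For readers unfamiliar with PPO, the clipped surrogate objective from the original PPO formulation is reproduced here as background. The record does not state which variant, clipping range, or advantage estimator the thesis uses, so the symbols below follow the standard notation and are not taken from the thesis itself.

```latex
% Standard PPO clipped surrogate objective (background, not thesis-specific).
% r_t(\theta): probability ratio between the new and old policies
% \hat{A}_t:   advantage estimate at time step t
% \epsilon:    clipping range hyperparameter
\[
  r_t(\theta) = \frac{\pi_\theta(a_t \mid s_t)}{\pi_{\theta_{\mathrm{old}}}(a_t \mid s_t)},
  \qquad
  L^{\mathrm{CLIP}}(\theta) =
  \hat{\mathbb{E}}_t\!\left[
    \min\!\Big( r_t(\theta)\,\hat{A}_t,\;
    \operatorname{clip}\big(r_t(\theta),\, 1-\epsilon,\, 1+\epsilon\big)\,\hat{A}_t \Big)
  \right]
\]
```

Clipping the ratio to the interval $[1-\epsilon,\, 1+\epsilon]$ removes the incentive to move the new policy far from the old one in a single update, which is what makes training comparatively stable.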

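Abstract (Chinese)
Summary
Acknowledgments
Table of Contents
List of Figures
List of Tables
Chapter 1 Introduction
1 Research Background
2 Research Objectives
3 Research Process
Chapter 2 Literature Review
1 Cash Management
2 Reinforcement Learning
2.1 Basic Framework
2.2 Deep Q-Network
2.3 Proximal Policy Optimization
2.4 Applications of Reinforcement Learning in Finance
3 Time Series
3.1 Time Series and Reinforcement Learning
4 Vehicle Routing Problem
5 Summary
Chapter 3 Research Method
1 Case Description and Problem Definition
2 Research Framework
3 Data Processing
4 Vehicle Routing Problem
5 Reinforcement Learning Framework
5.1 State and Action Space Settings
5.2 Reward Function Settings
6 Proximal Policy Optimization
7 Summary of Notation for This Chapter
Chapter 4 Experimental Design and Analysis
1 Experimental Design
2 Dataset Description
3 Hyperparameter Settings
4 Evaluation Metrics
5 Experimental Results and Analysis
5.1 Cash-in-Transit Vehicle Route Planning
5.2 Branch Replenishment and Comparison
5.3 Regional Dispatch Center Replenishment
6 Summary
Chapter 5 Conclusions and Future Work
1 Conclusions
2 Future Work and Research Limitations
References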


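On campus: publicly available from 2030-06-11
Off campus: publicly available from 2030-06-11. The electronic thesis has not yet been authorized for public release; for the print copy, consult the library catalog.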
