| Graduate Student: | 林俊杰 (Lin, Jun-Jie) |
|---|---|
| Thesis Title: | 基於深度強化學習實現具有長短線交易策略的股票交易代理人 (Realization of a Stock Trading Agent with Long-term and Short-term Trading Strategies Based on Deep Reinforcement Learning) |
| Advisor: | 陳朝鈞 (Chen, Chao-Chun) |
| Co-Advisor: | 洪敏雄 (Hung, Min-Hsiung) |
| Degree: | Master |
| Department: | College of Electrical Engineering and Computer Science - Institute of Manufacturing Information and Systems |
| Publication Year: | 2023 |
| Graduation Academic Year: | 111 (ROC calendar, 2022-2023) |
| Language: | Chinese |
| Pages: | 72 |
| Keywords: | Deep Reinforcement Learning, Quantitative Investment, Trading Agent, Optimizing Return on Investment of Capital, Vision Transformer, Curriculum-based Method, Long-term Trading Strategy, Short-term Trading Strategy |
The development of quantitative investment techniques is a hot topic in the financial industry. Deep reinforcement learning, the intersection of reinforcement learning and deep learning, combines the sequential decision-making of reinforcement learning with the feature-extraction capability of deep learning, making it well suited to decision-making tasks. A deep reinforcement learning agent interacts with a dynamic environment and learns an end-to-end mapping from environment states to action decisions. In stock trading, an agent built with deep reinforcement learning can therefore take over the investor's role in making trading decisions. Such a stock trading agent can learn investment trading strategies and, by trading in the stock market, demonstrate its ability to beat the market and optimize capital. This thesis builds a stock trading agent with deep reinforcement learning at its core and uses a state-of-the-art Vision Transformer neural network architecture to approximate the policy function. Deep reinforcement learning is used to parse the features of the stock state information, improving the agent's understanding of the environment. At the same time, the simulated stock market environment is designed with a curriculum-based approach, which encourages the agent's trading strategy to develop complex action behaviors. A backtesting evaluation of the agent's investment performance shows that its trading strategy is profitable and outperforms the market. In addition, the agent can apply both long-term and short-term trading strategies to a stock portfolio at the same time.
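To make the described architecture more concrete, below is a minimal, hypothetical PyTorch sketch of a transformer-encoder policy network that maps a look-back window of per-stock market features to trading actions. It is not the thesis implementation: the thesis applies a Vision Transformer to its stock-state encoding, whereas this simplified sketch uses a plain transformer encoder over a feature window, and all class names, dimensions, and action semantics are illustrative assumptions.

```python
# Illustrative sketch only: a transformer-based policy network for a
# deep-reinforcement-learning stock trading agent. All names, layer sizes,
# and the action interpretation are hypothetical, not the thesis code.
import torch
import torch.nn as nn

class TransformerPolicy(nn.Module):
    def __init__(self, n_features: int, n_stocks: int, d_model: int = 64,
                 n_heads: int = 4, n_layers: int = 2, window: int = 30):
        super().__init__()
        # Project raw per-day market features (prices, volume, indicators)
        # into the transformer embedding space.
        self.embed = nn.Linear(n_features, d_model)
        # Learnable positional encoding over the look-back window.
        self.pos = nn.Parameter(torch.zeros(1, window, d_model))
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=n_layers)
        # Map the encoded state to one action per stock in [-1, 1]
        # (negative = reduce/short-term exit, positive = buy/long-term hold).
        self.head = nn.Sequential(nn.Linear(d_model, n_stocks), nn.Tanh())

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        # obs: (batch, window, n_features)
        h = self.embed(obs) + self.pos
        h = self.encoder(h)            # (batch, window, d_model)
        return self.head(h[:, -1])     # decide from the latest encoded step

if __name__ == "__main__":
    policy = TransformerPolicy(n_features=16, n_stocks=8)
    dummy_obs = torch.randn(4, 30, 16)   # 4 sampled states, 30-day window
    actions = policy(dummy_obs)
    print(actions.shape)                  # torch.Size([4, 8])
```

In a full system along the lines described above, such a policy would be trained with an actor-critic algorithm against a simulated market environment, with a curriculum that gradually increases the difficulty of the trading scenarios.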