
Author: 林俊杰 (Lin, Jun-Jie)
Title: Realization of Stock Trading Agent with Long-term and Short-term Trading Strategies Based on Deep Reinforcement Learning (基於深度強化學習實現具有長短線交易策略的股票交易代理人)
Advisor: 陳朝鈞 (Chen, Chao-Chun)
Co-advisor: 洪敏雄 (Hung, Min-Hsiung)
Degree: Master
Department: Institute of Manufacturing Information and Systems, College of Electrical Engineering and Computer Science
Year of publication: 2023
Graduation academic year: 111 (2022-2023)
Language: Chinese
Number of pages: 72
Keywords: Deep Reinforcement Learning, Quantitative Investment, Trading Agent, Optimizing Return on Investment of Capital, Vision Transformer, Curriculum Method, Long-term Trading Strategy, Short-term Trading Strategy
Views: 595; Downloads: 280
The development of quantitative investment techniques is a hot topic in the financial industry. Deep reinforcement learning, the intersection of reinforcement learning and deep learning, combines the sequential decision-making capability of the former with the feature-extraction capability of the latter, making it well suited to decision-making tasks. A deep reinforcement learning agent can learn, end to end and through interaction with a dynamic environment, to map environment states to action decisions. In stock trading, an agent built with deep reinforcement learning can therefore take over the investor's role: it can learn investment strategies and, by trading in the stock market, demonstrate its ability to beat the market and optimize capital. This thesis builds a stock trading agent with deep reinforcement learning at its core and uses the cutting-edge Vision Transformer neural network architecture to approximate the policy function. Deep reinforcement learning is used to extract features from the stock state information, which improves the agent's understanding of the environment. At the same time, a curriculum-style method is used to design the simulated stock market environment, which encourages the agent's trading strategy to develop complex behaviors. A backtest of the agent's investment performance shows that its trading strategy is profitable and outperforms the market. In addition, the agent applies both long-term and short-term trading strategies to its stock portfolio.
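The abstract does not spell out the agent's interface, but its description (states built from stock information, trade actions, and a reward tied to optimizing capital) maps naturally onto a gym-style interaction loop. The sketch below is a minimal illustration under those assumptions only; the class name `StockTradingEnv`, the portfolio-weight action encoding, and the random rollout are hypothetical choices for illustration, not details taken from the thesis.

```python
import numpy as np

class StockTradingEnv:
    """Minimal sketch of a simulated stock-trading environment.

    State  : a rolling window of normalized prices for each stock.
    Action : target portfolio weights over the stocks plus a cash slot.
    Reward : change in total portfolio value after applying the action.
    Every design choice here is illustrative, not the thesis's actual setup.
    """

    def __init__(self, prices: np.ndarray, window: int = 30, cash: float = 1e6):
        self.prices = prices          # shape: (num_days, num_stocks)
        self.window = window
        self.init_cash = cash

    def reset(self):
        self.t = self.window
        self.value = self.init_cash
        return self._state()

    def _state(self):
        # Observation: the last `window` days of prices, scaled by the latest close.
        win = self.prices[self.t - self.window:self.t]
        return win / win[-1]

    def step(self, action: np.ndarray):
        # Interpret the action as non-negative target weights (stocks + cash).
        w = np.clip(action, 0.0, None)
        w = w / (w.sum() + 1e-8)
        ret = self.prices[self.t] / self.prices[self.t - 1] - 1.0  # daily stock returns
        growth = 1.0 + float(w[:-1] @ ret)                         # cash earns zero
        reward = self.value * (growth - 1.0)                       # change in portfolio value
        self.value *= growth
        self.t += 1
        done = self.t >= len(self.prices)
        return self._state(), reward, done, {"portfolio_value": self.value}

if __name__ == "__main__":
    # Smoke test: roll out a random policy on synthetic price paths.
    rng = np.random.default_rng(0)
    prices = 100 * np.cumprod(1 + 0.01 * rng.standard_normal((500, 5)), axis=0)
    env = StockTradingEnv(prices)
    obs, done = env.reset(), False
    while not done:
        obs, reward, done, info = env.step(rng.random(6))
    print("final portfolio value:", round(info["portfolio_value"], 2))
```

In practice the random policy would be replaced by a trained deep reinforcement learning policy; the abstract names the Vision Transformer as the policy network but does not identify the training algorithm, so any specific choice such as PPO or DDPG here would be an assumption.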

Abstract
Acknowledgements
Table of Contents
List of Tables
List of Figures
Chapter 1 Introduction
  1.1 Research Background
  1.2 Research Motivation
  1.3 Research Objectives
  1.4 Research Limitations
    1.4.1 No Market Impact
    1.4.2 Certainty of Execution Prices
  1.5 Research Methods
Chapter 2 Literature Review
  2.1 The Stock Market
    2.1.1 Microstructure of the Stock Market
    2.1.2 The Trading Process in the Stock Market
  2.2 Quantitative Investment
    2.2.1 The Concept of Quantitative Investment
    2.2.2 The Profit Basis of Quantitative Investment
    2.2.3 Differences Between Quantitative and Discretionary Investment
    2.2.4 Risk Control in Quantitative Investment
  2.3 Reinforcement Learning
  2.4 Deep Reinforcement Learning
    2.4.1 Continuous State Space with Discrete Action Space
    2.4.2 Continuous State Space with Continuous Action Space
  2.5 Related Research on Deep Reinforcement Learning for Stock Trading
    2.5.1 Applications of Machine Learning to Stock Price Prediction
    2.5.2 Applications of Deep Reinforcement Learning to Stock Trading Decisions
    2.5.3 Differences Between Deep Reinforcement Learning and Machine Learning in Quantitative Stock Investment
Chapter 3 Problem Analysis of Deep Reinforcement Learning Agents for Stock Trading
  3.1 Application Directions of Deep Reinforcement Learning Trading Agents in the Stock Market
  3.2 Challenges of Applying Deep Reinforcement Learning to Stock Trading
  3.3 Optimization Objectives of the Trading Agent in the Stock Trading Task
Chapter 4 Core Function Design of the Stock Trading Agent
  4.1 Selection of the Stock Portfolio
  4.2 Design of the Stock State Information
  4.3 Design of the Trading Reward Signal
  4.4 Design of the Agent's Discount Factor
  4.5 Design of the Agent's Policy and Actions
  4.6 Design of the Agent's Neural Network Model
  4.7 Design of the Simulated Stock Market Environment
Chapter 5 Evaluation of the Trading Agent's Investment Performance
  5.1 Time Settings of the Stock Trading Task
  5.2 Performance Evaluation: Quantitative Backtest Metrics
  5.3 Performance Evaluation: Cumulative Return
  5.4 Performance Evaluation: Probability of Profitable Trades
  5.5 Performance Evaluation: Longest Holding-Period Statistics
Chapter 6 Conclusions and Future Research Directions
  6.1 Conclusions and Contributions
  6.2 Future Research Directions
References
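Chapter 5 of the outline above evaluates the agent with quantitative backtest metrics, cumulative return among them, and the abstract claims the strategy outperforms the market. For reference, the snippet below computes the standard definitions of cumulative return, maximum drawdown, and an annualized Sharpe ratio from a daily portfolio-value series; the function names and toy numbers are hypothetical and not taken from the thesis.

```python
import numpy as np

def cumulative_return(values: np.ndarray) -> float:
    """Total return over the series, e.g. 0.25 means +25%."""
    return float(values[-1] / values[0] - 1.0)

def max_drawdown(values: np.ndarray) -> float:
    """Largest peak-to-trough loss, reported as a positive fraction."""
    running_peak = np.maximum.accumulate(values)
    return float((1.0 - values / running_peak).max())

def annualized_sharpe(values: np.ndarray, trading_days: int = 252) -> float:
    """Sharpe ratio of daily returns with a zero risk-free rate, annualized."""
    daily = np.diff(values) / values[:-1]
    return float(np.sqrt(trading_days) * daily.mean() / (daily.std() + 1e-12))

if __name__ == "__main__":
    # Compare a hypothetical agent curve against a buy-and-hold benchmark.
    agent = np.array([100, 103, 101, 108, 112, 110, 118], dtype=float)
    market = np.array([100, 101, 100, 104, 105, 103, 107], dtype=float)
    print("agent cumulative return  :", round(cumulative_return(agent), 4))
    print("market cumulative return :", round(cumulative_return(market), 4))
    print("agent max drawdown       :", round(max_drawdown(agent), 4))
    print("agent annualized Sharpe  :", round(annualized_sharpe(agent), 2))
```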

