
Author: Yang, Shao-Chun (楊少鈞)
Title: Using Modified Rainbow for Enhancing Reinforcement Learning for Stock Trading - NASDAQ Stocks as Examples (使用修改後Rainbow方法以優化股市交易強化學習-以NASDAQ美股為例)
Advisor: Wang, Tzone-I (王宗一)
Degree: Master
Department: Department of Engineering Science, College of Engineering
Year of Publication: 2019
Academic Year of Graduation: 107 (2018-2019)
Language: Chinese
Number of Pages: 36
Keywords: Machine Learning, Reinforcement Learning, Deep Learning, Algorithmic Trading, Q-Learning, Rainbow
    The stock market is full of noise and volatility. Even with sufficient domain knowledge of trading, analyzing the vast stock market manually consumes considerable resources, so trading has traditionally relied on machines to analyze information while humans make the final decisions. With the innovation of algorithms and the rapid growth of computing power, solutions previously judged infeasible for performance reasons are being re-examined on a large scale, and algorithmic trading with deep reinforcement learning is one of the popular topics. This thesis first introduces the prerequisites of reinforcement learning, the Markov Decision Process (MDP) and Q-Learning, then builds a trading environment based on real-world rules and applies Deep Q-Learning (DQN) to optimize trading decisions. The environment uses reinforcement learning to simulate trading on 10 constituent stocks of the NASDAQ exchange. The thesis further optimizes DQN with the six methods combined in the state-of-the-art Rainbow technique. In addition, a compatible optimization method, Sibling Search, is designed for the defined environment, and Rainbow is modified with it to form Sibling Rainbow; experiments show that Sibling Search improves both performance and convergence speed, and the experimental results are discussed. The thesis makes the following contributions: 1. A Markov Decision Process is designed for the trading task, which future research can extend. 2. Q-Learning is successfully applied to learn automated trading behavior, reaching an average prediction accuracy of 54%. 3. Sibling Search, a method that increases data utilization, is proposed and the effect of applying it is compared. 4. The optimization methods in Rainbow are extended with Sibling Search, the first study to apply an augmented Rainbow to stock trading, on which future research can build. Experiments show that, in the environment defined in this thesis, even when trading under a transaction fee as high as 0.25%, the annual return of DQN optimized with Sibling Search rises to 11%. With the six optimization methods of Rainbow, the one-year (250 trading days) return on the test data reaches at least 43%, and with Sibling Rainbow it reaches 306%. This demonstrates that reinforcement learning can perform very well in stock-trading decisions and that optimization techniques such as Rainbow provide additional gains.
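
    For reference, the tabular Q-Learning update that the abstract builds on (and that DQN approximates with a neural network) is the standard rule below, where \alpha is the learning rate, \gamma the discount factor, and r_{t+1} the reward obtained after taking action a_t in state s_t; this is the textbook formulation, not a result specific to this thesis:

    Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left[ r_{t+1} + \gamma \max_{a'} Q(s_{t+1}, a') - Q(s_t, a_t) \right]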

    Algorithmic trading in the stock market has been attracting significant commercial interest in the financial industry, but it is recognized as a challenging stochastic control problem. This thesis establishes a trading environment based on the real trading rules of the stock market and uses two reinforcement learning techniques, the Markov Decision Process (MDP) and Deep Q-Learning (DQN), enhanced by the six state-of-the-art optimization methods in Rainbow. In addition, a compatible optimization method called Sibling Search is designed for the defined environment, and several experiments show that Sibling Search improves performance significantly. Combining Rainbow with Sibling Search yields a novel method called Sibling Rainbow. Using these optimization methods together with Sibling Search, the agent learns and simulates transactions on 10 NASDAQ stocks in the experiments. The experiments show that, even with a transaction fee as high as 0.25%, the return on the testing data increases to 11% after applying Sibling Search. With the original six methods in Rainbow, the one-year return reaches up to 43%, and with Sibling Rainbow it reaches up to 300%. This confirms that reinforcement learning can be used for algorithmic trading in the stock market and that additional performance can be gained through the optimizations in Rainbow and Sibling Rainbow. This study makes the following contributions: 1. It designs a Markov Decision Process (MDP) for stock trading tasks that can easily be extended in future research. 2. It successfully applies Q-Learning to learn trading behavior, with an average hit rate of 54% in automated trading. 3. It proposes Sibling Search, a method that increases data usage and is shown to enhance the performance of Q-Learning. 4. It extends the optimization methods in Rainbow with Sibling Search, the first attempt to modify Rainbow for stock trading, which can serve as a baseline for future studies.
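
    To illustrate the kind of environment the abstract describes, the sketch below sets up a minimal single-stock trading MDP with the 0.25% transaction fee quoted above. It is only a sketch under assumed simplifications: the class name TradingEnvSketch, the hold/buy/sell action encoding, the price-window state, and the change-in-portfolio-value reward are illustrative assumptions, not the state, action, and reward definitions of the thesis (those are given in Chapter 4).

    import numpy as np

    class TradingEnvSketch:
        """Minimal single-stock trading MDP (illustrative sketch, not the thesis's environment)."""
        FEE = 0.0025                   # 0.25% transaction fee, the rate quoted in the abstract
        HOLD, BUY, SELL = 0, 1, 2      # assumed discrete action encoding

        def __init__(self, prices, window=10, cash=10_000.0):
            self.prices = np.asarray(prices, dtype=float)
            self.window = window
            self.init_cash = cash

        def reset(self):
            self.t = self.window       # start once a full price window is available
            self.cash = self.init_cash
            self.shares = 0.0
            return self._state()

        def _state(self):
            # Observation: the last `window` closing prices (an assumed, simplified state)
            return self.prices[self.t - self.window:self.t]

        def step(self, action):
            price = self.prices[self.t]
            before = self.cash + self.shares * price          # portfolio value before acting
            if action == self.BUY and self.cash > 0:
                self.shares = self.cash * (1 - self.FEE) / price
                self.cash = 0.0
            elif action == self.SELL and self.shares > 0:
                self.cash = self.shares * price * (1 - self.FEE)
                self.shares = 0.0
            self.t += 1
            done = self.t >= len(self.prices)
            mark = self.prices[min(self.t, len(self.prices) - 1)]
            reward = self.cash + self.shares * mark - before  # reward: change in portfolio value
            return self._state(), reward, done

    A DQN agent would replace the random placeholder policy in the short run below; 250 steps roughly matches the one-year (250 trading days) evaluation horizon mentioned in the abstract.

    rng = np.random.default_rng(0)
    prices = 100 + np.cumsum(rng.normal(size=250))               # synthetic price series
    env = TradingEnvSketch(prices)
    state, done = env.reset(), False
    while not done:
        state, reward, done = env.step(int(rng.integers(0, 3)))  # random action as a stand-in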

    Table of Contents:
    Abstract (Chinese)
    Extended Abstract
      SUMMARY
      INTRODUCTION
      MATERIALS AND METHODS
      RESULTS AND DISCUSSION
      CONCLUSION
    Acknowledgments
    Table of Contents
    List of Tables
    List of Figures
    1. Introduction
      1.1 Research Background
      1.2 Research Objectives
      1.3 Research Methods
        1.3.1 Reinforcement Learning
        1.3.2 Markov Decision Process
        1.3.3 Reinforcement Learning
        1.3.4 Q-Learning
      1.4 Research Contributions
      1.5 Thesis Structure
    2. Related Work
    3. Theoretical Background
      3.1 Markov Decision Process
      3.2 Q-Learning
      3.3 Rainbow
        3.3.1 Double Q-Learning
        3.3.2 Prioritized Replay
        3.3.3 Dueling Networks
        3.3.4 Multi-step Learning
        3.3.5 Distributional RL
        3.3.6 Noisy DQN
    4. Experimental Method
      4.1 Building the Trading MDP
        4.1.1 Basic Assumptions
        4.1.2 Data Preparation
        4.1.3 Defining the State Space
        4.1.4 Defining the Action Space
        4.1.5 Defining the Reward Function
      4.2 Sibling Search
        4.2.1 Sibling Search
        4.2.2 Sibling Rainbow
      4.3 Algorithm
    5. Experimental Design and Results
      5.1 Evaluation Methods and Results
        5.1.1 Evaluation Methods
        5.1.2 Sibling Search
        5.1.3 Sibling Rainbow
      5.2 Discussion of Results
    6. Conclusion and Future Work
      6.1 Conclusion
      6.2 Future Work and Suggestions
    References
    Appendix 1: Comparison of Each Model
    Appendix 2: Average Comparison of Each Model


    Full-text availability: on campus: available immediately; off campus: available immediately.