
Author: Wang, Po-Hong (王博弘)
Title: An Intelligent Stock Pricing Prediction System by Using Deep Reinforcement Learning (運用深度強化式學習建置智慧型股票預測系統)
Advisor: Chen, Mu-Yen (陳牧言)
Degree: Master
Department: College of Engineering - Department of Engineering Science
Year of Publication: 2022
Academic Year of Graduation: 110 (2021-2022)
Language: Chinese
Pages: 47
Chinese Keywords: 深度學習、強化學習、卷積神經網路、股價預測、智慧型系統
English Keywords: Deep Learning, Reinforcement Learning, Convolutional Neural Network, Stock Prediction, Intelligent System
Financial market prediction has long been a topic of widespread interest, and many recent studies have applied deep learning to stock market prediction, each achieving different results. Inspired by this, this study combines deep reinforcement learning with a convolutional neural network (CNN) to build an automated system that simulates stock trading. A deep reinforcement learning agent built from a CNN predicts the timing of stock purchases and sales, and deep reinforcement learning algorithms are used to train the agent. This study applies and compares several such algorithms for training the trading model: Deep Q-learning (DQN), D3QN (combining Double and Dueling Q-learning), Noisy Net DQN, Multi-step DQN, Deep Deterministic Policy Gradient (DDPG), and Twin Delayed Deep Deterministic Policy Gradient (TD3).
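As a rough sketch, the Q-learning update rule that DQN approximates with a CNN can be written in tabular form. The states, actions, and rewards below are hypothetical stand-ins for illustration only, not the thesis's actual features:

```python
# Minimal tabular Q-learning sketch of the update that DQN approximates
# with a neural network; states/rewards here are toy placeholders.
import random

ACTIONS = ["buy", "sell", "hold"]  # discrete trading actions
alpha, gamma = 0.1, 0.99           # learning rate, discount factor

Q = {}  # Q-table: (state, action) -> estimated value

def q(s, a):
    return Q.get((s, a), 0.0)

def update(s, a, r, s_next):
    """One step: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    target = r + gamma * max(q(s_next, b) for b in ACTIONS)
    Q[(s, a)] = q(s, a) + alpha * (target - q(s, a))

# Toy episodes: reward +1 for buying in an "up" state or selling in a
# "down" state, -1 otherwise.
random.seed(0)
for _ in range(500):
    s = random.choice(["up", "down"])
    a = random.choice(ACTIONS)
    r = 1.0 if (s == "up" and a == "buy") or (s == "down" and a == "sell") else -1.0
    update(s, a, r, random.choice(["up", "down"]))

print(round(Q[("up", "buy")], 2), round(Q[("up", "sell")], 2))
```

In DQN the table is replaced by a network (here a CNN over indicator features), and the same target drives gradient updates on mini-batches drawn from a replay buffer.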
This study designed experiments on two different datasets to test the agent's performance, and also incorporated methods such as principal component analysis (PCA) to improve it. Before the reinforcement learning simulations, the model's feature-extraction and prediction capability was first tested: the agent's neural network was trained with supervised learning to predict whether stock prices would rise or fall, verifying the feasibility of the agent network designed in this study. Simulations were then run on a SPDR S&P 500 ETF (SPY) dataset and on a dataset of 12 individual stocks drawn from the S&P 500. The agent trained with the DQN algorithm achieved a return of up to 240.5% on the SPY dataset, while Multi-step DQN achieved an average return of 13.03% on the multi-stock dataset. The results also show that incorporating PCA makes the model's performance more stable and yields better results on the multi-stock dataset, where individual price trends are more diverse.
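The PCA preprocessing mentioned above can be sketched with a singular value decomposition; the synthetic feature matrix below stands in for the thesis's indicator features and is not the actual dataset:

```python
# Hedged sketch of PCA feature reduction via SVD; synthetic data only.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(252, 10))   # e.g. one trading year of 10 daily indicators
X[:, 1] = X[:, 0] * 2.0          # make one feature redundant on purpose

Xc = X - X.mean(axis=0)          # center each feature
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)

k = 5
X_reduced = Xc @ Vt[:k].T        # project onto the top-k principal components
explained = (S[:k] ** 2).sum() / (S ** 2).sum()  # fraction of variance kept
print(X_reduced.shape)
```

Because the redundant feature collapses into a single component, the top-k projection retains most of the variance in fewer dimensions, which is the stabilizing effect the study attributes to PCA.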
This study demonstrates the potential of deep reinforcement learning for stock prediction and proposes a feasible reinforcement learning environment. Future work can design models that fit real-world trading practice more closely, serving as a reference for the further development of artificial intelligence in financial markets.

This research proposes a deep learning model that employs a convolutional neural network (CNN) and deep reinforcement learning as a trading system. A CNN architecture with a specifically ordered feature set is used to predict the stock trading strategy; the feature set is extracted from various technical indicators, price data, and temporal information. The research then explores the potential of deep reinforcement learning to optimize the trading strategy and thus maximize investment return, training a deep reinforcement learning agent to obtain an adaptive trading strategy. Agents based on Deep Q-learning, Deep Deterministic Policy Gradient (DDPG), Twin Delayed Deep Deterministic (TD3) policy gradient, and related derived models are also compared.
The agents' performance is evaluated and compared on two different datasets, including the SPDR S&P 500 ETF (SPY) and several individual stocks in 2020, and the proposed deep reinforcement learning approach is shown to outperform in cumulative returns.
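The cumulative-return metric used for this comparison can be computed from a price series and a daily position signal; the prices and signal below are illustrative stand-ins, not the evaluation data:

```python
# Simple cumulative-return calculation of the kind used to compare agents.
prices = [100.0, 102.0, 101.0, 105.0, 110.0]
positions = [1, 1, 0, 1, 1]  # 1 = holding the asset that day, 0 = in cash

equity = 1.0
for t in range(1, len(prices)):
    daily_return = prices[t] / prices[t - 1] - 1.0
    if positions[t - 1]:          # the agent only earns the return while holding
        equity *= 1.0 + daily_return

cumulative_return = (equity - 1.0) * 100
print(f"cumulative return: {cumulative_return:.2f}%")
```

Here the flat day (position 0) skips the small loss, so the strategy's cumulative return exceeds a plain buy-and-hold of the same series over the held days.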
This research demonstrates the potential of deep reinforcement learning in financial market prediction and proposes a model for building a trading system that can be improved upon in the future.

Abstract (Chinese) I
Acknowledgements V
Contents VI
List of Tables VIII
List of Figures IX
Chapter 1 Introduction 1
  1.1 Research Background 1
  1.2 Research Motivation and Objectives 2
  1.3 Research Structure 2
Chapter 2 Literature Review 3
  2.1 Convolutional Neural Networks 3
    2.1.1 Convolutional Layer 4
    2.1.2 Pooling Layer 4
    2.1.3 Fully Connected Layer 5
  2.2 Reinforcement Learning 6
    2.2.1 Mathematical Formulation of Reinforcement Learning 6
    2.2.2 Q-learning and Deep Q-learning Algorithms 8
    2.2.3 Double DQN Algorithm 11
    2.2.4 Dueling DQN Algorithm 13
    2.2.5 Noisy Net DQN Algorithm 13
    2.2.6 Deep Deterministic Policy Gradient Algorithm 14
    2.2.7 Twin Delayed Deep Deterministic Policy Gradient Algorithm 15
Chapter 3 Research Methods 18
  3.1 Data Sampling and Processing 18
  3.2 Reinforcement Learning Agent Model 20
  3.3 Reinforcement Learning Environment 20
  3.4 Deep Reinforcement Learning Algorithm Architecture 21
    3.4.1 Training Procedure for DQN-type Algorithms 22
    3.4.2 D3QN Algorithm Improvements 23
    3.4.3 Noisy Net DQN Algorithm Improvements 24
    3.4.4 Multi-step DQN Algorithm Improvements 25
    3.4.5 DDPG Algorithm Implementation 26
    3.4.6 TD3 Algorithm Improvements 27
Chapter 4 Experimental Results 28
  4.1 Experimental Environment 28
  4.2 Experimental Data 28
  4.3 Supervised Learning Test of the Agent Model 29
    4.3.1 PCA Dimensionality Reduction Results 30
    4.3.2 Supervised Learning Experiment Results 32
  4.4 Deep Reinforcement Learning Experiment Results 35
    4.4.1 Model Evaluation Methods 35
    4.4.2 SPY Dataset Experiment Results 36
    4.4.3 Multi-stock Dataset Experiment Results 38
Chapter 5 Conclusions and Future Work 41
  5.1 Conclusions 41
  5.2 Future Work 42
References 43
Appendix 1 46

