
Graduate Student: 藍振恩 (Lan, Chen-En)
Thesis Title: TransLSTM-AR:結合 Transformer 與 LSTM 的自回歸混合模型應用於股價預測
TransLSTM-AR: A Hybrid Transformer-LSTM Autoregressive Model for Stock Price Prediction
Advisor: 陳牧言 (Chen, Mu-Yen)
Degree: Master
Department: College of Engineering - Department of Engineering Science
Year of Publication: 2025
Graduation Academic Year: 113
Language: Chinese
Number of Pages: 74
Keywords (Chinese): 股票預測、時間序列預測、長短期記憶、深度學習
Keywords (English): Stock Prediction, Time Series Forecasting, Long Short-Term Memory, Deep Learning
    Stock price prediction is a highly challenging task, chiefly because financial markets are inherently volatile, nonlinear, and susceptible to many external factors such as macroeconomic data, changes in corporate fundamentals, political events, and market sentiment. Moreover, stock price series tend to be highly nonlinear and non-stationary, with substantial volatility and noise, which makes it difficult for traditional statistical models (e.g., ARIMA, GARCH) to capture the latent nonlinear dynamics and long-term trends. This study proposes a novel hybrid model, TransLSTM-AR, which combines a Transformer encoder with an LSTM decoder to fuse long- and short-term temporal dependency modeling and thereby improve the accuracy and stability of stock price prediction. Architecturally, the Transformer encoder captures global time-series patterns through the self-attention mechanism, while the LSTM decoder handles short-term sequence features and the autoregressive prediction task.
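    As an editorial illustration of the self-attention step the encoder relies on (this is a toy sketch, not the thesis code; the sequence values and identity query/key/value projections are invented for brevity):

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def self_attention(seq, d_k):
    # Scaled dot-product self-attention with identity projections:
    # every position attends to every position, weighted by similarity.
    out = []
    for q in seq:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
                  for k in seq]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, seq))
                    for j in range(d_k)])
    return out

# A toy 3-step, 2-dimensional "feature" sequence.
seq = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
attended = self_attention(seq, d_k=2)
print(attended)  # each row is a convex mix of all time steps
```

Because the attention weights at each position sum to one, every output row is a weighted average over the whole sequence, which is what gives the encoder its global view of the window.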
    The study conducts an empirical analysis on historical stock price data of TSMC and Apple Inc. (AAPL), comparing TransLSTM-AR against nine other models on MAE, RMSE, MAPE, and R². Across all three experiments, TransLSTM-AR achieved the smallest errors on the TSMC dataset, with an MAE of 5.752, and also performed best in the first two experiments on AAPL; in the five-day forecast, its MAE of 3.222 and MAPE of 1.82% remained the lowest among all models. Backtesting likewise indicates strong practical potential: on TSMC, the model achieved the highest return of all models when trading solely on its predictions. Overall, every model's return improved after incorporating a genetic algorithm (GA); on the AAPL data, TransLSTM-AR attained a 43.38% return, the second highest among all backtesting methods.
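    The four evaluation metrics are the standard ones; for reference, a minimal plain-Python rendering (the prices below are illustrative, not the thesis data):

```python
import math

def mae(y, p):   # mean absolute error
    return sum(abs(a - b) for a, b in zip(y, p)) / len(y)

def rmse(y, p):  # root mean squared error
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, p)) / len(y))

def mape(y, p):  # mean absolute percentage error, in percent
    return 100 * sum(abs((a - b) / a) for a, b in zip(y, p)) / len(y)

def r2(y, p):    # coefficient of determination
    mean_y = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, p))
    ss_tot = sum((a - mean_y) ** 2 for a in y)
    return 1 - ss_res / ss_tot

actual    = [100.0, 102.0, 101.0, 105.0]   # hypothetical closing prices
predicted = [ 99.0, 103.0, 102.0, 104.0]
print(mae(actual, predicted))   # -> 1.0
```

Note that MAE, RMSE, and MAPE are errors (lower is better) while R² is a goodness-of-fit score (higher is better), so "best on all four" means lowest on the first three and highest on the last.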
    In summary, TransLSTM-AR effectively fuses the Transformer's capacity for extracting long-range dependencies with the LSTM's sequential memory, overcoming the limitations of traditional models while demonstrating advantages in both prediction accuracy and backtested investment performance, thus offering a concrete and reliable technical solution for financial forecasting and intelligent investment strategies.
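    The five-day forecasts mentioned above are produced autoregressively: each predicted close is appended to the input window and fed back for the next step. A schematic loop, with a naive moving-average stand-in for the trained network (the stand-in and the prices are hypothetical):

```python
def forecast_multi_step(model, window, steps=5):
    """Roll a one-step-ahead model forward `steps` days by
    feeding each prediction back into the input window."""
    window = list(window)
    preds = []
    for _ in range(steps):
        next_close = model(window)          # one-step-ahead prediction
        preds.append(next_close)
        window = window[1:] + [next_close]  # slide the window forward
    return preds

# Stand-in "model": a naive moving-average predictor.
naive_model = lambda w: sum(w) / len(w)

history = [100.0, 101.0, 102.0, 103.0, 104.0]  # hypothetical closes
print(forecast_multi_step(naive_model, history))
```

This feedback loop is also why multi-step forecasting is harder than single-step: errors made early in the horizon are re-ingested as inputs for later steps.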

    Stock price prediction is a long-standing yet highly challenging task due to the inherent complexity of financial markets, which are influenced by a variety of uncertain factors such as macroeconomic indicators, corporate fundamentals, geopolitical events, and investor sentiment. These characteristics lead to nonlinearity, non-stationarity, high volatility, and significant noise in stock price time series, making accurate forecasting a difficult endeavor. Traditional statistical models such as ARIMA and GARCH often struggle to capture these complex patterns and long-term dependencies, especially in multi-step forecasting tasks. Recent advances in deep learning have opened new possibilities for modeling these dynamics, yet individual architectures like LSTM or Transformer still have their respective limitations.
    To address these challenges, this study proposes TransLSTM-AR, a novel hybrid model that combines a Transformer encoder with an LSTM decoder to leverage both global attention-based sequence modeling and localized temporal memory. The model was evaluated using historical stock data from TSMC and Apple Inc., and benchmarked against nine baseline models including conventional LSTM, GRU, Seq2Seq LSTM, and vanilla Transformer. The experimental results show that TransLSTM-AR consistently achieves the best scores on MAE, RMSE, MAPE, and R², particularly on the TSMC dataset. In five-day forecasts, the model also maintains superior accuracy. Backtesting reveals that TransLSTM-AR produces the highest investment return under model-only strategies on TSMC, and performs robustly on AAPL with effective risk control. Additionally, incorporating a genetic algorithm (GA) for parameter optimization further improves investment outcomes across all models, with Transformer-based models benefiting most. Overall, TransLSTM-AR provides a powerful and empirically validated solution for financial time series forecasting and intelligent trading.
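    The RSI signal underlying the backtests is Wilder's relative strength index. As a generic sketch (the period and the 30/70 thresholds that the thesis tunes with the GA are fixed here at the conventional defaults purely for illustration):

```python
def rsi(closes, period=14):
    """Wilder's RSI via smoothed average gains and losses."""
    gains, losses = [], []
    for prev, cur in zip(closes, closes[1:]):
        change = cur - prev
        gains.append(max(change, 0.0))
        losses.append(max(-change, 0.0))
    avg_gain = sum(gains[:period]) / period
    avg_loss = sum(losses[:period]) / period
    for g, l in zip(gains[period:], losses[period:]):
        avg_gain = (avg_gain * (period - 1) + g) / period
        avg_loss = (avg_loss * (period - 1) + l) / period
    if avg_loss == 0:
        return 100.0
    rs = avg_gain / avg_loss
    return 100.0 - 100.0 / (1.0 + rs)

def signal(rsi_value, low=30.0, high=70.0):
    # Classic interpretation: oversold -> buy, overbought -> sell.
    if rsi_value < low:
        return "buy"
    if rsi_value > high:
        return "sell"
    return "hold"
```

A GA can then search over `(period, low, high)` for the combination that maximizes backtested return, which is the kind of parameter optimization the abstract credits with improving every model's results.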

    Abstract (Chinese)
    Abstract (English)
    Acknowledgments
    Table of Contents
    List of Tables
    List of Figures
    Chapter 1 Introduction
      1.1 Research Background and Motivation
      1.2 Research Objectives
      1.3 Chapter Overview
    Chapter 2 Literature Review
      2.1 Genetic Algorithms
        2.1.1 Basic Principles and Evolutionary Process of GA
        2.1.2 Applications of GA in Finance and Stock Price Prediction
      2.2 Statistical and Deep Learning Models
        2.2.1 Autoregressive Integrated Moving Average (ARIMA)
        2.2.2 Long Short-Term Memory (LSTM)
        2.2.3 Bidirectional LSTM
        2.2.4 Gated Recurrent Unit (GRU)
        2.2.5 Transformer
        2.2.6 Activation Functions
        2.2.7 Loss Functions
        2.2.8 Dropout
    Chapter 3 Methodology
      3.1 Research Framework
      3.2 TransLSTM-AR Model Architecture
      3.3 Data Processing
        3.3.1 Data Preprocessing
        3.3.2 Normalization
        3.3.3 Sliding Window
      3.4 Model Construction
        3.4.1 Experiment 1: Different Encoder-Decoder Architectures
        3.4.2 Experiment 2: Different RNN Types as the Decoder
        3.4.3 Experiment 3: From Single-Step to Five-Step Prediction
      3.5 Backtesting Methods
        3.5.1 Buy & Hold
        3.5.2 RSI Trading Signals (Common Technical Analysis Strategy)
        3.5.3 GA-Optimized RSI Trading Signals
        3.5.4 Model-Predicted Closing Prices
        3.5.5 GA-Optimized RSI Combined with Model-Predicted Closing Prices
    Chapter 4 Experimental Design and Result Analysis
      4.1 Experimental Environment and Parameter Settings
      4.2 Datasets
      4.3 Experimental Parameters
        4.3.1 Deep Learning Model Parameters
        4.3.2 Statistical Model Parameters
      4.4 Performance Metrics
      4.5 Results and Discussion
        4.5.1 Experiment 1 Results
        4.5.2 Experiment 2 Results
        4.5.3 Experiment 3 Results
        4.5.4 Backtesting Results
        4.5.5 Robustness Experiments
        4.5.6 Discussion
    Chapter 5 Conclusions and Future Work
      5.1 Conclusions
      5.2 Future Work
    References
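    The sliding-window preparation listed under §3.3.3 turns a price series into supervised (input, target) pairs; a minimal sketch, with a window length chosen arbitrarily here:

```python
def sliding_windows(series, window=3):
    """Split a series into (input window, next value) training pairs."""
    pairs = []
    for i in range(len(series) - window):
        pairs.append((series[i:i + window], series[i + window]))
    return pairs

closes = [10.0, 11.0, 12.0, 13.0, 14.0]  # hypothetical closes
print(sliding_windows(closes))
# -> [([10.0, 11.0, 12.0], 13.0), ([11.0, 12.0, 13.0], 14.0)]
```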


    On-campus access: available from 2026-07-24
    Off-campus access: available from 2026-07-24
    The electronic thesis has not yet been authorized for public release; for the print copy, consult the library catalog.