| Author: | 張崇一 Chong, Chong-Yi |
|---|---|
| Title: | 基於理由驅動的股票走勢預測:多模型整合與堆疊泛化方法 Rationale-Driven Predictions for Stock Movements: A Multi-Model Integration and Stack Generalization Approach |
| Advisor: | 高宏宇 Kao, Hung-Yu |
| Degree: | Master |
| Department: | 電機資訊學院 - 資訊工程學系 Department of Computer Science and Information Engineering |
| Year of Publication: | 2024 |
| Academic Year of Graduation: | 112 |
| Language: | English |
| Number of Pages: | 54 |
| Chinese Keywords: | 自然語言處理、股價漲跌預測、大語言模型 |
| Foreign Keywords: | Natural Language Processing, Stock Movement Prediction, Large Language Model |
With advances in computational power, large language models (LLMs) have developed rapidly across many fields. However, predicting stock price movements remains a significant challenge, mainly for the following two reasons:
1. Lack of interpretability: Traditional methods usually offer little interpretability, making it difficult to understand why a model predicts that a stock price will rise or fall. In a market as volatile as finance, where decisions typically rest on professional judgment, the quality of the rationale a model provides can matter even more than the accuracy of its predictions.
2. Model prediction bias: When generating rationales, models tend to favor a particular label, so the rationales they ultimately output carry little reference value.
To address these challenges, we propose a new framework. We independently trained two models, TechGPT and SentiGPT, to analyze stock price data and text from community platforms, respectively. By combining the outputs of TechGPT and SentiGPT, we developed a comprehensive model named IntegraGPT. When collecting the training data, we used in-context learning to have multiple large language models generate rationales rather than relying on a single LLM, and we trained our models on the resulting dataset. This approach addresses both the lack of interpretability and model prediction bias.
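To make the pipeline described in the abstract concrete, the following is a minimal Python sketch of its two ideas: collecting training rationales from several LLMs via in-context learning, and feeding the outputs of the two base models (TechGPT on price data, SentiGPT on community text) to the meta-model IntegraGPT in the spirit of stacked generalization. Every function name, prompt, and label set below is a hypothetical illustration, not the thesis's actual implementation.

```python
# Minimal sketch (not the thesis implementation) of the two ideas in the abstract:
#  (1) collect training rationales from several LLMs via in-context learning,
#  (2) stack the outputs of the two base models into the meta-model IntegraGPT.
from typing import Callable, Dict

# Abstract an "LLM" as a callable from prompt to generated text so the sketch
# runs without any external service; a real pipeline would wrap model APIs here.
LLM = Callable[[str], str]

# Hypothetical few-shot example used for in-context learning.
FEW_SHOT = (
    "Example:\n"
    "Prices: [10.1, 10.4, 10.2] | Posts: \"earnings beat expectations\"\n"
    "Rationale: upward momentum plus positive sentiment. Label: RISE\n"
)

def collect_rationales(llms: Dict[str, LLM], prices: str, posts: str) -> Dict[str, str]:
    """Ask several LLMs, not just one, for a rationale; the abstract uses this
    to reduce the label bias a single model's rationales can carry."""
    prompt = (
        f"{FEW_SHOT}\n"
        f"Prices: {prices} | Posts: {posts}\n"
        "Explain the likely next-day movement, then give Label: RISE or FALL."
    )
    return {name: llm(prompt) for name, llm in llms.items()}

def integra_input(tech_rationale: str, senti_rationale: str) -> str:
    """Stacked-generalization step: the outputs of TechGPT (price data) and
    SentiGPT (community text) become the input of the meta-model IntegraGPT."""
    return (
        "Technical-analysis rationale:\n" + tech_rationale + "\n\n"
        "Sentiment-analysis rationale:\n" + senti_rationale + "\n\n"
        "Combine both views and give a final movement label with a rationale."
    )

if __name__ == "__main__":
    # Stand-in LLMs so the example is self-contained and runnable.
    llms = {
        "llm_a": lambda _: "Rationale: short-term uptrend. Label: RISE",
        "llm_b": lambda _: "Rationale: sentiment looks mixed. Label: FALL",
    }
    print(collect_rationales(llms, "[10.1, 10.4, 10.2]", '"earnings beat expectations"'))
    print(integra_input("prices trend up over the last week", "forum posts are optimistic"))
```

Treating each LLM as a plain callable here is only to keep the sketch self-contained; in the thesis, TechGPT, SentiGPT, and IntegraGPT are fine-tuned language models, and the prompts and label set above are assumptions made for illustration.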