| Graduate student: | 詹韻玄 Chan, Yun-Hsuan |
|---|---|
| Thesis title: | 整合情緒分析在國際之間金融指標的預測 Financial Indices Prediction through Integrating Sentiment Analysis with Factors of International Stock Markets |
| Advisor: | 鄭順林 Jeng, Shuen-Lin |
| Degree: | Master |
| Department: | College of Management - Department of Statistics |
| Year of publication: | 2019 |
| Academic year of graduation: | 107 (ROC calendar) |
| Language: | English |
| Number of pages: | 94 |
| Keywords (Chinese): | 國際股票市場 (international stock markets), 技術分析 (technical analysis), 情緒分析 (sentiment analysis), 股票預測 (stock prediction) |
| Keywords (English): | International Stock Markets, Technical Analysis, Sentiment Analysis, Prediction, Classification |
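Stock prediction has long been a major challenge in financial time series analysis. In recent years, the influence of social media on the stock market has drawn considerable attention, and extracting effective news features through sentiment analysis has become one direction for future research. Besides news features, stock prices are also affected by other indicators, such as market features, technical features, and economic features. This study therefore considers market, technical, and news features together when modeling and predicting both integrated indices and individual stocks. Our goal is to try a range of methods and construct a key news feature extraction method across multiple news sources. The experiments fall into two parts: integrated indices modeling and a data mining search over individual stocks. For the integrated indices modeling, we include the integrated indices of the United States, China, Hong Kong, and Taiwan simultaneously and build four statistical models: Vector Autoregression (VAR), Vector Error Correction Model (VECM), Least Absolute Shrinkage and Selection Operator (LASSO), and Multivariate Adaptive Regression Splines (MARS). After adding news features, the MARS model predicts the rise or fall of the integrated indices with close to 90% accuracy, an improvement of more than 10%. For the data mining search, we apply the best method from the integrated indices modeling to find individual stocks that are easier to predict, requiring a prediction accuracy above 80% and an accuracy gain of more than 9% after adding news features. US stocks perform best, with accuracy reaching as high as 88% for a single stock. Finally, we verify that incorporating news features gives better predictive power in stock prediction.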
Stock market prediction is one of the major challenges in financial time series analysis. Recently, the influence of social networks on the stock market has attracted attention, and extracting effective news features through sentiment analysis has become an appealing direction for future research. Apart from news features, stock prices are affected by other features, such as basic market features, technical features, and economic features. Therefore, we use basic, technical, and news features together to predict the direction of integrated indices and individual stocks. We aim to examine various methods in order to construct a critical news feature extraction method from multiple news sources. The study comprises two experiments: integrated indices modeling and a data mining search on individual stocks. In the integrated indices modeling, the relationships between the stock markets of the United States, China, Hong Kong, and Taiwan are considered in the model simultaneously. Four statistical models are fitted to all the features to predict market trends: Vector Autoregression (VAR), Vector Error Correction Model (VECM), Least Absolute Shrinkage and Selection Operator (LASSO), and Multivariate Adaptive Regression Splines (MARS). With MARS, the highest predicted direction accuracy on the integrated indices reaches nearly 90%, an improvement of more than 10% from adding news features. Furthermore, among 200 individual stocks in multiple regions, we summarize those whose predicted direction accuracy exceeds 80% and improves significantly when news features are added. Individual stocks in the US are predicted with the highest direction accuracy, with the best one reaching nearly 90%. Finally, we suggest that integrating news can help improve stock prediction.
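The abstract reports results as predicted direction accuracy, compared before and after news features are added. A minimal sketch of how such a metric could be computed is given below, assuming that accuracy means the share of periods in which the predicted and actual changes have the same sign. The function names and all numbers are hypothetical illustrations, not the thesis's code or data.

```python
from typing import Sequence


def direction(change: float) -> int:
    """Return 1 for an up move, -1 for a down move, and 0 for no change."""
    return (change > 0) - (change < 0)


def direction_accuracy(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Share of periods where the predicted and actual changes share a sign."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum(direction(a) == direction(p) for a, p in zip(actual, predicted))
    return hits / len(actual)


# Made-up daily index changes (illustration only, not thesis data).
actual_changes = [0.8, -0.3, 0.5, -1.1, 0.2, 0.4, -0.6, 0.9, -0.2, 0.3]
# Hypothetical forecasts from a model without news features.
baseline_pred = [0.1, 0.2, 0.3, -0.4, -0.1, 0.2, -0.3, -0.5, -0.1, 0.2]
# Hypothetical forecasts from the same model with news features added.
with_news_pred = [0.4, -0.1, 0.2, -0.7, 0.1, 0.3, -0.2, 0.6, 0.1, 0.1]

baseline_acc = direction_accuracy(actual_changes, baseline_pred)
with_news_acc = direction_accuracy(actual_changes, with_news_pred)
# Gain expressed in percentage points, rounded to avoid float noise.
gain_points = round((with_news_acc - baseline_acc) * 100, 6)

print(f"baseline: {baseline_acc:.0%}, with news: {with_news_acc:.0%}, "
      f"gain: {gain_points:g} points")
```

The sketch expresses the gain in percentage points. The abstract does not say whether its reported improvement is absolute or relative, so that reading is an assumption.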
On-campus access: publicly available from 2024-08-01