| Graduate student: | 劉如祥 Liou, Ru-Siang |
|---|---|
| Thesis title: | 應用新聞與市場資料預測股價之波動 Using Financial News and Market Data to Predict Stock Volatility |
| Advisor: | 王惠嘉 Wang, Hei-Chia |
| Degree: | 碩士 Master |
| Department: | 管理學院 - 資訊管理研究所 College of Management - Institute of Information Management |
| Year of publication: | 2017 |
| Academic year of graduation: | 105 |
| Language: | Chinese |
| Number of pages: | 64 |
| Chinese keywords: | 股價波動預測, 中文特徵選取, 市場資料, 多核心學習演算法 |
| English keywords: | Stock Price Prediction, Chinese Feature Selection, Market Data, Multiple-kernel Learning Algorithm |
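With the rapid development of the Internet, financial information websites have gradually become more important than traditional media. Many investors use them to follow the stock market and read financial news. Online information makes it easier for investors to understand market conditions, but the excess of content causes information overload, which makes it hard for investors to grasp the market quickly and comprehensively. Text mining techniques should be able to address this problem.
When news is analyzed with text mining, the first step is word segmentation. This study segments the Chinese news with two different word segmentation systems. After segmentation, to select the words related to stock price volatility, each news item must first be labeled with an opinion score according to market movements. Improper labeling degrades the quality of feature selection, so that subsequent learning cannot reach the desired prediction results. This study designs different news labeling methods in order to identify the words most related to stock price volatility.
In addition, most previous studies that use text mining to predict stock price volatility focus only on the analysis of financial news and ignore other factors that affect stock prices, so their predictions cannot respond sensitively to market prices. Market trading data, by contrast, responds sensitively to price changes but is prone to prediction error because of noise. This study therefore aims to integrate financial information from different aspects (news, technical indicators, and institutional investor trading information), using market trading data to improve the predictions of traditional text analysis. Furthermore, given the difficulty of integrating multi-aspect financial information, this study adopts a multiple-kernel learning algorithm, hoping to learn the important information implicit in each data source during integrated learning and to obtain predictions close to real stock price volatility. In the experiments, the Naïve Combination of news features and technical features achieved the best RMSE.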
With the rapid development of the Internet, financial websites have grown markedly in importance. Online information makes it easier for investors to follow the market, but the sheer volume of content makes it hard for them to understand the market quickly and comprehensively. Text mining approaches that analyze news could address this problem.
To preprocess news for text mining, this thesis uses two different Chinese word segmentation systems. Next, the proposed method identifies the opinion polarity of each news item, because improper identification degrades the quality of feature selection and, in turn, lowers prediction accuracy. This thesis designs three news opinion labeling methods, aiming to find the words most related to stock volatility.
The interaction between stock prices and financial news has been widely analyzed by researchers. However, most previous work focuses only on financial news and ignores how past market data can affect future stock returns. This thesis adds chip factors (institutional trading information) and technical factors, both derived from past market data. Although market data is sensitive to stock volatility, noise can easily introduce prediction error. To go a step further, this thesis integrates information from both financial news and market data by means of Multiple-kernel Support Vector Regression, in the hope of uncovering the hidden information and improving the accuracy of predictions of future stock returns. The results show that this study can predict stock volatility with an RMSE of 0.802.
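As a rough illustration of labeling news by market movement, the sketch below assigns an opinion label to each news item from the stock return that followed it. It is not one of the thesis's three labeling methods, which the abstract does not specify; the function name `label_news`, the three-way labels, and the 1% threshold are all assumptions made for this example.

```python
from typing import List


def label_news(following_returns: List[float], threshold: float = 0.01) -> List[int]:
    """Label each news item +1, 0, or -1 from the return that followed it.

    `threshold` is a hypothetical cut-off: returns within +/- threshold
    are treated as neutral.
    """
    labels = []
    for r in following_returns:
        if r > threshold:
            labels.append(1)    # price rose noticeably: positive label
        elif r < -threshold:
            labels.append(-1)   # price fell noticeably: negative label
        else:
            labels.append(0)    # small move: neutral label
    return labels


# Hypothetical returns observed after four news items.
returns = [0.025, -0.004, -0.03, 0.0]
print(label_news(returns))
```

A different threshold or return horizon changes which words look informative in the later feature selection step, which is why the choice of labeling method matters.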
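The idea of combining one kernel per information source, and of scoring predictions by RMSE, can be sketched as follows. This is a minimal illustration, not the thesis's implementation: it assumes RBF kernels and an equal-weight average of the per-source kernel matrices, and the helper names and feature values are hypothetical. A multiple-kernel learner would instead learn the weights.

```python
import math
from typing import List, Optional

Matrix = List[List[float]]


def rbf_kernel(X: Matrix, gamma: float = 1.0) -> Matrix:
    """Gram matrix K[i][j] = exp(-gamma * ||x_i - x_j||^2)."""
    n = len(X)
    K = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            sq_dist = sum((a - b) ** 2 for a, b in zip(X[i], X[j]))
            K[i][j] = math.exp(-gamma * sq_dist)
    return K


def combine_kernels(kernels: List[Matrix],
                    weights: Optional[List[float]] = None) -> Matrix:
    """Weighted sum of kernel matrices; equal weights if none are given."""
    if weights is None:
        weights = [1.0 / len(kernels)] * len(kernels)
    n = len(kernels[0])
    return [[sum(w * K[i][j] for w, K in zip(weights, kernels))
             for j in range(n)] for i in range(n)]


def rmse(y_true: List[float], y_pred: List[float]) -> float:
    """Root mean squared error between two equal-length sequences."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


# Hypothetical features for three trading days from two sources.
news_features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # e.g. term weights
technical_features = [[0.5], [0.7], [0.2]]             # e.g. one indicator

K = combine_kernels([rbf_kernel(news_features), rbf_kernel(technical_features)])
print(K[0])
```

The combined matrix can be passed to any kernel method that accepts a precomputed Gram matrix. Keeping one kernel per source lets each source use its own similarity measure before the sources are merged.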
On campus: publicly available from 2022-07-10