Graduate student: | 陳麗娟 Chen, Li-Jyuan |
---|---|
Thesis title: | 有效探勘不同長度之部份週期樣式 Efficiently mining partial periodic patterns with different length |
Advisor: | 李昇暾 Li, Sheng-Tun |
Degree: | Master |
Department: | College of Management - Department of Industrial and Information Management (on the job class) |
Year of publication: | 2011 |
Graduation academic year: | 99 (ROC calendar) |
Language: | English |
Number of pages: | 81 |
Keywords (Chinese): | 資料探勘, 部分週期樣式, 刪減策略, 趨勢分析 |
Keywords (English): | data mining, partial periodic pattern, pruning strategy, trend analysis |
Data mining techniques have been widely studied in recent years as a way to find useful rules and patterns in various types of databases. A common application is trend analysis over long-term event data. Partial periodic pattern mining is especially well suited to this kind of analysis because a partial pattern may leave some positions unspecified (don't-care positions), ignoring whether an event occurs there. As a result, it can discover more patterns than full periodic pattern mining. A well-known earlier approach, the max-subpattern mining algorithm, finds only the partial periodic patterns of one specific period length in each run over an event sequence. In this study, we propose an efficient and effective multi-partial periodic pattern (MPP) mining algorithm that discovers, at the same time, all partial periodic patterns whose period length is within a predefined maximum. The algorithm uses two structures, an upper-bound tree and a distance table, together with several pruning strategies to reduce the number of unpromising candidate patterns, which speeds up the discovery of partial periodic patterns from a single event sequence. In the experimental evaluation, historical gold price data are used to assess the performance of the proposed algorithm. The results show that the algorithm prunes unpromising candidate patterns effectively and that it outperforms the max-subpattern mining algorithm in execution efficiency under different parameter settings.
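To make the notion of a partial periodic pattern concrete, the following is a minimal brute-force sketch in Python. It is not the MPP algorithm or the max-subpattern algorithm described above, and it uses no upper-bound tree, distance table, or pruning. It only illustrates what it means for a pattern with don't-care positions to hold over a sequence, and how single-event patterns could be counted for every period length up to a maximum. The function names, the `*` wildcard symbol, the confidence threshold, and the toy sequence are all assumptions made for illustration.

```python
from collections import Counter

WILDCARD = "*"  # hypothetical symbol for a don't-care position


def pattern_confidence(seq, pattern):
    """Fraction of full-length segments of seq that match pattern.

    The period length is len(pattern); a trailing partial segment is ignored.
    """
    period = len(pattern)
    segments = len(seq) // period
    if segments == 0:
        return 0.0
    hits = 0
    for i in range(segments):
        segment = seq[i * period:(i + 1) * period]
        if all(want == WILDCARD or want == got
               for want, got in zip(pattern, segment)):
            hits += 1
    return hits / segments


def frequent_one_patterns(seq, max_period, min_conf):
    """Naively find single-event patterns for every period up to max_period.

    Returns a dict mapping each pattern (one fixed event, wildcards
    elsewhere) to its confidence, keeping those at or above min_conf.
    """
    found = {}
    for period in range(1, max_period + 1):
        segments = len(seq) // period
        if segments == 0:
            break
        # Count how often each event appears at each offset within a period.
        counts = Counter()
        for i in range(segments):
            for offset in range(period):
                counts[(offset, seq[i * period + offset])] += 1
        for (offset, event), count in counts.items():
            conf = count / segments
            if conf >= min_conf:
                pattern = (WILDCARD * offset + event
                           + WILDCARD * (period - offset - 1))
                found[pattern] = conf
    return found


# Toy event sequence (made up): period 3, third position varies.
events = "abdabcabdabe"
print(pattern_confidence(events, "ab*"))
print(pattern_confidence(events, "abd"))
print(frequent_one_patterns(events, max_period=3, min_conf=0.9))
```

In the toy sequence, the pattern `ab*` matches every period of length 3, whereas the full pattern `abd` matches only half of them, which is why allowing don't-care positions can surface more patterns. Scanning every period length separately, as this sketch does, is the kind of repeated work a method that handles all period lengths together would aim to avoid.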