| Graduate student: | 陳泊亨 Chen, Bo-Heng |
|---|---|
| Thesis title: | 基於概念演變評估技術之串流資料概念漂移偵測方法 Concept Evolution Inference for Concept Drift without Supplementary Labeled Data |
| Advisor: | 莊坤達 Chuang, Kun-Ta |
| Degree: | 碩士 Master |
| Department: | 電機資訊學院 - 資訊工程學系 Department of Computer Science and Information Engineering |
| Year of publication: | 2013 |
| Academic year of graduation: | 101 |
| Language: | English |
| Number of pages: | 39 |
| Chinese keywords: | 概念漂移 、資料串流 、類別資料 、無標示資料 |
| English keywords: | Concept Drift, Data Stream, Categorical Data, Unlabeled Data |
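Because nearly all data encountered in daily life now arrives as a continuous flow, mining data streams is receiving growing attention. Conventional classifiers are designed to make predictions from stationary data, whose characteristics do not change. In a data stream, however, the characteristics of the data change continuously over time, a situation known as concept drift. When concept drift occurs, a conventional classifier can no longer predict the incoming data effectively. Most approaches to this problem detect whether concept drift has occurred and, if so, adjust the model to predict subsequent data, and most of them are built on incremental learning. Incremental learning assumes that all incoming data is labeled, so the correct class of each record is available at the next time point. For real-world data streams, though, obtaining the correct class of a record requires manual labeling, which is very costly and cannot be completed within a short time, so such approaches lack flexibility. We therefore propose ICE (Inference of Concept Evolution) to infer whether concept drift has occurred. ICE evaluates concept evolution to decide whether concept drift has occurred, and it does not rely on labeled data. Our experimental results confirm that ICE correctly detects the occurrence of concept drift on both synthetic and real data.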
There has been increasing interest in mining data streams. Conventional classification approaches are designed to mine stationary datasets. These traditional frameworks are not suitable for discovering valuable patterns in data streams, because the characteristics of the data change over time. When concept drift occurs, traditional classification algorithms cannot classify the data accurately. To handle concept drift, most previous work detects the drift and then adapts the classification model to classify newly arrived data. Most of this work is based on incremental learning, which assumes that data arrives with labels. However, obtaining correct labels is costly, so this approach is not applicable in real life. Hence, we propose a novel method called Inference of Concept Evolution (abbreviated as ICE). ICE utilizes concept evolution to infer whether concept drift has occurred, without labeled data. In our experimental results, we observe that ICE can infer the appearance of concept drift in both synthetic and real data.