| Author: | 蘇暐傑 Su, Wei-Jie |
|---|---|
| Title: | 應用於混合離散化的分類方法之特色測度 (Characteristic Measures of Classification Algorithms for Hybrid Discretization) |
| Advisor: | 翁慈宗 Wong, Tzu-Tsung |
| Degree: | Master |
| Department: | Institute of Information Management, College of Management |
| Year of Publication: | 2016 |
| Academic Year: | 104 |
| Language: | Chinese |
| Pages: | 58 |
| Keywords (Chinese): | 特色測度、分類方法、動態規劃、混合型離散化 |
| Keywords (English): | Characteristic measure, classification algorithm, dynamic programming, hybrid discretization method |
Classification is a data mining approach in which a class value is computed for each instance from its attribute values. Most data sets contain continuous attributes, and for classifiers designed for discrete attributes, continuous attributes are generally discretized first so that the data can be converted into discrete form. The choice of discretization method can therefore affect a classifier's predictions. A hybrid discretization method allows each continuous attribute to be discretized by a different method in order to find the most suitable combination; compared with applying a single discretization method to all attributes in a data set, hybrid discretization can further improve classification accuracy. However, a previous study found that the consistency of hybrid discretization results across classifiers is markedly low, meaning that the hybrid combinations selected for different classifiers largely disagree. The purpose of this study is therefore to design, according to the learning characteristics of each classification algorithm, a characteristic measure of the association between an attribute and the class, together with three weighting schemes that balance the numbers of possible values produced by different discretization methods. Drawing on network optimization in operations research, this study then transforms the hybrid discretization problem into a network optimization model; using the characteristic measure designed for each classifier as the evaluation criterion, dynamic programming is applied to find the optimal path for each classifier, and this path represents that classifier's most suitable hybrid discretization combination. In this way, all discretization is completed in the data preprocessing step. Classification is then performed with the corresponding classifier, and the resulting accuracy is used to verify the effectiveness of the proposed method. Twenty data sets were classified with decision trees, naïve Bayesian classifiers, and rule-based classifiers, and the results were compared with those of basic discretization methods. A hybrid discretization combination was derived separately for each classification algorithm and evaluated on each classifier. The results show that for naïve Bayesian classifiers and rule-based classifiers, classification accuracy improved on most data sets, while for decision trees the characteristic-measure results were mostly comparable to those of basic discretization.
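The abstract does not give the classifier-specific formulas, but the idea of a weighted characteristic measure can be illustrated with a minimal sketch, assuming mutual information as the attribute-class association score and a log-based weight that offsets methods producing many intervals; the data and weighting choice here are invented for illustration, not taken from the thesis.

```python
# Hypothetical characteristic-measure sketch: mutual information between a
# discretized attribute and the class, divided by log2(k) so that methods
# yielding more intervals (larger k) are not automatically favoured.
from collections import Counter
from math import log2

def mutual_information(xs, ys):
    """Empirical mutual information I(X; Y) between two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum((c / n) * log2((c / n) / ((px[x] / n) * (py[y] / n)))
               for (x, y), c in pxy.items())

def weighted_measure(xs, ys):
    """Association score weighted by the number of discretized intervals."""
    k = len(set(xs))                      # possible values after discretization
    return mutual_information(xs, ys) / log2(k) if k > 1 else 0.0

xs = ["low", "low", "high", "high", "mid", "mid"]   # discretized attribute
ys = ["A",   "A",   "B",    "B",    "A",   "B"]     # class labels
print(round(weighted_measure(xs, ys), 3))           # → 0.421
```

Under this weighting, an attribute discretized into many intervals must show a proportionally stronger association with the class to obtain the same score as a coarser discretization.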
Discretization is one of the major approaches for processing continuous attributes for classification. However, the accuracies obtained on a data set discretized by various discretization methods may differ greatly. The hybrid discretization method was proposed recently, and it can generally achieve better performance for the naïve Bayesian classifier than unified discretization. A past study concluded that the consistency of hybrid discretization results among classification algorithms is low, and hence it is unlikely that the same hybrid discretization result will improve the accuracy of all algorithms. This study therefore considers the learning strategy of each classification algorithm to design its characteristic measure for evaluating the association between a discretized continuous attribute and the class. When the hybrid discretization problem is transformed into a network model, the characteristic measure is used to calculate the payoff of each arc in the network. Dynamic programming is then employed to find the path in the network with the largest payoff, and this path indicates the discretization method for each continuous attribute. Three ways of calculating the weights of the characteristic measure are presented to balance the number of possible values resulting from various discretization methods. The classification tools considered in this study for designing characteristic measures are naïve Bayesian classifiers, decision trees, and rule-based classifiers. The experimental results on 20 data sets show that the computational cost of our method is low and that, in general, the hybrid discretization method has better performance for naïve Bayesian classifiers and rule-based classifiers, but not for decision trees.
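The staged-network search described in the abstract can be sketched as follows: each stage corresponds to one continuous attribute, each arc to a candidate discretization method, and the arc payoff to that attribute's characteristic-measure score. This is an illustrative reconstruction, not the thesis's implementation; the method names and payoff values are invented.

```python
# Backward induction (Bellman, 1957) over a staged network whose max-payoff
# path selects one discretization method per continuous attribute.
def best_hybrid_discretization(payoffs):
    """payoffs[t][m]: payoff of applying method m to continuous attribute t.
    Returns the total payoff and the chosen method per attribute."""
    n = len(payoffs)
    value = [0.0] * (n + 1)   # value[t] = best payoff from stage t to the end
    choice = [None] * n
    for t in range(n - 1, -1, -1):            # sweep the stages backwards
        for m, p in payoffs[t].items():
            if choice[t] is None or p + value[t + 1] > value[t]:
                choice[t], value[t] = m, p + value[t + 1]
    return value[0], choice

payoffs = [
    {"equal-width": 0.42, "equal-frequency": 0.55, "MDLP": 0.61},  # attribute 1
    {"equal-width": 0.38, "equal-frequency": 0.47, "MDLP": 0.44},  # attribute 2
]
total, methods = best_hybrid_discretization(payoffs)
print(methods)   # ['MDLP', 'equal-frequency']
```

Because the payoffs here are additive and attribute-wise, the recursion is simple; the value of the network formulation is that classifier-specific weights and measures can be plugged into the arc payoffs without changing the search.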
伍碧那 (2014). 適用於不同分類器的混合型離散化方法 [A hybrid discretization method for different classification algorithms]. Master's thesis, Institute of Information Management, National Cheng Kung University.
黃柏翰 (2015). 不同分類器的混合型離散化方法之一致性分析 [Consistency analysis of hybrid discretization methods for different classification algorithms]. Master's thesis, Institute of Information Management, National Cheng Kung University.
Ahmed, P. (2014). A hybrid-based feature selection approach for IDS. Networks and Communications, 284, 195-211.
Bellman, R. (1957). Dynamic Programming. Princeton, NJ: Princeton University Press.
Cannas, L. M., Dessi, N. and Pes, B. (2013). Assessing similarity of feature selection techniques in high-dimensional domains. Pattern Recognition Letters, 34(12), 1446-1453.
Fayyad, U. and Irani, K. (1993). Multi-interval discretization of continuous-valued attributes for classification learning. The 13th International Joint Conference on Artificial Intelligence (IJCAI), 1022-1029.
Garcia, S., Luengo, J., Saez, J. A., Lopez, V. and Herrera, F. (2013). A survey of discretization techniques: taxonomy and empirical analysis in supervised learning. IEEE Transactions on Knowledge and Data Engineering, 25(4), 734-750.
Hu, Q., Pedrycz, W., Yu, D. and Lang, J. (2010). Selecting discrete and continuous features based on neighborhood decision error minimization. IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, 40(1), 137-150.
Jung, Y. G., Kim, K. M. and Kwon, Y. M. (2012). Using weighted hybrid discretization method to analyze climate changes. Computer Applications for Graphics, Grid Computing, and Industrial Environment. Springer Berlin Heidelberg, Communications in Computer and Information Science, 351, 189-195.
Kerber, R. (1992). ChiMerge: discretization of numeric attributes. Ninth National Conference on Artificial Intelligence, 123-128.
Liu, H. W., Sun, J., Liu, L. and Zhang, H. J. (2009). Feature selection with dynamic mutual information. Pattern Recognition, 42(7), 1330-1339.
Lustgarten, J. L., Visweswaran, S., Gopalakrishnan, V. and Cooper, G. F. (2011). Application of an efficient Bayesian discretization method to biomedical data. BMC Bioinformatics, 12, 309.
Nelwamondo, F. V., Golding, D. and Marwala, T. (2013). A dynamic programming approach to missing data estimation using neural networks. Information Sciences, 237, 49-58.
Sakar, C. O., Kursun, O. and Gurgen, F. (2012). A feature selection method based on kernel canonical correlation analysis and the minimum redundancy-maximum relevance filter method. Expert Systems with Applications, 39(3), 3432-3437.
Shen, C. C. and Chen, Y. L. (2008). A dynamic-programming algorithm for hierarchical discretization of continuous attributes. European Journal of Operational Research, 184(2), 636-651.
Wong, D. F., Chao, L. S. and Zeng, X. D. (2014). A supportive attribute-assisted discretization model for medical classification. Bio-Medical Materials and Engineering, 24(1), 289-295.
Wong, T. T. (2012). A hybrid discretization method for naive Bayesian classifiers. Pattern Recognition, 45(6), 2321-2325.
Wu, B., Zhang, L. P. and Zhao, Y. D. (2014). Feature selection via Cramer's V-test discretization for remote-sensing image classification. IEEE Transactions on Geoscience and Remote Sensing, 52(5), 2593-2606.
Yan, D. Q., Liu, D. S. and Sang, Y. (2014). A new approach for discretizing continuous attributes in learning systems. Neurocomputing, 133, 507-511.
Yang, Y. and Webb, G. I. (2009). Discretization for naive-Bayes learning: managing discretization bias and variance. Machine Learning, 74(1), 39-74.
Yu, L. and Liu, H. (2003). Feature selection for high-dimensional data: a fast correlation-based filter solution. The 20th International Conference on Machine Learning (ICML), 856-863.
Zhao, J., Han, C. Z., Wei, B. and Han, D. Q. (2012). A UMDA-based discretization method for continuous attributes. Advanced Materials Research, 403-408, 1834-1838.
Zou, L., Yan, D., Karimi, H. R. and Shi, P. (2013). An algorithm for discretization of real value attributes based on interval similarity. Journal of Applied Mathematics, 1-8.