| Graduate student: | 黃柏翰 Huang, Bo-Han |
|---|---|
| Thesis title: | 不同分類器的混合型離散化方法之一致性分析 Consistency Analysis of Hybrid Discretization Method among Classification Algorithms |
| Advisor: | 翁慈宗 Wong, Tzu-Tsung |
| Degree: | Master |
| Department: | College of Management - Institute of Information Management |
| Year of publication: | 2015 |
| Academic year of graduation: | 103 |
| Language: | Chinese |
| Number of pages: | 50 |
| Keywords (Chinese): | 混合型離散化方法, 一致性, 一致性測度, 分類器 |
| Keywords (English): | Classifier, consistency, consistency measure, hybrid discretization method |
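Classification is a data-processing method in data mining: based on the attributes of the data, a computation assigns a class to each instance. Most data sets contain continuous attributes, and classifiers designed for discrete attributes generally require those attributes to be discretized first. The choice of discretization method can therefore affect a classifier's predictions. Hybrid discretization discretizes each continuous attribute individually, searching for the discretization method that suits it best. Compared with applying one discretization method to every attribute in a data set, hybrid discretization is better able to improve classification accuracy. The hybrid discretization literature already includes a method that is applicable to several classifiers for discrete attributes and that completes all discretization during data preprocessing. On decision trees, however, that hybrid method performs no better than unified discretization. The purpose of this study is therefore to examine the consistency of the best hybrid discretization methods across classifiers, in the hope that knowing the degree of consistency will indicate how to revise the hybrid discretization method intended for different classifiers. This study measures that consistency in two ways: by applying each classifier's best hybrid discretization to the other classifiers, and with a newly proposed consistency measure. Thirty data sets were tested with decision trees, naïve Bayesian classifiers, and rule-based classifiers. Compared with a best hybrid discretization borrowed from another classifier, a classifier's own best hybrid discretization already achieves good accuracy, although in some cases the borrowed one performs better. The values of the consistency measure are clearly low, which indicates that the best hybrid discretization methods of different classifiers are inconsistent. To find a best hybrid discretization combination that suits different classifiers, it may therefore be necessary to reconsider the characteristics of each classifier and include them in the computation. Only then is there a chance of obtaining, at the preprocessing stage, a best hybrid discretization method applicable to different classifiers.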
Discretization is one of the major approaches for processing continuous attributes in classification. However, the accuracy obtained on a data set can differ greatly depending on which discretization method is applied. A hybrid discretization method was proposed recently, and it generally achieves better performance for the naïve Bayesian classifier than unified discretization. A later study developed a hybrid discretization method applicable to multiple classifiers, which determines the discretization method for each attribute in the data preprocessing step. However, the results of that study showed that it cannot improve the performance of decision trees. The objective of this study is therefore to investigate the consistency of hybrid discretization results among classification algorithms. This study proposes two approaches to consistency analysis. The first identifies whether the best hybrid discretization results for one classification algorithm can improve the performance of the others. The second is a new measure that evaluates the consistency of the best hybrid discretization results of two classification algorithms. The classification tools used to test these methods are decision trees, naïve Bayesian classifiers, and rule-based classifiers. The experimental results on 30 data sets show that the best hybrid discretization results for one algorithm seldom improve the performance of the others. Moreover, most values of the consistency measure are low. These results suggest that the characteristics of a classification algorithm should be considered when designing a hybrid discretization method for data preprocessing.
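The difference between unified and hybrid discretization can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the two candidate methods (equal-width and equal-frequency binning), the helper names, and the toy data are not taken from the thesis. In particular, the `agreement` function is only a naive stand-in that counts how often two classifiers would pick the same method per attribute; it is not the consistency measure proposed in the thesis, which the abstract does not define.

```python
from bisect import bisect_right


def equal_width_cuts(values, k):
    """Cut points splitting [min, max] into k equal-width intervals."""
    lo, hi = min(values), max(values)
    step = (hi - lo) / k
    return [lo + step * i for i in range(1, k)]


def equal_frequency_cuts(values, k):
    """Cut points placing roughly len(values) / k values in each interval."""
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[(n * i) // k] for i in range(1, k)]


# Hypothetical pool of candidate methods; a real study may use others.
METHODS = {
    "equal_width": equal_width_cuts,
    "equal_frequency": equal_frequency_cuts,
}


def discretize(values, cuts):
    """Map each value to the index of the interval it falls in."""
    return [bisect_right(cuts, v) for v in values]


def hybrid_discretize(data, assignment, k=3):
    """Discretize each attribute with its own assigned method.

    Unified discretization is the special case in which every
    attribute is assigned the same method.
    """
    return {
        attr: discretize(column, METHODS[assignment[attr]](column, k))
        for attr, column in data.items()
    }


def agreement(assignment_a, assignment_b):
    """Fraction of attributes assigned the same method by both.

    Illustrative stand-in only; NOT the thesis's consistency measure.
    """
    same = sum(assignment_a[a] == assignment_b[a] for a in assignment_a)
    return same / len(assignment_a)
```

With toy data such as `{"age": [1, 2, 3, 4, 5, 100], "income": [10, 20, 30, 40, 50, 60]}`, equal-width binning puts almost every `age` value in a single interval because of the outlier, while equal-frequency binning spreads the values evenly. That is the kind of per-attribute difference a hybrid method tries to exploit.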