| Graduate Student: | 王柏傑 Wang, Bo-Jie |
|---|---|
| Thesis Title: | 用AUC比較兩分類方法於不平衡資料檔分類效能之有母數檢定方法 (Parametric Statistical Methods for Comparing the Performance of Classification Algorithms on Imbalanced Data by AUC Measure) |
| Advisor: | 翁慈宗 Wong, Tzu-Tsung |
| Degree: | Master |
| Department: | College of Management, Department of Industrial and Information Management |
| Year of Publication: | 2020 |
| Academic Year: | 108 |
| Language: | Chinese |
| Number of Pages: | 53 |
| Chinese Keywords: | ROC, AUC, Wilcoxon Test, 偏態檢定 (skewness test), 成對檢定 (paired test) |
| English Keywords: | ROC, AUC, Wilcoxon Test, Skewness Test, Paired Z Test |
Evaluation measures for classification algorithms have always been an indispensable part of data mining and machine learning. However, some measures, such as the ROC curve, are inherently graphical, which makes it difficult to apply an effective parametric statistical test when comparing the performance of multiple classification algorithms evaluated with them. This study therefore first reviews the evaluation measures commonly used for classification algorithms on imbalanced data sets and describes the distinctive visual characteristics of the ROC curve. When the ROC curves of two classification algorithms cross rather than one dominating the other, it becomes difficult for users to compare the algorithms with this curve-based measure. The area under the curve (AUC) is an alternative, numerical measure that allows users to compare two classification algorithms quantitatively. However, because the sampling distribution of AUC is difficult to obtain, most existing studies can only compare the AUC of two classification algorithms with nonparametric tests such as the Wilcoxon test, whose statistical power is lower than that of parametric tests and which can therefore lead to erroneous conclusions in significance testing. This study develops a paired parametric testing procedure for evaluating the AUC of classification algorithms on imbalanced data sets: cross-validation is used to partition the data into enough folds to satisfy the sample size required by the central limit theorem, which raises the power of the statistical test and allows a significance test on the AUC performance of two classification algorithms. For the choice of the number of folds, a skewness test is further applied to examine whether the fold-wise AUC values are symmetric when fewer than 30 folds are used, so that the number of folds required by the central limit theorem can be reduced. With this number of folds, the proposed paired parametric test is used to determine whether the AUC performance of two classification algorithms on imbalanced data sets differs significantly, and the result is compared with the traditional nonparametric test in terms of statistical power and significance decisions. The empirical results show that the proposed paired parametric test does have a power advantage over the nonparametric Wilcoxon test, but the significance decisions on most data sets are not noticeably different.
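The fold-splitting and skewness-checking step described above can be illustrated with a short sketch. The Python code below is a minimal illustration, not the thesis's actual implementation: the data set, the two classifiers, the fold count, and the use of scipy's skewness test as the symmetry check are all assumptions made for the example.

```python
import numpy as np
from scipy import stats
from sklearn.datasets import make_classification
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedKFold
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

def fold_auc_scores(clf_a, clf_b, X, y, n_folds=10, seed=0):
    """Train both classifiers on the same folds and return per-fold AUC scores."""
    skf = StratifiedKFold(n_splits=n_folds, shuffle=True, random_state=seed)
    auc_a, auc_b = [], []
    for train_idx, test_idx in skf.split(X, y):
        X_tr, X_te = X[train_idx], X[test_idx]
        y_tr, y_te = y[train_idx], y[test_idx]
        auc_a.append(roc_auc_score(y_te, clf_a.fit(X_tr, y_tr).predict_proba(X_te)[:, 1]))
        auc_b.append(roc_auc_score(y_te, clf_b.fit(X_tr, y_tr).predict_proba(X_te)[:, 1]))
    return np.array(auc_a), np.array(auc_b)

# Hypothetical imbalanced data set (about 10% positives), used only for illustration.
X, y = make_classification(n_samples=2000, n_features=20, weights=[0.9, 0.1], random_state=0)
auc_a, auc_b = fold_auc_scores(GaussianNB(), DecisionTreeClassifier(random_state=0), X, y)
diff = auc_a - auc_b

# With fewer than 30 folds, check whether the fold-wise AUC differences look
# symmetric before relying on a normal approximation (scipy's skewtest needs n >= 8).
print(stats.skewtest(diff))
```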
Evaluation measures for classification algorithms have always been an indispensable part of data mining and machine learning. However, some measures, such as the ROC curve, are difficult to subject to an effective parametric statistical test for comparing classification algorithms because of their graphical nature. This study therefore first reviews the evaluation measures commonly used for classification algorithms on imbalanced data and explains the distinctive visual characteristics of the ROC measure. It also explains that the AUC (area under the ROC curve) is a numerical measure that allows users to evaluate and compare two classification algorithms. However, because the population distribution of AUC for the performance of different classification algorithms is difficult to obtain, users can only compare classification performance with nonparametric statistical methods, even though parametric methods can provide greater statistical power in hypothesis testing. This study therefore introduces a paired parametric statistical method based on the central limit theorem (CLT) to compare the performance of two classification algorithms on imbalanced data. The experiments show that although the parametric method improves statistical power, its significance decisions rarely differ from those of the nonparametric method when comparing the performance of two classification algorithms on imbalanced data.
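The paired test itself can likewise be sketched. The snippet below assumes per-fold AUC scores have already been obtained (for example, as in the previous sketch) and contrasts a paired z test with the Wilcoxon signed-rank test; the exact form of the z statistic (sample standard deviation of the fold-wise differences) and the example AUC values are assumptions for illustration, not the thesis's reported procedure or results.

```python
import numpy as np
from scipy import stats

def paired_z_test(diff):
    """Two-sided paired z test on fold-wise AUC differences (normal approximation)."""
    n = len(diff)
    z = np.mean(diff) / (np.std(diff, ddof=1) / np.sqrt(n))
    p = 2.0 * stats.norm.sf(abs(z))
    return z, p

def compare_tests(auc_a, auc_b, alpha=0.05):
    """Contrast the parametric paired z test with the nonparametric Wilcoxon test."""
    diff = np.asarray(auc_a) - np.asarray(auc_b)
    z, p_param = paired_z_test(diff)
    _, p_wilcoxon = stats.wilcoxon(auc_a, auc_b)  # Wilcoxon signed-rank test
    print(f"paired z test:        z = {z:.3f}, p = {p_param:.4f}")
    print(f"Wilcoxon signed-rank: p = {p_wilcoxon:.4f}")
    print(f"reject H0 at alpha={alpha}: parametric={p_param < alpha}, "
          f"nonparametric={p_wilcoxon < alpha}")

# Hypothetical fold-wise AUC scores for two classifiers on one imbalanced data set.
auc_a = np.array([0.91, 0.88, 0.93, 0.90, 0.89, 0.92, 0.87, 0.94, 0.90, 0.91])
auc_b = np.array([0.88, 0.86, 0.90, 0.89, 0.87, 0.91, 0.85, 0.92, 0.88, 0.90])
compare_tests(auc_a, auc_b)
```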