
Graduate Student: 邱品晴
CHIOU, PIN-CHING
Thesis Title: 以MCC評估分類演算法效能之有母數方法
Parametric Methods for Evaluating the Performance of Two Classification Algorithms by the Matthews Correlation Coefficient
Advisor: 翁慈宗
Wong, Tzu-Tsung
Degree: Master
Department: Institute of Information Management, College of Management
Year of Publication: 2025
Academic Year of Graduation: 113
Language: Chinese
Number of Pages: 86
Chinese Keywords: MCC, parametric test, imbalanced data, classification, sampling distribution
Foreign Keywords: Matthews correlation coefficient, parametric method, imbalanced data, classification, sampling distribution
    With the rapid development of data mining and machine learning, evaluating the performance of classification algorithms has become increasingly important, especially when comparing algorithms on imbalanced datasets. Although metrics such as the F-measure, G-mean, and the area under the ROC curve (AUC) are commonly used to evaluate performance on imbalanced datasets, each of them is composed of two or more underlying measures, and the dependence among those measures increases the complexity of deriving a sampling distribution. By contrast, the Matthews correlation coefficient (MCC) takes all four elements of the confusion matrix into account at once, so it not only provides a more comprehensive evaluation but also lends itself more readily to the derivation of a sampling distribution. However, existing studies rarely discuss the sampling distribution of the MCC, leaving researchers unable to determine its distributional form or choose a corresponding parametric test; as a result, nonparametric tests such as the Wilcoxon signed-rank test are commonly used to compare classification performance instead. To address this problem, this study starts from the formulas defining the MCC, verifies its distributional form through simulated sampling, and examines the distribution of the MCC under different assumptions. Based on the distributional forms obtained from simulation, paired parametric tests are then designed with either the fold or the dataset as the aggregation level, to test whether the MCC values of two classification algorithms differ significantly on imbalanced datasets. Finally, an empirical analysis on 14 datasets compares the proposed parametric tests with the traditional nonparametric test in terms of power and significance decisions. The empirical results show that, when the data satisfy the test conditions, the proposed parametric tests have higher power than the Wilcoxon signed-rank test and can identify performance differences between classification algorithms more effectively.
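The abstract's central quantity, the MCC, is computed from the four counts of a binary confusion matrix. A minimal sketch of that computation (the function name and the zero-denominator convention below are illustrative choices, not taken from the thesis):

```python
import math

def mcc(tp, fp, fn, tn):
    """Matthews correlation coefficient from the four confusion-matrix counts."""
    num = tp * tn - fp * fn
    den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Common convention: define MCC as 0 when any marginal total is empty.
    return num / den if den else 0.0
```

The value ranges from -1 (total disagreement) through 0 (no better than chance) to +1 (perfect prediction), which is what makes it usable as a correlation-like performance measure on imbalanced data.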

    With the rapid development of data mining and machine learning, evaluating the performance of classification algorithms has become increasingly important, especially for imbalanced datasets. Nonparametric methods, such as the Wilcoxon signed-rank test, are generally applied for performance comparison when the evaluation metric is the F-measure, G-mean, or AUC, because of the complexity of deriving their sampling distributions. In contrast, the Matthews correlation coefficient (MCC) is calculated from all four elements of a confusion matrix, and deriving its sampling distribution is likely to be possible. Taking advantage of this property, this study first simulates the sampling distribution of the MCC under various assumptions to establish the large-sample conditions for this sampling distribution. Parametric methods for evaluating the performance of a classification algorithm on an imbalanced dataset by the MCC are then proposed for fold-level and dataset-level aggregation. These parametric methods are extended to compare the performance of two algorithms on multiple imbalanced datasets. An empirical study on 14 datasets shows that, when the large-sample conditions hold, the proposed parametric methods are more powerful than the Wilcoxon signed-rank test in determining whether two classification algorithms have significantly different performance on multiple imbalanced datasets.
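The two-stage workflow described above — simulating the sampling distribution of the MCC from a population confusion matrix, then applying a paired parametric test to matched fold-level MCC values — can be sketched roughly as follows. The population matrix, sample sizes, and the hand-rolled paired t statistic are illustrative assumptions only; they are not the thesis's actual simulation settings or test derivation:

```python
import math
import random
import statistics

def mcc_from_counts(tp, fp, fn, tn):
    """Matthews correlation coefficient from the four confusion-matrix counts."""
    den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / den if den else 0.0

def simulate_mcc(population, n, draws, seed=0):
    """Draw `draws` multinomial samples of size n from a population confusion
    matrix (dict with keys tp/fp/fn/tn) and return the empirical MCC values."""
    rng = random.Random(seed)
    cells = ("tp", "fp", "fn", "tn")
    total = sum(population[c] for c in cells)
    weights = [population[c] / total for c in cells]
    values = []
    for _ in range(draws):
        sample = rng.choices(cells, weights=weights, k=n)
        c = {cell: sample.count(cell) for cell in cells}
        values.append(mcc_from_counts(c["tp"], c["fp"], c["fn"], c["tn"]))
    return values

def paired_t_statistic(mcc_a, mcc_b):
    """Paired t statistic on matched fold-level MCCs of two classifiers."""
    diffs = [a - b for a, b in zip(mcc_a, mcc_b)]
    return statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(len(diffs)))
```

In practice the simulated values would be checked against the large-sample conditions before the paired t statistic is trusted, and a Wilcoxon signed-rank test (e.g. `scipy.stats.wilcoxon`) would supply the nonparametric baseline for the power comparison.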

    Abstract
    Table of Contents
    List of Tables
    List of Figures
    Chapter 1 Introduction
        1.1 Research Background and Motivation
        1.2 Research Objectives
        1.3 Thesis Organization
    Chapter 2 Literature Review
        2.1 Imbalanced Datasets
        2.2 Classification Methods
            2.2.1 Naive Bayes Classifier
            2.2.2 Bagging
            2.2.3 Random Forest
            2.2.4 AdaBoost
        2.3 Common Evaluation Metrics for Imbalanced Datasets
            2.3.1 F-measure, G-mean, and AUC
            2.3.2 MCC
        2.4 Statistical Methods for Imbalanced Datasets
        2.5 Summary
    Chapter 3 Research Methods
        3.1 Simulating the Sampling Distribution of MCC
            3.1.1 Simulation When MCC Is Zero
            3.1.2 Generating the Population Confusion Matrix
            3.1.3 Sampling
            3.1.4 Distribution Tests
            3.1.5 Random Simulation
        3.2 Simulation When MCC Is Nonzero
        3.3 Parametric Tests for the Performance of a Single Classification Method
            3.3.1 Fold-Level Aggregation
            3.3.2 Dataset-Level Aggregation
        3.4 Parametric Tests for Two Classification Methods
            3.4.1 Matched Samples with Fold-Level Aggregation
            3.4.2 Independent Samples with Fold-Level Aggregation
            3.4.3 Dataset-Level Aggregation
        3.5 Performance Comparison of Classification Methods over Multiple Datasets
            3.5.1 Matched Samples with Fold-Level Aggregation
            3.5.2 Independent Samples with Fold-Level Aggregation
            3.5.3 Dataset-Level Aggregation
        3.6 Evaluation and Comparison of the Experimental Methods
    Chapter 4 Empirical Study
        4.1 Description of the Datasets
        4.2 A Single Dataset
            4.2.1 Matched Samples with Fold-Level Aggregation
            4.2.2 Independent Samples with Fold-Level Aggregation
            4.2.3 Dataset-Level Aggregation
        4.3 Multiple Datasets
            4.3.1 Matched Samples with Fold-Level Aggregation
            4.3.2 Independent Samples with Fold-Level Aggregation
            4.3.3 Dataset-Level Aggregation
        4.4 Summary
    Chapter 5 Conclusions and Suggestions
        5.1 Conclusions
        5.2 Future Research and Development
    References

    Alamri, M., & Ykhlef, M. (2024). Hybrid undersampling and oversampling for handling imbalanced credit card data. IEEE Access, 12, 14050-14060.
    Ampomah, E. K., Qin, Z., Nyame, G., & Botchey, F. E. (2021). Stock market decision support modeling with tree-based AdaBoost ensemble machine learning models. Informatica, 44(4), 477-489.
    Bach, M., Werner, A., Żywiec, J., & Pluskiewicz, W. (2017). The study of under-and over-sampling methods’ utility in analysis of highly imbalanced data on osteoporosis. Information Sciences, 384, 174-190.
    Bagui, S., & Li, K. (2021). Resampling imbalanced data for network intrusion detection datasets. Journal of Big Data, 8(1), 6.
    Baldi, P., Brunak, S., Chauvin, Y., Andersen, C. A., & Nielsen, H. (2000). Assessing the accuracy of prediction algorithms for classification: an overview. Bioinformatics, 16(5), 412-424.
    Batista, G. E., Prati, R. C., & Monard, M. C. (2004). A study of the behavior of several methods for balancing machine learning training data. ACM SIGKDD Explorations Newsletter, 6(1), 20-29.
    Batuwita, R., & Palade, V. (2012). Adjusted geometric-mean: a novel performance measure for imbalanced bioinformatics datasets learning. Journal of Bioinformatics and Computational Biology, 10(4), 1250003.
    Bekkar, M., Djemaa, H. K., & Alitouche, T. A. (2013). Evaluation measures for models assessment over imbalanced data sets. Journal of Information Engineering and Applications, 3(10), 27-38.
    Boughorbel, S., Jarray, F., & El-Anbari, M. (2017). Optimal classifier for imbalanced data using Matthews Correlation Coefficient metric. PLoS One, 12(6), e0177678.
    Branco, P., Torgo, L., & Ribeiro, R. P. (2016). A survey of predictive modeling on imbalanced domains. ACM Computing Surveys, 49(2), 1-50.
    Breiman, L. (1996). Bagging predictors. Machine Learning, 24, 123-140.
    Breiman, L. (2001). Random forests. Machine Learning, 45, 5-32.
    Chabalala, Y., Adam, E., & Ali, K. A. (2023). Exploring the effect of balanced and imbalanced multi-class distribution data and sampling techniques on fruit-tree crop classification using different machine learning classifiers. Geomatics, 3(1), 70-92.
    Chen, Y. (2009). Learning Classifiers from Imbalanced, only Positive and Unlabeled Data Sets. Department of Computer Science, Iowa State University.
    Chicco, D. & Jurman, G. (2020). The advantages of the Matthews correlation coefficient (MCC) over F1 score and accuracy in binary classification evaluation. BMC Genomics, 21, 1-13.
    Chicco, D. & Jurman, G. (2023). A statistical comparison between Matthews correlation coefficient (MCC), prevalence threshold, and Fowlkes–Mallows index. Journal of Biomedical Informatics, 144, 104426.
    Cortina-Borja, M. (2012). Handbook of parametric and nonparametric statistical procedures, 5th Edition. CRC Press.
    Demšar, J. (2006). Statistical comparisons of classifiers over multiple data sets. The Journal of Machine Learning Research, 7, 1-30.
    Devi, D., Biswas, S. K., & Purkayastha, B. (2022). Correlation-based oversampling aided cost sensitive ensemble learning technique for treatment of class imbalance. Journal of Experimental & Theoretical Artificial Intelligence, 34(1), 143-174.
    Ding, Z. (2011). Diversified Ensemble Classifiers for Highly Imbalanced Data Learning and Their Application in Bioinformatics. PhD Thesis. Georgia State University.
    Dunlap, W. P., Brody, C. J., & Greer, T. (2000). Canonical correlation and chi-square: Relationships and interpretation. The Journal of General Psychology, 127(4), 341-353.
    Elyan, E., Moreno-Garcia, C. F., & Jayne, C. (2021). CDSMOTE: class decomposition and synthetic minority class oversampling technique for imbalanced-data classification. Neural Computing and Applications, 33, 2839-2851.
    Fawcett, T. (2006). An introduction to ROC analysis. Pattern Recognition Letters, 27(8), 861-874.
    Foody, G. M. (2023). Challenges in the real world use of classification accuracy metrics: From recall and precision to the Matthews correlation coefficient. PLoS One, 18(10), e0291908.
    Freund, Y., & Schapire, R. E. (1997). A decision-theoretic generalization of on-line learning and an application to boosting. Journal of Computer and System Sciences, 55(1), 119-139.
    Galar, M., Fernandez, A., Barrenechea, E., Bustince, H., & Herrera, F. (2011). A review on ensembles for the class imbalance problem: bagging-, boosting-, and hybrid-based approaches. IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews), 42(4), 463-484.
    Gholampour, S. (2024). Impact of Nature of Medical Data on Machine and Deep Learning for Imbalanced Datasets: Clinical Validity of SMOTE Is Questionable. Machine Learning and Knowledge Extraction, 6(2), 827-841.
    Helmy, M., Eldaydamony, E., Mekky, N., Elmogy, M., & Soliman, H. (2022). Predicting Parkinson disease related genes based on PyFeat and gradient boosted decision tree. Scientific Reports, 12, 10004.
    Hido, S., Kashima, H., & Takahashi, Y. (2009). Roughly balanced bagging for imbalanced data. Statistical Analysis and Data Mining: The ASA Data Science Journal, 2(5‐6), 412-426.
    Krawczyk, B. (2016). Learning from imbalanced data: open challenges and future directions. Progress in Artificial Intelligence, 5(4), 221-232.
    Krawczyk, B., Galar, M., Jeleń, Ł., & Herrera, F. (2016). Evolutionary undersampling boosting for imbalanced classification of breast cancer malignancy. Applied Soft Computing, 38, 714-726.
    Lango, M. (2019). Tackling the problem of class imbalance in multi-class sentiment classification: an experimental study. Foundations of Computing and Decision Sciences, 44(2), 151-178.
    Lobo, J. M., Jiménez‐Valverde, A., & Real, R. (2008). AUC: a misleading measure of the performance of predictive distribution models. Global Ecology and Biogeography, 17(2), 145-151.
    Malek, N. H. A., Yaacob, W. F. W., Wah, Y. B., Nasir, S. A. M., Shaadan, N., & Indratno, S. W. (2023). Comparison of ensemble hybrid sampling with bagging and boosting machine learning approach for imbalanced data. Indonesian Journal of Electrical Engineering and Computer Science, 29(1), 598-608.
    Matthews, B. W. (1975). Comparison of the predicted and observed secondary structure of T4 phage lysozyme. Biochimica et Biophysica Acta (BBA)-Protein Structure, 405(2), 442-451.
    Naglik, I., & Lango, M. (2023). GMMSampling: a new model-based, data difficulty-driven resampling method for multi-class imbalanced data. Machine Learning, 113, 5183-5202.
    Pecorelli, F., Di Nucci, D., De Roover, C., & De Lucia, A. (2020). A large empirical assessment of the role of data balancing in machine-learning-based code smell detection. Journal of Systems and Software, 169, 110693.
    Provost, F. J., Fawcett, T., & Kohavi, R. (1998). The case against accuracy estimation for comparing induction algorithms. Proceedings of the 15th International Conference on Machine Learning, 445-453.
    Raeder, T., Forman, G., & Chawla, N. V. (2012). Learning from imbalanced data: Evaluation matters. Data Mining: Foundations and Intelligent Paradigms, 23, 315-331.
    Rencher, A. C. & Schaalje, G. B. (2008). Linear Models in Statistics, 2nd Edition. John Wiley & Sons.
    Sachs, L. (2012). Applied Statistics: A Handbook of Techniques, 2nd Edition. Springer Science & Business Media.
    Siegel, S. (1957). Nonparametric statistics. The American Statistician, 11(3), 13-19.
    Sitarz, M. (2022). Extending F1 metric, probabilistic approach. Advances in Artificial Intelligence and Machine Learning, 3(2), 1025-1038.
    Stefanowski, J. (2013). Overlapping, rare examples and class decomposition in learning classifiers from imbalanced data. Emerging Paradigms in Machine Learning, 13, 277-306.
    Sun, Y., Wong, A. K., & Kamel, M. S. (2009). Classification of imbalanced data: A review. International Journal of Pattern Recognition and Artificial Intelligence, 23(4), 687-719.
    Sun, Z., Song, Q., Zhu, X., Sun, H., Xu, B., & Zhou, Y. (2015). A novel ensemble method for classifying imbalanced data. Pattern Recognition, 48(5), 1623-1637.
    Taft, L. M., Evans, R. S., Shyu, C.-R., Egger, M. J., Chawla, N., Mitchell, J. A., Thornton, S. N., Bray, B., & Varner, M. (2009). Countering imbalanced datasets to improve adverse drug event predictive models in labor and delivery. Journal of Biomedical Informatics, 42(2), 356-364.
    Takahashi, K., Yamamoto, K., Kuchiba, A., Shintani, A., & Koyama, T. (2023). Hypothesis testing procedure for binary and multi‐class F1‐scores in the paired design. Statistics in Medicine, 42(23), 4177-4192.
    Tanha, J., Abdi, Y., Samadi, N., Razzaghi, N., & Asadpour, M. (2020). Boosting methods for multi-class imbalanced data classification: an experimental review. Journal of Big Data, 7, 70.
    Wah, Y. B., Rahman, H. A. A., He, H., & Bulgiba, A. (2016). Handling imbalanced dataset using SVM and k-NN approach. Proceedings of the 23rd Malaysian National Symposium of Mathematical Sciences, 1750, 020023.
    Warghade, S., Desai, S., & Patil, V. (2020). Credit card fraud detection from imbalanced dataset using machine learning algorithm. International Journal of Computer Trends and Technology, 68(3), 22-28.
    Warner, R. M. (2013). Applied Statistics: from Bivariate through Multivariate Techniques. Sage.
    Weng, C. G. & Poon, J. (2008). A new evaluation measure for imbalanced datasets. Proceedings of the 7th Australasian Data Mining Conference, 87, 27-32.
    Yahaya, M., Guo, R., Fan, W., Bashir, K., Fan, Y., Xu, S., & Jiang, X. (2021). Bayesian networks for imbalance data to investigate the contributing factors to fatal injury crashes on the Ghanaian highways. Accident Analysis & Prevention, 150, 105936.
    Yao, J. & Shepperd, M. (2020). Assessing software defection prediction performance: Why using the Matthews correlation coefficient matters. Proceedings of the 24th International Conference on Evaluation and Assessment in Software Engineering, 120-129.
    Zhang, H. & Sheng, S. (2004). Learning weighted naive Bayes with accurate ranking. Proceedings of the 4th IEEE International Conference on Data Mining, 567-570.
    Zhang, J. & Chen, L. (2019). Clustering-based undersampling with random over sampling examples and support vector machine for imbalanced classification of breast cancer diagnosis. Computer Assisted Surgery, 24, 62-72.
    Zhu, R., Guo, Y., & Xue, J.-H. (2020). Adjusting the imbalance ratio by the dimensionality of imbalanced data. Pattern Recognition Letters, 133, 217-223.
