
Graduate student: Chung, Pei-Chen (鍾佩真)
Thesis title: The Characteristic Analysis of the Metrics for Evaluating Classification Performance on Imbalanced Data (不平衡資料分類效能衡量指標之性質分析)
Advisor: Wong, Tzu-Tsung (翁慈宗)
Degree: Master
Department: College of Management - Institute of Information Management
Year of publication: 2022
Academic year of graduation: 110
Language: Chinese
Number of pages: 67
Keywords (Chinese): Classification Performance, Confusion Matrix, Imbalanced Dataset, Evaluation Metric
Keywords (English): Classification, Confusion Matrix, Evaluation Metric, Imbalanced Data
    When working with an imbalanced dataset, the large difference in the class distribution tends to bias a classification model's predictions toward the majority class, and using an inappropriate evaluation metric can then lead to a misleading interpretation of the classification results. When several evaluation metrics are applied to the same predictions, their results are often inconsistent, yet there is still no conclusion or broad consensus on which metric should serve as the primary basis for the final evaluation. This study therefore starts from the basic properties of four popular evaluation metrics for classification performance on imbalanced data, namely the F-measure, G-mean, MCC, and AUC, and investigates the characteristics of each metric and the situations in which it is applicable. The characteristics of the metrics and the relationships among them are discussed by metric type, and the results of testing 40 real-world datasets with four classification algorithms are used to verify the simulation results. Three conclusions are drawn. First, because the meaning of raising the score of one of the three threshold metrics by a given amount, and the effort required to do so, both depend on the original conditions, the original value and the way the improvement is achieved should be considered alongside the final value and the size of the improvement when interpreting evaluation results. Second, the three threshold metrics are highly correlated and highly consistent with one another, so there is little need to use them simultaneously; choosing any one of them as the evaluation metric is sufficient. Third, AUC can be used as the evaluation metric when there are special needs such as recommendation or ranking, but since AUC measures a different aspect of a model than the other three metrics, extra care should be taken when using it.

    Several evaluation metrics other than accuracy have been proposed for imbalanced classification, but there is still no widespread consensus on which metric is more suitable for performance evaluation. This study focuses on four popular evaluation metrics for classification performance on imbalanced data: F-measure, G-mean, MCC, and AUC. Simulation studies are first performed to investigate the characteristics of the metrics and their relationships. Observations from 40 real-world datasets classified by four algorithms are then collected to verify the simulation results. The conclusions drawn from the simulation and experimental studies are: (1) For the three threshold metrics, the effort required to increase a score by a specific amount depends strongly on the original value of the metric. Attention should thus be paid both to the original value of a metric and to the way the performance is improved. (2) F-measure, G-mean, and MCC have very strong linear relationships and very high consistency rates, so analysts can choose any one of them as the evaluation metric of an application. (3) The inconsistency rate between AUC and each of the three threshold metrics is larger than 30%, so AUC should be used only for special needs such as recommendation or ranking.
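    For illustration only, the following minimal sketch (not taken from the thesis) shows how the four metrics compared in this study can be computed for a binary problem, assuming the minority class is coded as 1 and using hypothetical classifier scores; the threshold metrics (F-measure, G-mean, MCC) are derived from the confusion matrix, while AUC is computed via the rank-sum (Mann-Whitney) formulation.

    import numpy as np

    def confusion_counts(y_true, y_pred):
        """Return TP, FP, FN, TN for binary labels in {0, 1}, with 1 as the positive (minority) class."""
        y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
        tp = np.sum((y_true == 1) & (y_pred == 1))
        fp = np.sum((y_true == 0) & (y_pred == 1))
        fn = np.sum((y_true == 1) & (y_pred == 0))
        tn = np.sum((y_true == 0) & (y_pred == 0))
        return tp, fp, fn, tn

    def f_measure(tp, fp, fn, tn, beta=1.0):
        """F-measure: weighted harmonic mean of precision and recall."""
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        if precision + recall == 0:
            return 0.0
        b2 = beta ** 2
        return (1 + b2) * precision * recall / (b2 * precision + recall)

    def g_mean(tp, fp, fn, tn):
        """G-mean: geometric mean of sensitivity (recall) and specificity."""
        sensitivity = tp / (tp + fn) if tp + fn else 0.0
        specificity = tn / (tn + fp) if tn + fp else 0.0
        return np.sqrt(sensitivity * specificity)

    def mcc(tp, fp, fn, tn):
        """Matthews correlation coefficient computed directly from the confusion matrix."""
        num = tp * tn - fp * fn
        den = np.sqrt(float(tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
        return num / den if den else 0.0

    def auc(y_true, y_score):
        """AUC as the probability that a random positive outranks a random negative (no tie handling)."""
        y_true, y_score = np.asarray(y_true), np.asarray(y_score)
        ranks = np.argsort(np.argsort(y_score)) + 1          # 1-based ranks of the scores
        n_pos, n_neg = np.sum(y_true == 1), np.sum(y_true == 0)
        rank_sum = np.sum(ranks[y_true == 1])
        return (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

    # Toy imbalanced sample: 10% positives, hypothetical scores, threshold 0.5 for the threshold metrics.
    y_true = np.array([0] * 18 + [1] * 2)
    y_score = np.linspace(0.05, 0.95, 20)
    y_pred = (y_score >= 0.5).astype(int)
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    print(f_measure(tp, fp, fn, tn), g_mean(tp, fp, fn, tn), mcc(tp, fp, fn, tn), auc(y_true, y_score))

    On this toy sample the ranking metric is perfect (AUC = 1, since both positives receive the highest scores), while the three threshold metrics report rather different values (F-measure and MCC about 0.33, G-mean about 0.75), which is the kind of disagreement among metrics that the study quantifies.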

    Abstract I
    List of Tables VIII
    List of Figures IX
    Chapter 1 Introduction 1
      1.1 Research Background and Motivation 1
      1.2 Research Objectives 3
      1.3 Research Framework 3
    Chapter 2 Literature Review 4
      2.1 Imbalanced Datasets 4
      2.2 Evaluation Metrics for Imbalanced Data Classification 6
        2.2.1 Confusion Matrix 7
        2.2.2 F-measure 8
        2.2.3 G-mean 9
        2.2.4 MCC 10
        2.2.5 AUC 11
      2.3 Comparison of Imbalance Metrics 13
      2.4 Summary 15
    Chapter 3 Properties and Analysis of the Threshold Metrics 16
      3.1 Basic Relationships among Means and Their Proofs 16
      3.2 Experimental Method and Metric Correlations 18
        3.2.1 Experimental Method 18
        3.2.2 Metric Correlations 22
      3.3 Analysis of Increases in the F-measure 24
      3.4 Analysis of the Impact of the Imbalance Ratio on MCC 34
      3.5 Consistency Testing 36
        3.5.1 Consistency of Metric Results 36
        3.5.2 Impact of the Imbalance Ratio on Consistency 37
      3.6 Summary 40
    Chapter 4 Properties and Analysis of the Ranking Metric 41
      4.1 Ranking and Classification 41
      4.2 Confusion Matrix and AUC Intervals 45
      4.3 Summary 49
    Chapter 5 Empirical Study 50
      5.1 Dataset Description 50
      5.2 Experimental Results 52
        5.2.1 Correlation Results 52
        5.2.2 Consistency Results 53
      5.3 Summary 54
    Chapter 6 Conclusions and Suggestions 56
      6.1 Conclusions 56
      6.2 Future Developments and Suggestions 57
    References 58
    Appendix 1: Detailed Output Results for the Datasets 63

