| Graduate Student: | 謝承翰 Hsieh, Cheng-Han |
|---|---|
| Thesis Title: | 三種集成式分類方法的正確率與多樣性之比較 A Comparison on the Accuracy and Diversity of Three Ensemble Algorithms |
| Advisor: | 翁慈宗 Wong, Tzu-Tsung |
| Degree: | Master |
| Department: | Institute of Information Management, College of Management |
| Year of Publication: | 2021 |
| Graduation Academic Year: | 109 |
| Language: | Chinese |
| Number of Pages: | 63 |
| Chinese Keywords: | ensemble classification methods, diversity measures, bagging, boosting, stacking, inter-model correlation |
| English Keywords: | Bagging, boosting, diversity, ensemble algorithm, stacking |
Previous studies indicate that ensemble classification is one of the popular research topics in data mining. The idea is to combine many base models generated by single classification algorithms into one final model, so an ensemble usually achieves better classification performance than a single classifier. When such methods are applied, however, classification accuracy is typically the only basis of comparison, and the correlation among base models receives little discussion. This study therefore compares three common ensemble algorithms: bagging, boosting, and stacking. Besides classification accuracy as the measure of performance, diversity measures are used to quantify the correlation among base models. The accuracy and diversity values obtained by the three algorithms on multiple data sets are compared by statistical tests to determine whether the three algorithms differ significantly. The aim is to give users of ensemble methods a reference for algorithm selection, so that the correlation among base models can be weighed alongside classification accuracy.

The required values were collected from 15 data sets, both for individual data sets and across all of them. The results show that the correlation among base models has no direct effect on the classification performance of the ensemble model, so when choosing an ensemble algorithm, the correlation among base models need not take priority over classification performance. In terms of classification performance, bagging, boosting, and stacking show no obvious difference.
Keywords: ensemble classification methods, diversity measures, bagging, boosting, stacking, inter-model correlation
Ensemble learning is one of the popular topics in data mining. The concept of ensemble learning is to build an ensemble model by integrating multiple base models. Previous studies showed that ensemble algorithms generally perform better than traditional classification algorithms. Most ensemble algorithms are evaluated by accuracy, while the diversity among base models is suggested to be a critical factor for the performance of an ensemble model. In this thesis, three popular ensemble algorithms, bagging, boosting, and stacking, are considered for performance comparison. Statistical methods are proposed to compare the accuracy and the diversity resulting from the three ensemble algorithms. The experimental results on 15 data sets show that the mean accuracies resulting from the three ensemble algorithms are not significantly different. However, their resulting values of pairwise and non-pairwise diversity metrics are generally significantly different. These results suggest that the diversity among base models may not be a good indicator of the performance of an ensemble model.
Keywords: Bagging, boosting, diversity, ensemble algorithm, stacking
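To make the comparison described above concrete, the following is a minimal sketch (not the code used in the thesis) of how the three ensemble algorithms can be trained on a single data set and evaluated by both accuracy and a pairwise diversity measure. It assumes scikit-learn's BaggingClassifier, AdaBoostClassifier, and StackingClassifier, uses the Q statistic of Yule (1900), as discussed by Kuncheva and Whitaker (2003), as the diversity measure, and picks the data set, base learners, and ensemble sizes purely for illustration.

```python
# Minimal sketch of the accuracy/diversity comparison; the data set, base learners,
# and hyperparameters are illustrative assumptions, not the thesis settings.
import numpy as np
from itertools import combinations
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import BaggingClassifier, AdaBoostClassifier, StackingClassifier


def q_statistic(pred_i, pred_j, y):
    """Pairwise Q statistic (Yule, 1900) between two base models' predictions."""
    n11 = np.sum((pred_i == y) & (pred_j == y))  # both correct
    n00 = np.sum((pred_i != y) & (pred_j != y))  # both wrong
    n10 = np.sum((pred_i == y) & (pred_j != y))  # only model i correct
    n01 = np.sum((pred_i != y) & (pred_j == y))  # only model j correct
    denom = n11 * n00 + n01 * n10
    return (n11 * n00 - n01 * n10) / denom if denom else 0.0


def mean_pairwise_q(base_models, X, y):
    """Average Q statistic over all pairs of fitted base models."""
    preds = [m.predict(X) for m in base_models]
    pairs = combinations(range(len(preds)), 2)
    return float(np.mean([q_statistic(preds[i], preds[j], y) for i, j in pairs]))


X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

ensembles = {
    "bagging": BaggingClassifier(DecisionTreeClassifier(), n_estimators=10, random_state=0),
    "boosting": AdaBoostClassifier(DecisionTreeClassifier(max_depth=1), n_estimators=10, random_state=0),
    "stacking": StackingClassifier(
        estimators=[("dt", DecisionTreeClassifier(random_state=0)),
                    ("nb", GaussianNB()),
                    ("knn", KNeighborsClassifier())]),
}

for name, model in ensembles.items():
    model.fit(X_tr, y_tr)
    acc = model.score(X_te, y_te)                         # classification accuracy
    div = mean_pairwise_q(model.estimators_, X_te, y_te)  # correlation among base models
    print(f"{name:8s}  accuracy = {acc:.3f}  mean pairwise Q = {div:.3f}")
```

Repeating this procedure over all 15 data sets (for example, under k-fold cross-validation) and applying a paired statistical test to the resulting accuracies and diversity values would mirror the comparison summarized in the abstract.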
Bache, K., & Lichman, M. (2013). UCI Machine Learning Repository. University of California, School of Information and Computer Science. Retrieved from http://archive.ics.uci.edu/ml
Banfield, R. E., Hall, L. O., Bowyer, K. W., & Kegelmeyer, W. P. (2005). “Ensemble diversity measures and their application to thinning.” Information Fusion, 6(1), 49–62.
Bian, S., & Wang, W. (2007). “On diversity and accuracy of homogeneous and heterogeneous ensembles.” International Journal of Hybrid Intelligent Systems, 4, 103–128.
Breiman, L. (1996). “Bagging predictors.” Machine Learning, 24(2), 123–140.
Breiman, L. (2001). “Random forests.” Machine Learning, 45(1), 5–32.
Dietterich, T. (2000). “An experimental comparison of three methods for constructing ensembles of decision trees: bagging, boosting, and randomization.” Machine Learning, 40(2), 139–157.
Dietterich, T. (2000). “Ensemble methods in machine learning.” Proceedings of the Multiple Classifier Systems Workshop, 1–15.
Dzeroski, S., & Zenko, B. (2004). “Is combining classifiers with stacking better than selecting the best one?” Machine Learning, 54(3), 255–273.
Freund, Y. (1995). “Boosting a weak learning algorithm by majority.” Information and Computation, 121(2), 256–285.
Freund, Y., & Schapire, R. E. (1996). “Experiments with a new boosting algorithm.” Proceedings of the 13th International Conference on Machine Learning, 148–156.
Giacinto, G., & Roli, F. (2001). “Design of effective neural network ensembles for image classification processes.” Image and Vision Computing, 19, 699–707.
Hu, X. (2001). “Using rough sets theory and database operations to construct a good ensemble of classifiers for data mining applications.” Proceedings 2001 IEEE International Conference on Data Mining, 233–240.
Jiang, W. L., Chen, Z. H., Xiang, Y., Shao, D. G., Ma, L., & Zhang, J. P. (2019). “SSEM: a novel self-adaptive stacking ensemble model for classification.” IEEE Access, 7, 120337–120349.
Jurek, A., Bi, Y., Wu, S., & Nugent, C. (2011). “Classification by clusters analysis—an ensemble technique in a semi-supervised classification.” 23rd IEEE International Conference on Tools with Artificial Intelligence, Boca Raton, FL, USA.
Jurek, A., Bi, Y., Wu, S., & Nugent, C. (2014). “A survey of commonly used ensemble-based classification techniques.” Knowledge Engineering Review, 29(5), 551–581.
Kadkhodaei, H. R., Moghadam, A. M. E., & Dehghan, M. (2020). “HBoost: a heterogeneous ensemble classifier based on the boosting method and entropy measurement.” Expert Systems with Applications, 157, 113482.
Kang, S., Cho, S., & Kang P. (2015). “Multi-class classification via heterogeneous ensemble of one-class classifiers.” Engineering Applications of Artificial Intelligence, 43, 35–43.
Kohavi, R., & Wolpert, D. (1996). “Bias plus variance decomposition for zero-one loss functions.” Proceedings of the 13th International Conference on Machine Learning, 275–283. Morgan Kaufmann, USA.
Kuncheva, L. I., & Whitaker, C. J. (2003). “Measures of diversity in classifier ensembles and their relationship with the ensemble accuracy.” Machine Learning, 51(2), 181–207.
Rodriguez, J. J., & Kuncheva, L. I. (2006). “Rotation forest: a new classifier ensemble method.” IEEE Transactions on Pattern Analysis and Machine Intelligence, 28(10), 1619–1630.
Rokach, L. (2010). “Ensemble-based classifiers.” Artificial Intelligence Review, 33(1), 1–39.
Sagi, O., & Rokach, L. (2018). “Ensemble learning: a survey.” Wiley Interdisciplinary Reviews-Data Mining and Knowledge Discovery, 8(4), 1–18.
Shunmugapriya, P., & Kanmani, S. (2013). “Optimization of stacking ensemble configurations through artificial bee colony algorithm.” Swarm and Evolutionary Computation, 12, 24–32.
Skalak, D. (1996). “The sources of increased accuracy for two proposed boosting algorithms.” Proceedings of the American Association for Artificial Intelligence, AAAI 96, Integrating Multiple Learned Models Workshop. Portland, USA.
Tukey, J. W. (1977). Exploratory Data Analysis. Addison-Wesley.
Tuysuzoglu, G., & Birant, D. (2020). “Enhanced Bagging (eBagging): A novel approach for ensemble learning.” International Arab Journal of Information Technology, 17(4), 515–528.
Wickramaratna, J., Holden, S., & Buxton, B. (2001). “Performance degradation in boosting.” Proceedings of the Multiple Classifier Systems Workshop, 11–21.
Wolpert, D. H. (1992). “Stacked generalization.” Neural Networks, 5(2), 241–259.
Wong, T. T. (2015). “Performance evaluation of classification algorithms by k-fold and leave-one-out cross validation.” Pattern Recognition, 48(9), 2839–2846.
Wong, T. T. (2017). “Parametric methods for comparing the performance of two classification algorithms evaluated by k-fold cross validation on multiple data sets.” Pattern Recognition, 65, 97–107.
Yang, D. H., Lee, H. J., & Lim, D. J. (2020). “RolexBoost: A rotation-based boosting algorithm with adaptive loss functions.” IEEE Access, 8, 41037–41044.
Yule, G. (1900). “On the association of attributes in statistics.” Philosophical Transactions of the Royal Society A, 194, 257–319.
Zhang, C. X., & Zhang, J. S. (2010). “A variant of rotation forest for constructing ensemble classifiers.” Pattern Analysis and Applications, 13(1), 59–77.