
Graduate student: Liao, Jia-De (廖家德)
Thesis title: An Analysis on the Accuracy Difference Between Bagging and Boosting Algorithms (袋裝法和提升法分類準確率的差異分析)
Advisor: Wong, Tzu-Tsung (翁慈宗)
Degree: Master
Department: Institute of Information Management, College of Management
Publication year: 2024
Graduation academic year: 112
Language: Chinese
Number of pages: 74
Chinese keywords: ensemble learning, bagging, boosting, statistical hypothesis testing, regression analysis
Foreign keywords: Bagging, boosting, ensemble learning, regression analysis, statistical hypothesis testing
Views: 76; Downloads: 25
In the field of data mining, and classification in particular, a single classification model often struggles to cope with complex data. Ensemble learning emerged to address this challenge: it improves classification accuracy by combining multiple classification models. Bagging and boosting are two commonly used ensemble algorithms, widely applied across many domains. Previous studies have focused mainly on comparing the performance of the two methods, from algorithmic properties to practical applications, but they have not examined the root causes of the performance differences in depth. This study therefore applies statistical hypothesis testing to investigate the performance difference between bagging and boosting. Based on the test results, it further compares the accuracy trends of the base models and sub-ensemble models of the two ensemble algorithms, and analyzes the average number of base models an ensemble needs to predict a test instance correctly, in order to probe the root causes of the performance differences. The results indicate that adaptive boosting tends to outperform basic bagging when the accuracy of its base models declines more slowly or rises more quickly, and when the accuracy of its sub-ensemble models rises more quickly. Finally, the average number of base models required to predict test instances correctly is difficult to use as a factor explaining why the two ensemble algorithms differ significantly.
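The statistical testing described above can be sketched as a paired t-test on per-fold cross-validation accuracies. This is a minimal, generic sketch, not necessarily the exact parametric procedure used in the thesis, and the fold accuracies below are fabricated purely for illustration:

```python
import math

def paired_t_statistic(acc_a, acc_b):
    """Paired t statistic for per-fold accuracy differences.

    Returns (t, df). Under H0 (no accuracy difference), t follows a
    Student t distribution with df = k - 1, where k is the fold count.
    """
    diffs = [a - b for a, b in zip(acc_a, acc_b)]
    k = len(diffs)
    mean_d = sum(diffs) / k
    var_d = sum((d - mean_d) ** 2 for d in diffs) / (k - 1)  # sample variance
    t = mean_d / math.sqrt(var_d / k)
    return t, k - 1

# Fabricated 10-fold accuracies, purely for illustration:
bagging_acc  = [0.81, 0.79, 0.83, 0.80, 0.78, 0.82, 0.80, 0.79, 0.81, 0.80]
boosting_acc = [0.84, 0.83, 0.85, 0.82, 0.81, 0.86, 0.83, 0.82, 0.84, 0.83]

t, df = paired_t_statistic(boosting_acc, bagging_acc)
print(f"t = {t:.2f}, df = {df}")
# Reject H0 (equal accuracy) at the 5% level when |t| > t_{0.025, df}
# (about 2.262 for df = 9).
```

A significantly positive t here would place the data set in the group where boosting outperforms bagging; a non-significant t would place it in the no-difference group.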

In the field of data mining, particularly in classification, a single classification model often struggles to handle complex data. Ensemble learning techniques have emerged to address this challenge. Ensemble learning improves classification accuracy by aggregating the predictions of multiple classification models. Bagging and boosting are two commonly used algorithms applied across various domains. Previous studies that compared the performance of these two methods have primarily focused on algorithmic properties or practical applications. However, those studies did not deeply explore the fundamental reasons for the performance differences. This study thus employs statistical hypothesis testing to investigate the causes of the performance differences between bagging and boosting algorithms. Data sets are first divided into three groups based on whether the two ensemble algorithms have significantly different performance. Then the accuracy trends of base models and sub-ensemble models are analyzed by linear regression to explore whether these trends differ substantially. The results indicate that the possible reasons for the superiority of adaptive boosting over basic bagging include a slower decline or faster increase in base model accuracy trends, as well as a faster increase in sub-ensemble model accuracy trends. Finally, the average number of base models required for correct prediction of testing data by the two ensemble algorithms should not be a factor for determining whether the two ensemble algorithms have significantly different performance.
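The trend analysis in the abstract, regressing sub-ensemble accuracy on the number of base models, can be sketched as below. The 0/1 votes are fabricated for illustration, and the thesis's actual regression setup may differ in its details:

```python
def sub_ensemble_accuracies(votes, labels):
    """Majority-vote accuracy of the first m base models, for m = 1..T.

    votes[i][j] is base model i's 0/1 prediction for test instance j.
    """
    T, n = len(votes), len(labels)
    accs = []
    for m in range(1, T + 1):
        correct = 0
        for j in range(n):
            ones = sum(votes[i][j] for i in range(m))
            pred = 1 if 2 * ones > m else 0  # ties break toward class 0
            correct += int(pred == labels[j])
        accs.append(correct / n)
    return accs

def ols_slope(ys):
    """Least-squares slope of ys regressed on x = 1, ..., len(ys)."""
    n = len(ys)
    mx = (n + 1) / 2
    my = sum(ys) / n
    num = sum((x - mx) * (y - my) for x, y in enumerate(ys, start=1))
    den = sum((x - mx) ** 2 for x in range(1, n + 1))
    return num / den

# Fabricated 0/1 votes from three base models on four test instances:
votes = [
    [1, 0, 0, 0],
    [1, 1, 0, 1],
    [1, 1, 0, 0],
]
labels = [1, 1, 0, 0]

accs = sub_ensemble_accuracies(votes, labels)
print(accs, ols_slope(accs))  # a positive slope means accuracy rises with m
```

Comparing the fitted slopes for bagging and for adaptive boosting on the same data set is one way to quantify whose sub-ensemble accuracy rises faster, which is the kind of evidence the abstract cites for boosting's superiority.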

Abstract I
Acknowledgements VI
Contents VII
List of Tables IX
List of Figures X
Chapter 1 Introduction 1
  1.1 Research Motivation 1
  1.2 Research Objectives 2
  1.3 Research Framework 2
Chapter 2 Literature Review 3
  2.1 Ensemble Learning 3
  2.2 Bagging 4
  2.3 Boosting 6
  2.4 Performance Comparison of Bagging and Boosting 9
  2.5 Summary 13
Chapter 3 Research Methodology 14
  3.1 Testing the Classification Performance of Bagging and Boosting 14
  3.2 Analyzing Accuracy Trends of Ensemble and Base Models 17
  3.3 Analysis of the Average Number of Base Models for Correct Prediction 21
  3.4 Comparative Analysis of Test Results Across Data Sets 23
Chapter 4 Empirical Study 24
  4.1 Data Sets 24
  4.2 Evaluation of Accuracy Trends of Ensemble and Base Models 28
    4.2.1 Regression Test Results for Base Models 28
    4.2.2 Regression Test Results for Sub-Ensemble Models 30
  4.3 Results of the Average Number of Base Models for Correct Prediction 33
  4.4 Summary 34
Chapter 5 Conclusions and Suggestions 36
  5.1 Conclusions 36
  5.2 Suggestions 36
References 38
Appendix 42


Full-text availability: on campus: open access immediately; off campus: open access immediately