
Graduate Student: Hu, Ya-Ting
Thesis Title: Ensemble Learning Methods with Data Partitioning and the Diversity among Base Models
Advisor: Wong, Tzu-Tsung
Degree: Master
Department: Institute of Information Management, College of Management
Year of Publication: 2021
Academic Year of Graduation: 109
Language: Chinese
Number of Pages: 68
Keywords (Chinese, translated): classification algorithm, ensemble learning, base model, diversity, data partitioning
Keywords (English): base model, classification, data partitioning, diversity, ensemble learning
    Ensemble learning combines multiple base models trained by one or more classification methods, and the accuracy of the resulting ensemble depends on both the accuracy of each base model and the diversity among them. This study improves the accuracy of ensemble learning models from the perspective of diversity. Most previous studies train base models on sub-training sets obtained by sampling the training set, and the sampling is usually random, which does not guarantee that every base model is suitable; moreover, almost no study evaluates each base model in order to raise the classification accuracy of the overall ensemble. This study therefore proposes a new process that applies several partitioning methods to select sub-training sets and then evaluates the induced base models to improve the classification accuracy of the ensemble learning model. Experiments on data sets from the UCI repository show that ensembles built with the evaluation step achieve better classification accuracy, and can match the performance of the process without evaluation while using fewer base models. Among the partitioning methods, combining vertical and horizontal partitioning gives better classification performance, and the highest accuracy is obtained by using bagging together with the random subspace method selecting 80% of the attributes as the partitioning method, combined with the evaluation process.
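    The best-reported partitioning configuration combines bagging (bootstrap sampling of the instances, a horizontal partition) with the random subspace method selecting 80% of the attributes (a vertical partition). The Python sketch below shows one way such sub-training sets can be generated; the function name bagging_random_subspace and its interface are illustrative assumptions, not code from the thesis.

    import numpy as np

    def bagging_random_subspace(X, y, n_models=10, feature_ratio=0.8, seed=None):
        """Generate sub-training sets by bootstrap sampling the instances (bagging)
        and randomly selecting a fraction of the attributes (random subspace)."""
        rng = np.random.default_rng(seed)
        n_samples, n_features = X.shape
        n_selected = max(1, int(round(feature_ratio * n_features)))
        subsets = []
        for _ in range(n_models):
            rows = rng.integers(0, n_samples, size=n_samples)              # horizontal: bootstrap sample
            cols = rng.choice(n_features, size=n_selected, replace=False)  # vertical: 80% of attributes
            subsets.append((X[np.ix_(rows, cols)], y[rows], cols))
        return subsets

    Each returned triple records the selected attribute indices so that the corresponding base model can be applied to the same attribute subset at prediction time.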

    Ensemble learning is a technique that combines multiple base models into an ensemble model that generally performs better than any individual base model. Two main factors affect the accuracy of an ensemble model: the diversity among the base models and the accuracy of each base model. Several methods have been proposed to enhance diversity, and most of them do so by designing a way to generate the training sets for the base models. This thesis focuses on generating diverse base models to improve the accuracy of ensemble learning. Previous studies have proposed sampling methods to enhance diversity, but none of them provides a mechanism for evaluating diversity to ensure that the accuracy of the ensemble model actually improves. This study first introduces several data partitioning methods for generating the training sets used to induce base models. Three processes are then proposed to filter base models and improve the accuracy of the ensemble model. The data partitioning methods and the filtering processes are tested on 15 data sets. The experimental results show that the filtering processes generally yield an ensemble model with higher accuracy and fewer base models. A partitioning method that includes both attribute selection and instance sampling should be adopted when inducing ensemble models.
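    The filtering processes evaluate candidate base models before adding them to the ensemble. As a minimal sketch of how a diversity check of this kind can work, the snippet below accepts a candidate only when its average pairwise disagreement with the already-kept base models is high enough; the disagreement measure, the threshold value, and the acceptance rule are illustrative assumptions rather than the specific tests defined in the thesis.

    import numpy as np

    def disagreement(pred_a, pred_b):
        """Fraction of instances on which two base models make different predictions."""
        return float(np.mean(np.asarray(pred_a) != np.asarray(pred_b)))

    def accept_candidate(candidate_pred, kept_preds, threshold=0.1):
        """Keep a candidate base model only if its average disagreement with the
        base models already in the ensemble is at least the threshold."""
        if not kept_preds:  # the first base model is always kept
            return True
        avg = np.mean([disagreement(candidate_pred, p) for p in kept_preds])
        return avg >= threshold

    A diversity-plus-accuracy check could be layered on top of such a rule by additionally requiring the candidate's validation accuracy to exceed a minimum value.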

    Abstract (Chinese)  I
    Abstract (English)  II
    Acknowledgements  V
    Table of Contents  VI
    List of Tables  VIII
    List of Figures  IX
    Chapter 1  Introduction  1
      1.1  Research Background and Motivation  1
      1.2  Research Objectives  2
      1.3  Thesis Organization  2
    Chapter 2  Literature Review  3
      2.1  Ensemble Learning  3
      2.2  The Relationship between Diversity and Accuracy in Ensemble Learning  5
      2.3  Data Partitioning  9
        2.3.1  Instance Partitioning  9
        2.3.2  Attribute Partitioning  11
      2.4  Summary  12
    Chapter 3  Research Methods  13
      3.1  Research Process  13
      3.2  Selecting Sub-Training Sets  15
        3.2.1  Horizontal Partitioning  15
        3.2.2  Vertical Partitioning  16
        3.2.3  Horizontal plus Vertical Partitioning  18
      3.3  Training Base Models  18
      3.4  Evaluation Methods  20
        3.4.1  Diversity Evaluation  20
        3.4.2  Diversity plus Accuracy Evaluation  21
      3.5  Comparison of Results  23
    Chapter 4  Experimental Results  24
      4.1  Data Sets  24
      4.2  Results of the Process without Evaluation for Selecting Base Models  25
      4.3  Results of the Process Using Diversity to Select Base Models  30
      4.4  Results of the Process Using Diversity and Accuracy to Select Base Models  36
      4.5  Comparison among Processes  39
      4.6  Summary  44
    Chapter 5  Conclusions and Suggestions  46
      5.1  Conclusions  46
      5.2  Future Work  47
    References  48
    Appendix  51

