| Graduate Student: | 胡雅婷 Hu, Ya-Ting |
|---|---|
| Thesis Title: | 考量資料切割方式與基本模型間多樣性之集成分類方法 (Ensemble Learning Methods with Data Partitioning and the Diversity among Base Models) |
| Advisor: | 翁慈宗 Wong, Tzu-Tsung |
| Degree: | Master |
| Department: | College of Management - Institute of Information Management |
| Year of Publication: | 2021 |
| Graduation Academic Year: | 109 |
| Language: | Chinese |
| Pages: | 68 |
| Keywords (Chinese): | classification algorithm, ensemble learning, base model, diversity, data partitioning |
| Keywords (English): | base model, classification, data partitioning, diversity, ensemble learning |
Ensemble learning combines multiple base models trained by one or more classification methods, and the accuracy of the resulting ensemble depends on both the accuracy of each base model and the diversity among them; this study improves the accuracy of ensemble learning models from the perspective of diversity. Most previous studies train base models on different sub-training sets obtained by sampling the original training set, and most of these sampling methods are random, which does not guarantee that every base model is appropriate; moreover, almost none of them raise the classification accuracy of the overall ensemble by testing each base model. This study therefore proposes a new process that applies several partitioning methods to select sub-training sets and then tests the induced base models to improve the classification accuracy of the ensemble model. Experiments on data sets from the UCI repository show that ensembles built with the testing step for selecting base models achieve higher classification accuracy, and can match the performance of the process without testing while using fewer base models. Among the partitioning methods, combining vertical and horizontal partitioning yields better classification performance, and the best accuracy is obtained by using bagging together with the random subspace method sampling 80% of the features as the partitioning method, combined with the testing process.
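The best-performing partitioning described above combines bagging with the random subspace method at an 80% feature ratio. The following is a minimal sketch of that kind of combined horizontal and vertical partitioning, assuming the data are numpy arrays; the function name `partition` and its interface are illustrative and not taken from the thesis.

```python
import numpy as np

def partition(X, y, feature_ratio=0.8, rng=None):
    """Form one sub-training set by bootstrapping the instances (horizontal
    partitioning, as in bagging) and keeping a random subset of the features
    (vertical partitioning, as in the random subspace method)."""
    rng = np.random.default_rng(rng)
    n_samples, n_features = X.shape

    # Horizontal partitioning: sample instances with replacement.
    rows = rng.integers(0, n_samples, size=n_samples)

    # Vertical partitioning: keep feature_ratio (e.g. 80%) of the columns.
    n_keep = max(1, round(feature_ratio * n_features))
    cols = rng.choice(n_features, size=n_keep, replace=False)

    return X[np.ix_(rows, cols)], y[rows], cols
```

Each call returns a sub-training set together with the selected feature indices, so that a base model induced on it can later be applied to the same feature subset of new data.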
Ensemble learning is a technique for combining multiple base models into an ensemble model that generally performs better than any individual base model. Two main factors that affect the accuracy of an ensemble model are the diversity among base models and the accuracy of each base model. Several methods have been proposed to enhance the diversity, and most of them achieve this goal by designing a way to generate the training sets for base models. This thesis focuses on generating diverse base models to improve the accuracy of ensemble learning. Previous studies have proposed sampling methods to enhance the diversity, but none of them provides a mechanism to evaluate the diversity and ensure that the accuracy of an ensemble model is actually improved. This study first introduces several data partitioning methods to generate training sets for inducing base models. Then three processes are proposed to filter base models for improving the accuracy of an ensemble model. The data partitioning methods and the filtering processes are tested on 15 data sets. The experimental results show that the filtering processes can generally find an ensemble model with higher accuracy and fewer base models. A partitioning method that includes both attribute selection and instance sampling should be adopted when inducing ensemble models.
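As an illustration of a filtering process of the kind described above, the sketch below greedily keeps a candidate base model only if adding it does not lower the ensemble's majority-vote accuracy on a held-out validation set. The greedy rule, the `filter_base_models` and `majority_vote` helpers, and the use of scikit-learn's `accuracy_score` are assumptions made for illustration; the thesis proposes three specific filtering processes that may differ from this sketch.

```python
import numpy as np
from sklearn.metrics import accuracy_score

def majority_vote(models, X):
    # Each entry is a (fitted_model, feature_column_indices) pair; class labels
    # are assumed to be non-negative integers so that bincount can tally votes.
    votes = np.array([m.predict(X[:, cols]) for m, cols in models])
    return np.apply_along_axis(lambda v: np.bincount(v).argmax(), 0, votes)

def filter_base_models(candidates, X_val, y_val):
    """Greedily keep a candidate base model only if adding it does not lower
    the ensemble's accuracy on the held-out validation set."""
    kept, best = [], 0.0
    for model, cols in candidates:
        trial = kept + [(model, cols)]
        acc = accuracy_score(y_val, majority_vote(trial, X_val))
        if acc >= best:
            kept, best = trial, acc
    return kept
```

The candidates fed to this filter could, for example, be decision trees induced on sub-training sets produced by the partitioning sketch shown after the Chinese abstract.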