| Author: | 邱于婷 Chiu, Yu-Ting |
|---|---|
| Thesis title: | 基本模型修剪方式對集成學習效能和效率之影響 The Impact of Pruning Approaches for Base Models on Performance and Efficiency in Ensemble Learning |
| Advisor: | 翁慈宗 Wong, Tzu-Tsung |
| Degree: | Master |
| Department: | College of Management - Department of Industrial and Information Management |
| Year of publication: | 2022 |
| Academic year: | 110 |
| Language: | Chinese |
| Pages: | 47 |
| Keywords (Chinese): | 集成學習、基本模型、集成修剪、多樣性測度 |
| Keywords (English): | base model, diversity metric, ensemble learning, ensemble pruning |
An ensemble model combines several base models to improve prediction accuracy. The factors that affect ensemble accuracy are generally the diversity among the base models and the accuracy of each individual base model. To obtain better ensemble performance, ensemble pruning discards base models that do not help raise the ensemble's accuracy, so that good classification results can be achieved with fewer base models. Most previous studies prune ensembles with post-pruning, and pre-pruning is rarely used. This study therefore considers both diversity and accuracy to design a pre-pruning process and a post-pruning process. Pre-pruning improves ensemble performance by testing the base model drawn in each round, whereas post-pruning first generates multiple base models and then prunes them, using clustering-based and ordering-based methods to increase the diversity and accuracy of the ensemble model; the two processes are compared to explore the differences between them. Experiments on data sets from the UCI repository show that both the proposed pre-pruning and post-pruning effectively improve ensemble accuracy. In terms of efficiency, pre-pruning takes the most time, followed by post-pruning. Compared with EPBD, a pruning method proposed in recent years, the proposed post-pruning performs better in both ensemble accuracy and efficiency.
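The abstract does not spell out the exact admission test used in each pre-pruning round, so the following is only a hypothetical sketch: each candidate base model is represented by its prediction vector on a validation set, and it is admitted only if it clears both an accuracy floor and a disagreement floor against the models already kept. The thresholds `acc_floor` and `div_floor` and the cap `max_models` are made-up parameters, not values from the thesis.

```python
def accuracy(pred, y):
    """Fraction of validation labels the model predicts correctly."""
    return sum(p == t for p, t in zip(pred, y)) / len(y)

def disagreement(pred, others):
    """Mean pairwise disagreement rate between a candidate's predictions
    and those of the models already kept (1.0 when nothing is kept yet)."""
    if not others:
        return 1.0
    rates = [sum(p != q for p, q in zip(pred, o)) / len(pred) for o in others]
    return sum(rates) / len(rates)

def pre_prune(candidates, y_val, acc_floor=0.6, div_floor=0.1, max_models=10):
    """Pre-pruning sketch: inspect each sampled base model in turn and admit
    it only if it is accurate enough AND different enough from the kept ones."""
    kept = []
    for pred in candidates:
        if len(kept) >= max_models:
            break
        if accuracy(pred, y_val) >= acc_floor and disagreement(pred, kept) >= div_floor:
            kept.append(pred)
    return kept
```

For example, a candidate identical to an already-kept model fails the disagreement floor and is skipped, while a slightly different but still accurate one is admitted.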
Ensemble learning combines the predictions of multiple base models to enhance classification accuracy. Two main factors that affect the performance of ensemble learning are the diversity among base models and the accuracy of each individual base model. Ensemble pruning selects a subset of base models that can achieve better accuracy with a small number of base models. Most previous studies use post-pruning to induce ensemble models. This thesis considers both diversity and accuracy to design pre-pruning and post-pruning processes for inducing ensemble models. Pre-pruning employs either diversity or accuracy to determine whether a base model should be put into an ensemble model. Post-pruning first generates a pre-specified number of base models, which are divided into clusters by bisecting k-means; the base models with higher accuracies are then chosen to constitute an ensemble model. Twenty data sets downloaded from the UCI data repository are used to test the two pruning processes. The experimental results show that both pruning processes can effectively enhance the prediction accuracy of an ensemble model, while the computational cost of pre-pruning is higher. Compared with EPBD, a post-pruning algorithm proposed in a recent study, the post-pruning process proposed in this thesis is both more effective and more efficient.
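The post-pruning process described above (cluster the base models with bisecting k-means, then keep the most accurate model per cluster) can be sketched as follows. This is an illustrative reconstruction, not the thesis's implementation: models are represented by their 0/1 prediction vectors on a validation set, the 2-means split is seeded deterministically with the two most dissimilar models, and all function names are invented for this sketch.

```python
def sqdist(a, b):
    """Squared Euclidean distance; on 0/1 prediction vectors this equals
    the number of validation examples on which two models disagree."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def two_means(idx, preds, iters=10):
    """Split one cluster of model indices in two with plain 2-means (Lloyd),
    seeding the centroids with the two most dissimilar models."""
    _, a, b = max((sqdist(preds[i], preds[j]), i, j)
                  for i in idx for j in idx if i < j)
    cents = [list(preds[a]), list(preds[b])]
    groups = [[a], [b]]
    for _ in range(iters):
        groups = [[], []]
        for i in idx:  # assign each model to its nearest centroid
            g = 0 if sqdist(preds[i], cents[0]) <= sqdist(preds[i], cents[1]) else 1
            groups[g].append(i)
        for g in (0, 1):  # recompute centroids as per-example mean predictions
            if groups[g]:
                cents[g] = [sum(preds[i][j] for i in groups[g]) / len(groups[g])
                            for j in range(len(cents[g]))]
    return [g for g in groups if g]

def bisecting_kmeans(preds, k):
    """Repeatedly bisect the largest cluster until k clusters exist."""
    clusters = [list(range(len(preds)))]
    while len(clusters) < k:
        big = max(clusters, key=len)
        if len(big) < 2:
            break
        parts = two_means(big, preds)
        if len(parts) < 2:
            break
        clusters.remove(big)
        clusters.extend(parts)
    return clusters

def post_prune(preds, y_val, k):
    """Post-pruning sketch: cluster the base models' validation predictions,
    then keep only the most accurate model from each cluster."""
    def acc(p):
        return sum(a == b for a, b in zip(p, y_val)) / len(y_val)
    return [max(c, key=lambda i: acc(preds[i])) for c in bisecting_kmeans(preds, k)]
```

Clustering prediction vectors groups models that behave alike, so picking one accurate representative per cluster preserves diversity while dropping redundant members.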
Abawajy, J. H., Chowdhury, M., & Kelarev, A. (2015). Hybrid consensus pruning of ensemble classifiers for big data malware detection. IEEE Transactions on Cloud Computing, 8(2), 398-407.
Bhardwaj, M. & Bhatnagar, V. (2015). Towards an optimally pruned classifier ensemble. International Journal of Machine Learning and Cybernetics, 6(5), 699-718.
Bian, Y., & Chen, H. (2021). When does diversity help generalization in classification ensembles? IEEE Transactions on Cybernetics. https://doi.org/10.1109/TCYB.2021.3053165
Bian, Y., Wang, Y., Yao, Y., & Chen, H. (2019). Ensemble pruning based on objection maximization with a general distributed framework. IEEE Transactions on Neural Networks and Learning Systems, 31(9), 3766-3774.
Breiman, L. (1996). Bagging predictors. Machine Learning, 24(2), 123-140.
Breiman, L. (2001). Random forests. Machine Learning, 45(1), 5-32.
Cao, J., Li, W., Ma, C., & Tao, Z. (2018). Optimizing multi-sensor deployment via ensemble pruning for wearable activity recognition. Information Fusion, 41, 68-79.
Dai, Q. (2013). A competitive ensemble pruning approach based on cross-validation technique. Knowledge-Based Systems, 37, 394-414.
Dai, Q. & Han, X. (2016). An efficient ordering-based ensemble pruning algorithm via dynamic programming. Applied Intelligence, 44(4), 816-830.
Dai, Q., Ye, R., & Liu, Z. (2017). Considering diversity and accuracy simultaneously for ensemble pruning. Applied Soft Computing, 58, 75-91.
Ding, S., Chen, Z., Zhao, S.-y., & Lin, T. (2018). Pruning the ensemble of ANN based on decision tree induction. Neural Processing Letters, 48(1), 53-70.
Dua, D., & Graff, C. (2019). UCI Machine Learning Repository [http://archive.ics.uci.edu/ml]. Irvine, CA: University of California, School of Information and Computer Science.
Guo, H., Liu, H., Li, R., Wu, C., Guo, Y., & Xu, M. (2018). Margin & diversity based ordering ensemble pruning. Neurocomputing, 275, 237-246.
Guo, L. & Boukir, S. (2013). Margin-based ordered aggregation for ensemble pruning. Pattern Recognition Letters, 34(6), 603-609.
Ho, T. K. (1998). The random subspace method for constructing decision forests. IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(8), 832-844.
Jiang, H., Zheng, W., Luo, L., & Dong, Y. (2019). A two-stage minimax concave penalty based method in pruned AdaBoost ensemble. Applied Soft Computing, 83, 105674.
Kohavi, R. & Wolpert, D. H. (1996). Bias plus variance decomposition for zero-one loss functions. Proceedings of the Thirteenth International Conference on Machine Learning, 275-283. Morgan Kaufmann.
Li, D. & Wen, G. (2018). MRMR-based ensemble pruning for facial expression recognition. Multimedia Tools and Applications, 77(12), 15251-15272.
Li, D., Wen, G., Hou, Z., Huan, E., Hu, Y., & Li, H. (2019). RTCRelief-F: an effective clustering and ordering-based ensemble pruning algorithm for facial expression recognition. Knowledge and Information Systems, 59(1), 219-250.
Margineantu, D. D. & Dietterich, T. G. (1997). Pruning adaptive boosting. Proceedings of the Fourteenth International Conference on Machine Learning, 211-218. Morgan Kaufmann.
Markatopoulou, F., Tsoumakas, G., & Vlahavas, I. (2015). Dynamic ensemble pruning based on multi-label classification. Neurocomputing, 150, 501-512.
Onan, A. (2018). Biomedical text categorization based on ensemble pruning and optimized topic modelling. Computational and Mathematical Methods in Medicine, 2018.
Onan, A., Korukoğlu, S., & Bulut, H. (2017). A hybrid ensemble pruning approach based on consensus clustering and multi-objective evolutionary algorithm for sentiment classification. Information Processing & Management, 53(4), 814-833.
Partridge, D. & Krzanowski, W. (1997). Software diversity: practical statistics for its measurement and exploitation. Information and Software Technology, 39(10), 707-717.
Tan, P.-N., Steinbach, M., Karpatne, A., & Kumar, V. (2019). Introduction to Data Mining (2nd ed.). Pearson.
Yule, G. U. (1900). On the association of attributes in statistics: with illustrations from the material of the Childhood Society. Philosophical Transactions of the Royal Society of London, Series A, 194, 257-319.
Xia, X., Lin, T., & Chen, Z. (2018). Maximum relevancy maximum complementary based ordered aggregation for ensemble pruning. Applied Intelligence, 48(9), 2568-2579.
Ykhlef, H., & Bouchaffra, D. (2017). An efficient ensemble pruning approach based on simple coalitional games. Information Fusion, 34, 28-42.
Zhang, H. & Cao, L. (2014). A spectral clustering based ensemble pruning approach. Neurocomputing, 139, 289-297.
Zhu, X., Ni, Z., Ni, L., Jin, F., Cheng, M., & Li, J. (2019). Improved discrete artificial fish swarm algorithm combined with margin distance minimization for ensemble pruning. Computers & Industrial Engineering, 128, 32-46.
胡雅婷 (2021). An ensemble classification method considering data partitioning approaches and the diversity among base models (考量資料切割方式與基本模型間多樣性之集成分類方法). Master's thesis, Institute of Information Management, National Cheng Kung University.