| Graduate Student: | 陳昱嘉 Chen, Yu-Chia |
|---|---|
| Thesis Title: | 隨機生成決策樹以進行集成學習之研究 A Study on Ensemble Learning with Randomly-generated Decision Trees |
| Advisor: | 翁慈宗 Wong, Tzu-Tsung |
| Degree: | Master |
| Department: | College of Management - Department of Industrial and Information Management |
| Year of Publication: | 2023 |
| Graduation Academic Year: | 111 (ROC calendar; 2022-2023) |
| Language: | Chinese |
| Pages: | 39 |
| Keywords (Chinese): | 集成學習、決策樹、隨機森林、多樣性、機器學習 (ensemble learning, decision tree, random forest, diversity, machine learning) |
| Keywords (English): | Base model, classification, decision tree, ensemble learning, random forest |
Abstract
Ensemble learning is the general term for machine learning methods that combine the predictions of multiple base classification models to make a joint decision. The idea is inspired by majority voting: a judgment reached by several decision makers deliberating together is usually more comprehensive and more accurate than a decision made by a single decision maker. Even when some models misclassify a given instance, many other base models can still classify it correctly. After ensemble voting aggregates the predictions of all base models, the group advantage comes into play: the models complement one another and achieve better prediction accuracy than any single model.
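To make the voting step concrete, the following is a minimal Python sketch of hard majority voting over base model predictions; it illustrates the general mechanism only and is not code from the thesis.

```python
from collections import Counter

def majority_vote(predictions):
    """Return the most common class label among the base models' votes."""
    return Counter(predictions).most_common(1)[0][0]

# Hypothetical example: five base models classify one instance.
# Two models are wrong, yet the ensemble still answers correctly.
votes = [1, 0, 1, 1, 0]
print(majority_vote(votes))  # -> 1
```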
The accuracy and diversity of the base models are the main factors affecting the performance of an ensemble model. The mainstream ways of creating diversity among base models, bagging and boosting, still rely on the original data to produce the base models, so correlations remain among them. This study therefore abandons inducing decision tree models from the training data and instead generates base models at random from the structure of the data alone, freeing their diversity from the constraints imposed by the original data. To ensure that these mutually independent decision trees, built without training data, perform better after ensembling than a single decision tree, an accuracy threshold is set to filter the candidates, and only trees with adequate classification accuracy are kept for the ensemble. The study then investigates whether an ensemble of decision trees that retain maximal independence and a baseline level of accuracy can outperform the mainstream ensemble algorithms in the literature.
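The abstract outlines the method but not its implementation, so the following Python sketch only illustrates the idea under stated assumptions: `random_tree` grows a tree by drawing split attributes and split points from the data schema alone, and `build_ensemble` keeps only trees whose accuracy clears a threshold. The function names, the schema representation, and the depth parameter are all hypothetical, not the thesis's actual design.

```python
import random

def random_tree(attributes, classes, depth):
    """Grow a tree by drawing split attributes and split points at random
    from the data schema; leaves receive random class labels. No training
    data is consulted, so generated trees are mutually independent."""
    if depth == 0:
        return {"label": random.choice(classes)}
    attr, (lo, hi) = random.choice(list(attributes.items()))
    return {
        "attr": attr,
        "threshold": random.uniform(lo, hi),
        "left": random_tree(attributes, classes, depth - 1),
        "right": random_tree(attributes, classes, depth - 1),
    }

def predict(tree, x):
    while "label" not in tree:
        tree = tree["left"] if x[tree["attr"]] <= tree["threshold"] else tree["right"]
    return tree["label"]

def accuracy(tree, data):
    return sum(predict(tree, x) == y for x, y in data) / len(data)

def build_ensemble(attributes, classes, data, n_trees, threshold, depth=3):
    """Filtering step: keep only random trees whose accuracy clears the
    threshold. A threshold set too high can make this loop run very long."""
    ensemble = []
    while len(ensemble) < n_trees:
        tree = random_tree(attributes, classes, depth)
        if accuracy(tree, data) >= threshold:
            ensemble.append(tree)
    return ensemble
```

Note that the data enters only through the filtering step; the trees themselves are grown blind, which is what preserves the independence among base models that the abstract emphasizes.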
On the data sets selected for this experiment, an ensemble of purely randomly generated decision trees achieved good predictive performance on two-class data sets but performed poorly on multi-class data sets. Nevertheless, on both two-class and multi-class data sets, randomly generated decision trees can serve in a supporting role: when selected and ensembled together with a random forest, they raise the random forest's accuracy. As for computation time, building ensemble models from randomly generated decision trees as proposed here is slower than other decision tree ensembles, but parallel computing can greatly reduce the time cost in exchange for higher prediction accuracy.
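Since every candidate tree is generated and scored independently, the costly filtering search parallelizes with no coordination between workers. Below is a minimal sketch of that observation, assuming the hypothetical `random_tree` and `accuracy` helpers from the previous sketch plus a placeholder schema and validation set; none of this is the thesis's code.

```python
from concurrent.futures import ProcessPoolExecutor

# Placeholder schema and validation data for illustration only.
ATTRIBUTES = {"x1": (0.0, 1.0), "x2": (0.0, 1.0)}
CLASSES = [0, 1]
DATA = [({"x1": 0.2, "x2": 0.7}, 0), ({"x1": 0.9, "x2": 0.1}, 1)]

def candidate(_):
    tree = random_tree(ATTRIBUTES, CLASSES, depth=3)
    return tree, accuracy(tree, DATA)

if __name__ == "__main__":
    # Workers generate and score trees independently; only trees that
    # pass the accuracy threshold are kept for the ensemble.
    with ProcessPoolExecutor() as pool:
        scored = list(pool.map(candidate, range(1000)))
    ensemble = [tree for tree, acc in scored if acc >= 0.75]
    print(len(ensemble), "trees kept")
```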
Keywords: ensemble learning, decision tree, random forest, diversity, machine learning
Summary:
Ensemble learning is a machine learning approach that combines the predictions of multiple base models to make decisions, so an ensemble model can achieve better prediction accuracy than any single base model. The accuracy and diversity of the base models are the primary factors determining the performance of an ensemble model. The well-known bagging and boosting approaches rely on the same original data to produce base models, so the diversity among base models is limited. This study instead generates decision tree models without training data; classification models generated in this way have more independent predictions. An ensemble algorithm with randomly generated decision trees is proposed and evaluated on 30 data sets. The experimental results show that an ensemble composed solely of randomly generated decision trees achieves the highest accuracy on only three data sets. However, an ensemble containing both randomly generated decision trees and trees induced by the bagging approach attains the highest average accuracy compared with the other four ensemble algorithms. The computational cost of the proposed algorithm is high, but parallel computing can greatly improve its efficiency.
Full text available on campus from 2028-05-31.