
Author: 陳昱嘉 (Chen, Yu-Chia)
Thesis title: 隨機生成決策樹以進行集成學習之研究
A Study on Ensemble Learning with Randomly-generated Decision Trees
Advisor: 翁慈宗 (Wong, Tzu-Tsung)
Degree: Master
Department: College of Management - Department of Industrial and Information Management
Year of publication: 2023
Graduation academic year: 111 (2022-2023)
Language: Chinese
Number of pages: 39
Keywords (Chinese): ensemble learning, decision tree, random forest, diversity, machine learning
Keywords (English): Base model, classification, decision tree, ensemble learning, random forest
  • Abstract
    Ensemble learning is the general term for machine learning methods that combine the predictions of multiple base classification models to reach a joint decision. The idea is inspired by majority rule: a judgment reached by several decision makers deliberating together is usually more comprehensive and more accurate than the decision of a single decision maker, and even when some models misclassify a given instance, many other base models can still classify it correctly. After ensemble voting integrates the predictions of all base models, the models can therefore exploit this group advantage and complement one another, reaching a higher prediction accuracy than any single model.
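    To make the majority-rule idea concrete, here is a minimal Python sketch of plurality voting over base model predictions; the function name and the toy predictions are illustrative, not taken from the thesis.

    ```python
    import numpy as np

    def majority_vote(predictions):
        """Plurality vote: predictions has shape (n_models, n_samples);
        returns the most frequent label for each sample."""
        predictions = np.asarray(predictions)
        return np.array([np.bincount(column).argmax()
                         for column in predictions.T])

    # Three base models vote on three samples; model 2 misclassifies the
    # first sample, but the other two models outvote it.
    predictions = [[1, 0, 1],
                   [0, 0, 1],
                   [1, 0, 0]]
    print(majority_vote(predictions))  # [1 0 1]
    ```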
    The accuracy of the base models and the diversity among them are the main determinants of an ensemble model's performance. The mainstream ways of generating diversity among base models, bagging and boosting, still rely on the original data to produce the base models, so correlations remain between them. This study therefore abandons inducing decision tree models from the training data and instead generates base models at random from the data's schema alone, freeing diversity from the constraints of the original data. To ensure that these mutually independent decision trees, built without any training data, perform better after ensembling than a single decision tree, an accuracy threshold is used to filter the candidates so that only trees with adequate classification accuracy enter the ensemble. The study then examines whether an ensemble of decision trees that preserves maximal independence along with a baseline level of accuracy can outperform the mainstream ensemble algorithms in the literature.
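    This page does not spell out how a tree is drawn from the data schema, how leaf labels are assigned, or which data the accuracy filter is measured against, so the following is only a sketch under stated assumptions: split features and thresholds are drawn uniformly from each attribute's observed range, leaf labels are drawn uniformly from the class set, candidates are screened against the available labeled data, and the threshold 0.55 and depth limit are hypothetical values rather than the thesis's settings.

    ```python
    import random
    import numpy as np

    def random_tree(feature_ranges, classes, depth=0, max_depth=3):
        """Grow a tree from the schema alone: random split feature, random
        threshold inside its observed range, random label at each leaf."""
        if depth >= max_depth:
            return ('leaf', random.choice(classes))
        f = random.randrange(len(feature_ranges))
        lo, hi = feature_ranges[f]
        threshold = random.uniform(lo, hi)
        return ('split', f, threshold,
                random_tree(feature_ranges, classes, depth + 1, max_depth),
                random_tree(feature_ranges, classes, depth + 1, max_depth))

    def predict(tree, x):
        """Route one sample down the tree to a leaf label."""
        if tree[0] == 'leaf':
            return tree[1]
        _, f, threshold, left, right = tree
        return predict(left if x[f] <= threshold else right, x)

    def filtered_forest(X, y, n_trees=100, threshold=0.55, max_tries=100_000):
        """Keep only random trees whose accuracy on (X, y) clears the
        threshold; candidates below it are discarded."""
        feature_ranges = list(zip(X.min(axis=0), X.max(axis=0)))
        classes = sorted(set(y))
        kept, tries = [], 0
        while len(kept) < n_trees and tries < max_tries:
            tries += 1
            tree = random_tree(feature_ranges, classes)
            accuracy = np.mean(
                np.array([predict(tree, x) for x in X]) == np.asarray(y))
            if accuracy >= threshold:
                kept.append(tree)
        return kept
    ```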
    On the data sets chosen for the experiments, an ensemble built purely from randomly generated decision trees achieves good predictive performance on two-class data sets but performs poorly on multi-class data sets. Nevertheless, on both two-class and multi-class data sets, selected randomly generated decision trees can play a supporting role when ensembled with a random forest, raising the random forest's accuracy. In terms of computation time, the proposed way of building an ensemble from randomly generated decision tree base models is slower than other decision tree ensembles, but parallel computing can greatly reduce this time cost in exchange for higher prediction accuracy.
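    Because every candidate tree is generated and screened independently, the construction is embarrassingly parallel, which is presumably what makes parallel computing effective here. A rough sketch using joblib (a generic choice of backend, not one named by the thesis), reusing filtered_forest from the sketch above on hypothetical toy data:

    ```python
    import random
    import numpy as np
    from joblib import Parallel, delayed  # generic parallel backend, an assumption

    def build_batch(X, y, n, threshold, seed):
        # Each worker seeds its own generator and screens its own batch,
        # so batches stay independent and wall-clock time drops with cores.
        random.seed(seed)
        return filtered_forest(X, y, n_trees=n, threshold=threshold)

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 4))            # toy two-class data, for illustration
    y = (X[:, 0] + X[:, 1] > 0).astype(int)

    batches = Parallel(n_jobs=-1)(
        delayed(build_batch)(X, y, 50, 0.55, seed) for seed in range(8))
    trees = [t for batch in batches for t in batch]  # 400 screened trees
    ```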
    Keywords: ensemble learning, decision tree, random forest, diversity, machine learning

    Summary:
    Ensemble learning is a machine learning method that combines the predictions of multiple base models to make decisions, so an ensemble model can achieve better prediction accuracy than a single base model. The accuracy and diversity of the base models are the primary factors in the performance of an ensemble model. The well-known bagging and boosting approaches rely on the same original data to produce base models, so the diversity among base models is limited. This study aims to generate decision tree models without training data; classification models generated in this way have more independent predictions. An ensemble algorithm with randomly generated decision trees is proposed and tested on 30 data sets for performance evaluation. The experimental results show that an ensemble model composed solely of randomly generated decision trees achieves the highest accuracy on only three data sets. However, when an ensemble model contains both randomly generated decision trees and trees induced by the bagging approach, it attains the highest average accuracy relative to the other four ensemble algorithms. The computational cost of the proposed ensemble algorithm is high, but parallel computing can be used to greatly improve its computational efficiency.
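    The summary's best-performing configuration mixes bagging-induced trees with the randomly generated ones. How the two pools are weighted is not stated on this page, so the sketch below gives every tree one equal vote, stands in a scikit-learn random forest for the bagging side, and reuses predict, trees, X, and y from the earlier sketches.

    ```python
    from sklearn.ensemble import RandomForestClassifier

    rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

    def hybrid_predict(x):
        """Equal-weight plurality vote over both pools of trees."""
        votes = [predict(tree, x) for tree in trees]
        # Sub-estimators of a fitted sklearn forest predict encoded class
        # indices, which coincide with the 0/1 labels of this toy data.
        votes += [int(est.predict(x.reshape(1, -1))[0]) for est in rf.estimators_]
        return max(set(votes), key=votes.count)

    print(hybrid_predict(X[0]))
    ```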

    Table of Contents
    Abstract
    Extended Abstract in English
    Acknowledgements
    Contents
    List of Tables
    List of Figures
    Chapter 1 Introduction
      1.1 Research Motivation
      1.2 Research Objectives
      1.3 Research Framework
    Chapter 2 Literature Review
      2.1 Ensemble Learning
      2.2 The Relationship between Ensemble Accuracy and Base Model Diversity
      2.3 Decision Trees and Random Forests
        2.3.1 Why Decision Trees Suit Ensemble Models
        2.3.2 Random Forests
      2.4 Ensemble Pruning
        2.4.1 Clustering-based Selection
        2.4.2 Ordering-based Selection
        2.4.3 Optimization-based Selection
      2.5 Summary
    Chapter 3 Research Method
      3.1 Research Procedure
      3.2 Randomly Generating Decision Trees
      3.3 Selecting Decision Trees to Form the Ensemble Model and Evaluating Performance
      3.4 Comparison of Results
    Chapter 4 Experimental Results
      4.1 Data Sets
      4.2 Performance of Ensembling BDTR with Random Forests
      4.3 Performance Comparison of BDTR with Other Ensemble Models
      4.4 Computation Time of Random Generation versus Training-based Generation
      4.5 Summary
    Chapter 5 Conclusions and Suggestions
      5.1 Conclusions
      5.2 Future Work
    References


    Full text available on campus: 2028-05-31
    Full text available off campus: 2028-05-31
    The electronic thesis has not yet been authorized for public release; for the print copy, consult the library catalog.