
Graduate Student: Tsai, Che-Lun (蔡哲倫)
Thesis Title: Ensemble Nested Dichotomies with Randomly Generated Base Models for Multi-Class Data (適用於多類別資料的含隨機生成模型之集成嵌套二分法)
Advisor: Wong, Tzu-Tsung (翁慈宗)
Degree: Master
Department: College of Management - Institute of Information Management
Year of Publication: 2024
Academic Year of Graduation: 112 (2023-2024)
Language: Chinese
Pages: 68
Keywords (Chinese): 集成學習、多類別分類、簡易貝氏分類器、嵌套二分法
Keywords (English): Ensemble Learning, Multi-Class Classification, Naive Bayes Classifier, Nested Dichotomy
  • In previous research on randomly generated models, the method of randomly generating naive Bayes models performed well on binary classification data sets but noticeably worse on multi-class problems. To remedy this weakness, this study applies ensemble nested dichotomies, a binarization strategy, to decompose a multi-class problem into multiple distinct binary classification problems, and on each binary problem uses the random generation method to produce the required naive Bayes base models. The expectation is that the ensemble nested dichotomy framework will improve the random generation method's previously poor performance on multi-class data sets. This study also proposes another new ensemble model: randomly generated models are added to bagged nested dichotomies, an earlier nested-dichotomy method, to examine whether combining bagging with randomly generated models can further improve classification performance.
    The ensemble nested dichotomies built from randomly generated naive Bayes models proposed in this study achieve a substantially higher average classification accuracy on multi-class data sets than the earlier ensembles built from randomly generated naive Bayes models, and the accuracy gap between the two methods widens as the number of class values increases. Compared with other ensemble nested dichotomies, the ensemble that mixes bagging with randomly generated naive Bayes models performs best: across the 30 multi-class data sets tested, it achieves the highest classification accuracy on most data sets as well as the highest average accuracy. Although the random generation methods proposed in this study require more computation time than the other ensemble nested dichotomies, this cost can be reduced through parallel processing.

    The ensemble algorithms that can randomly generate base models for the naïve Bayesian classifier generally perform better on bi-class data sets than other ensemble algorithms. However, the performance of ensemble models constituted by randomly generated base models becomes inferior on multi-class data sets. This study aims to decompose a classification task for multi-class data into multiple binary classification tasks by using the nested dichotomy strategy. Every base model in an ensemble algorithm with the nested dichotomy strategy is a binary tree in which every internal node has a classification model. The classification model in a binary tree can be either randomly generated or induced from an instance set. This study first proposes a way to build a binary tree in which every classification model is randomly generated for the naïve Bayesian classifier. This way of building binary trees is then used to design two ensemble algorithms for multi-class data. The experimental results on 30 multi-class data sets show that the algorithm combining randomly generated base models with the base models produced by the bagging approach has the best performance among six ensemble algorithms with the nested dichotomy strategy. However, it has a higher computational cost than the ensemble algorithms that induce classification models from instance sets.
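    The tree structure described above (a random binary tree over the class set, with one binary classification problem per internal node) can be sketched as follows. This is an illustrative sketch only, not the thesis's implementation; the function names are invented for the example, and the classifier that would sit at each internal node is omitted:

    ```python
    import random

    def build_nested_dichotomy(classes, rng):
        """Recursively split a class set into a random binary tree.

        A leaf is a single class label; an internal node is a tuple
        (left_subset_tree, right_subset_tree), and in a full nested
        dichotomy a binary classifier would be trained at each
        internal node to separate the two class subsets.
        """
        if len(classes) == 1:
            return classes[0]
        # Randomly partition the classes into two non-empty subsets.
        shuffled = list(classes)
        rng.shuffle(shuffled)
        cut = rng.randint(1, len(shuffled) - 1)
        left, right = shuffled[:cut], shuffled[cut:]
        return (build_nested_dichotomy(left, rng),
                build_nested_dichotomy(right, rng))

    def count_dichotomies(tree):
        """Internal nodes = binary problems to solve; always k - 1 for k classes."""
        if not isinstance(tree, tuple):
            return 0
        return 1 + count_dichotomies(tree[0]) + count_dichotomies(tree[1])

    rng = random.Random(0)
    tree = build_nested_dichotomy(["a", "b", "c", "d"], rng)
    print(tree)                     # one random nested dichotomy over 4 classes
    print(count_dichotomies(tree))  # k - 1 = 3 binary classification problems
    ```

    An ensemble nested dichotomy repeats this construction with different random seeds and averages the members' class-probability estimates; the random-model variants studied in the thesis additionally generate the node classifiers' parameters at random instead of inducing them from the training instances.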

    Chapter 1  Introduction  10
      1.1  Research Motivation  10
      1.2  Research Objectives  11
      1.3  Research Framework  11
    Chapter 2  Literature Review  13
      2.1  Ensemble Learning  13
        2.1.1  Ensemble Methods  13
      2.2  Naive Bayes Classifier  16
      2.3  Binary and Multi-Class Classification  18
      2.4  Common Binarization Strategies  20
        2.4.1  One-vs-All  20
        2.4.2  One-vs-One  22
        2.4.3  Nested Dichotomies  23
      2.5  Variants and Extensions of Nested Dichotomies  26
      2.6  Summary  28
    Chapter 3  Research Methods  29
      3.1  Research Method Flow  29
      3.2  Randomly Generating Binary Trees  32
      3.3  Randomly Generated Naive Bayes Models and the Random-Model Threshold  36
      3.4  Ensemble Nested Dichotomies  39
      3.5  Evaluating Ensemble Model Performance  41
    Chapter 4  Empirical Study  46
      4.1  Data Sets  46
      4.2  Comparison with Previous Randomly Generated Naive Bayes Models  48
      4.3  Comparison of Classification Accuracy with Other Ensemble Nested Dichotomy Models  51
      4.4  Effect of Random Generation versus Learning from Data on Ensemble Classification Accuracy  55
      4.5  Computation Time of Random Generation versus Training  59
      4.6  Summary  60
    Chapter 5  Conclusions and Suggestions  62
      5.1  Conclusions  62
      5.2  Future Work  63
    References  65

    黃中立 (2023). An ensemble method with randomly generated base models for the naive Bayes classifier. Master's thesis, Institute of Information Management, National Cheng Kung University, Tainan.
    徐心縈 (2023). An ensemble method with random classification models built by logistic regression. Master's thesis, Institute of Information Management, National Cheng Kung University, Tainan.
    陳昱嘉 (2023). A study on randomly generating decision trees for ensemble learning. Master's thesis, Institute of Information Management, National Cheng Kung University, Tainan.
    Bentéjac, C., Csörgő, A., & Martínez-Muñoz, G. (2021). A comparative analysis of gradient boosting algorithms. Artificial Intelligence Review, 54, 1937-1967.
    Breiman, L. (1996). Bagging predictors. Machine Learning, 24, 123-140.
    Breiman, L. (2001). Random forests. Machine Learning, 45, 5-32.
    Clark, P., & Boswell, R. (1991). Rule induction with CN2: Some recent improvements. Machine Learning—EWSL-91: European Working Session on Learning Porto, Portugal, 1991 Proceedings 5 (pp. 151-163). Springer Berlin Heidelberg.
    Dietterich, T. G. (2000). Ensemble methods in machine learning. Proceedings of the 1st International Workshop on Multiple Classifier Systems (pp. 1-15). Berlin, Heidelberg: Springer.
    Dong, L., Frank, E., & Kramer, S. (2005). Ensembles of balanced nested dichotomies for multi-class problems. Knowledge Discovery in Databases: PKDD 2005 (pp. 84-95). Springer Berlin Heidelberg.
    Duarte-Villaseñor, M. M., Carrasco-Ochoa, J. A., Martínez-Trinidad, J. F., & Flores Garrido, M. (2012). Nested dichotomies based on clustering. Progress in Pattern Recognition, Image Analysis, Computer Vision, and Applications (pp. 162-169). Berlin, Heidelberg: Springer.
    El Hindi, K., AlSalman, H., Qasem, S., & Al Ahmadi, S. (2018). Building an ensemble of fine-tuned naive Bayesian classifiers for text classification. Entropy, 20(11), 857.
    Frank, E. & Kramer, S. (2004). Ensembles of nested dichotomies for multi-class problems. Proceedings of the Twenty-First International Conference on Machine Learning (p. 39). New York, NY: ACM.
    Freund, Y. & Schapire, R. E. (1995). A desicion-theoretic generalization of on-line learning and an application to boosting. Proceedings of the European Conference on Computational Learning Theory (pp. 23-37). Berlin, Heidelberg: Springer.
    Fürnkranz, J. (2002). Round robin classification. The Journal of Machine Learning Research, 2, 721-747.
    Fürnkranz, J. (2003). Round robin ensembles. Intelligent Data Analysis, 7(5), 385-403.
    Galar, M., Fernández, A., Barrenechea, E., Bustince, H., & Herrera, F. (2011). An overview of ensemble methods for binary classifiers in multi-class problems: Experimental study on one-vs-one and one-vs-all schemes. Pattern Recognition, 44(8), 1761-1776.
    García-Pedrajas, N. & Ortiz-Boyer, D. (2011). An empirical study of binary classifier fusion methods for multiclass classification. Information Fusion, 12(2), 111-130.
    Hsu, C.-W. & Lin, C.-J. (2002). A comparison of methods for multiclass support vector machines. IEEE Transactions on Neural Networks, 13(2), 415-425.
    Klement, W., Wilk, S., Michalowski, W., Farion, K. J., Osmond, M. H., & Verter, V. (2012). Predicting the need for CT imaging in children with minor head injury using an ensemble of naive Bayes classifiers. Artificial Intelligence In Medicine, 54(3), 163-170.
    Knerr, S., Personnaz, L., & Dreyfus, G. (1990). Single-layer learning revisited: A stepwise procedure for building and training a neural network. Neurocomputing: Algorithms, Architectures and Applications (pp. 41-50). Springer Berlin Heidelberg.
    Ku Abd. Rahim, K. N., Elamvazuthi, I., Izhar, L. I., & Capi, G. (2018). Classification of human daily activities using ensemble methods based on smartphone inertial sensors. Sensors, 18(12), 4132.
    Leathart, T., Pfahringer, B., & Frank, E. (2016). Building ensembles of adaptive nested dichotomies with random-pair selection. Machine Learning and Knowledge Discovery in Databases (pp. 179-194).
    Melnikov, V. & Hüllermeier, E. (2018). On the effectiveness of heuristics for learning nested dichotomies: An empirical analysis. Machine Learning, 107, 1537-1560.
    Ndirangu, D., Mwangi, W., & Nderu, L. (2019). An ensemble model for multiclass classification and outlier detection method in data mining. Journal of Information Engineering and Applications, 9(2), 38-42.
    Pham, B. T., Bui, D. T., Dholakia, M., Prakash, I., Pham, H. V., Mehmood, K., & Le, H. Q. (2017). A novel ensemble classifier of rotation forest and Naïve Bayer for landslide susceptibility assessment at the Luc Yen district, Yen Bai Province (Viet Nam) using GIS. Geomatics, Natural Hazards and Risk, 8(2), 649-671.
    Pham, B. T., Jaafari, A., Van Phong, T., Mafi-Gholami, D., Amiri, M., Van Tao, N., Duong, V.-H., & Prakash, I. (2021). Naïve Bayes ensemble models for groundwater potential mapping. Ecological Informatics, 64, 101389.
    Rifkin, R. & Klautau, A. (2004). In defense of one-vs-all classification. The Journal of Machine Learning Research, 5, 101-141.
    Rodríguez, J. J., García-Osorio, C., & Maudes, J. (2010). Forests of nested dichotomies. Pattern Recognition Letters, 31(2), 125-132.
    Samat, A., Yokoya, N., Du, P., Liu, S., Ma, L., Ge, Y., Issanova, G., Saparov, A., Abuduwaili, J., & Lin, C. (2019). Direct, ECOC, ND and END frameworks— which one is the best? An empirical study of Sentinel-2A MSIL1C image classification for arid-land vegetation mapping in the Ili river delta, Kazakhstan. Remote Sensing, 11(16), 1953.
    Schapire, R. E. (2013). Explaining AdaBoost. Empirical Inference: Festschrift in Honor of Vladimir N. Vapnik (pp. 37-52). Springer.
    Scheurer, S., Tedesco, S., Brown, K. N., & O’Flynn, B. (2020). Using domain knowledge for interpretable and competitive multi-class human activity recognition. Sensors, 20(4), 1208.
    Tama, B. A. & Comuzzi, M. (2019). An empirical comparison of classification techniques for next event prediction using business process event logs. Expert Systems with Applications, 129, 233-245.
    Wang, S. & Yao, X. (2012). Multiclass imbalance problems: Analysis and potential solutions. IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), 42(4), 1119-1130.
    Wolpert, D. H. (1992). Stacked generalization. Neural Networks, 5(2), 241-259.
    Zhou, Z.-H. (2012). Ensemble Methods: Foundations and Algorithms. CRC press.

    Full-text access: on campus - immediately available; off campus - immediately available.