
Author: Ho, Jeng-Shian (何政賢)
Title: Random Ensemble Algorithms with Particle Swarm Optimization for Bi-class Data (以粒子群最佳化方法優化應用於二類別資料之隨機集成演算法)
Advisor: Wong, Tzu-Tsung (翁慈宗)
Degree: Master
Department: Institute of Information Management, College of Management
Publication year: 2024
Graduation academic year: 112
Language: Chinese
Pages: 57
Chinese keywords: ensemble learning, independence, logistic regression, naive Bayesian, particle swarm optimization algorithm
Foreign keywords: Ensemble learning, independence, logistic regression, naive Bayesian, particle swarm optimization algorithm, randomly-generated base model
Views: 46 / Downloads: 6
    In ensemble learning, an ensemble model can achieve better predictive performance when its base models are independent of one another and each base model classifies accurately. In past applications of ensemble learning, however, base models were mostly generated by repeatedly resampling the training set, so independence among them could not be guaranteed and the predictive performance of the final ensemble was limited. Randomly generated logistic regression classification models and randomly generated naive Bayes classification models were therefore proposed to preserve the independence among base models. Earlier results showed that ensembles built from such random base models improve classification accuracy over conventional ensembles, but because the classification models are generated at random, the model space the ensemble must search is enormous. Arbitrarily sampling from such a vast space easily yields models with poor predictive ability; a model that fails the preset classification accuracy threshold cannot be accepted as a base model and added to the ensemble. Since a large number of classification models must be filtered out, training the ensemble is relatively slow.
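    A minimal sketch of this generate-and-filter bottleneck, assuming uniformly drawn logistic regression coefficients and an illustrative accuracy threshold (the range (-1, 1), the threshold 0.6, and all function names below are assumptions, not the thesis settings):

```python
# Minimal sketch (not the thesis code) of building an ensemble from
# randomly generated logistic regression models, keeping only models
# that pass an accuracy threshold.
import numpy as np

def random_logistic_model(n_features, rng, low=-1.0, high=1.0):
    """Draw a coefficient vector plus intercept uniformly at random."""
    return rng.uniform(low, high, size=n_features + 1)

def predict(model, X):
    """Bi-class prediction: sigmoid of the linear score, cut at 0.5."""
    z = X @ model[:-1] + model[-1]
    return (1.0 / (1.0 + np.exp(-z)) >= 0.5).astype(int)

def accuracy(model, X, y):
    return float(np.mean(predict(model, X) == y))

def build_random_ensemble(X, y, n_base=50, threshold=0.6, seed=0):
    """Keep drawing random models until n_base of them pass the threshold.
    `tried` records how many candidates had to be generated and filtered,
    which is the training-efficiency bottleneck described above."""
    rng = np.random.default_rng(seed)
    base_models, tried = [], 0
    while len(base_models) < n_base:
        model = random_logistic_model(X.shape[1], rng)
        tried += 1
        if accuracy(model, X, y) >= threshold:
            base_models.append(model)
    return base_models, tried
```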
    To alleviate this slow training, this study applies the particle swarm optimization algorithm, exploiting its swarm intelligence to gradually improve random classification models within the vast model space. This raises each model's chance of passing the classification accuracy threshold, reduces the number of models that must be generated, and thus speeds up the search for qualified base models. Experimental results show that searching for random classification models with particle swarm optimization improves the ensemble in both classification accuracy and training efficiency.
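    A hedged sketch of how PSO could steer such random coefficient vectors toward the threshold, reusing `accuracy` from the previous snippet; the weights w, c1, c2 are conventional textbook values, not the thesis parameter settings:

```python
# Hedged PSO sketch over the same coefficient vectors. Each particle is
# one candidate classification model and training accuracy is its fitness.
import numpy as np

def pso_search(X, y, n_particles=20, n_iters=30,
               w=0.7, c1=1.5, c2=1.5, threshold=0.6, seed=0):
    rng = np.random.default_rng(seed)
    dim = X.shape[1] + 1
    pos = rng.uniform(-1.0, 1.0, size=(n_particles, dim))  # models
    vel = np.zeros((n_particles, dim))
    pbest = pos.copy()                                      # personal bests
    pbest_fit = np.array([accuracy(p, X, y) for p in pos])
    gbest = pbest[np.argmax(pbest_fit)].copy()              # global best
    accepted = []
    for _ in range(n_iters):
        r1, r2 = rng.random((2, n_particles, dim))
        # Standard velocity update: inertia + cognitive + social terms.
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        pos = pos + vel
        fit = np.array([accuracy(p, X, y) for p in pos])
        improved = fit > pbest_fit
        pbest[improved], pbest_fit[improved] = pos[improved], fit[improved]
        gbest = pbest[np.argmax(pbest_fit)].copy()
        # Particles that now clear the threshold become candidate base
        # models (this simplified loop may accept near-duplicates).
        accepted.extend(pos[fit >= threshold])
    return accepted, gbest
```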

    An ensemble model is very likely to achieve a relatively high accuracy if all of its base models are independent in classifying instances. Ensemble algorithms that generate independent base models have thus been proposed, but the way those base models are generated is computationally intensive. This study introduces a method that uses the particle swarm optimization (PSO) algorithm to filter base models from a set of randomly generated classification models. By leveraging the cooperation among particles, i.e., classification models, the search process gradually improves the classification accuracy of each particle to increase its chance of satisfying the accuracy threshold. This search process saves the effort of generating low-accuracy classification models. Logistic regression and the naive Bayesian classifier are chosen to randomly generate classification models in ensemble learning. The experimental results show that the ensemble models composed of the base models found by PSO are generally superior, in both prediction accuracy and computational efficiency, to those composed of randomly generated base models. In particular, the improvement in computational efficiency is more pronounced for logistic regression.
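    The abstracts do not state how the accepted base models are combined at prediction time; majority voting is the standard combiner for bi-class ensembles, so the following sketch (reusing `predict` from the first snippet) assumes it:

```python
# Assumed combiner (not stated in the abstract): majority voting over
# the accepted base models.
import numpy as np

def ensemble_predict(base_models, X):
    """Each base model votes 0 or 1; the majority label wins."""
    votes = np.stack([predict(m, X) for m in base_models])  # (n_models, n)
    return (votes.mean(axis=0) >= 0.5).astype(int)
```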

    Abstract
    Acknowledgements
    Table of Contents
    List of Tables
    List of Figures
    Chapter 1 Introduction
      1.1 Research Background and Motivation
      1.2 Research Objectives
      1.3 Thesis Organization
    Chapter 2 Literature Review
      2.1 Ensemble Learning
      2.2 Ensemble Pruning
        2.2.1 Ranking-based Methods
        2.2.2 Optimization-based Methods
      2.3 Logistic Regression Ensemble Models
        2.3.1 The Logistic Regression Model
        2.3.2 Ensemble Applications of Logistic Regression Classifiers
      2.4 Naive Bayes Ensemble Models
        2.4.1 The Naive Bayes Model
        2.4.2 Ensemble Applications of Naive Bayes Classifiers
      2.5 Heuristic Algorithms
        2.5.1 Particle Swarm Optimization
        2.5.2 Genetic Algorithms
      2.6 Summary
    Chapter 3 Research Method
      3.1 Research Flow
      3.2 Data Preprocessing and Partitioning
      3.3 Randomly Generated Logistic Regression Classification Models
      3.4 Randomly Generated Naive Bayes Classification Models
      3.5 Particle Swarm Optimization Algorithm
      3.6 Evaluation of Experimental Results
    Chapter 4 Empirical Study: Logistic Regression
      4.1 Datasets (Logistic Regression)
      4.2 PSO-based Ensembles of Random Logistic Regression Classification Models
      4.3 Comparison of Random Logistic Regression Ensemble Models
      4.4 Comparison of PSO-EMRLR with Other Logistic Regression Models
      4.5 Summary
    Chapter 5 Empirical Study: Naive Bayes
      5.1 Datasets (Naive Bayes)
      5.2 PSO-based Ensembles of Random Naive Bayes Classification Models
      5.3 Comparison of Random Naive Bayes Ensemble Models
      5.4 Comparison of PSO-BRENB with Other Naive Bayes Models
      5.5 Summary
    Chapter 6 Conclusions and Future Work
      6.1 Conclusions
      6.2 Future Work
    References

    徐心縈 (2023). An ensemble method for constructing random classification models with logistic regression. Master's thesis, Institute of Information Management, National Cheng Kung University.

    黃中立 (2023). An ensemble method that randomly generates base models with a naive Bayes classifier. Master's thesis, Institute of Information Management, National Cheng Kung University.

    Abikoye, O. C., Omokanye, S. O., & Aro, T. O. (2017). Binary text classification using an ensemble of naive Bayes and support vector machines. Computer Sciences and Telecommunications, (2), 37-45.

    Bansal, J. C. (2019). Particle swarm optimization. Evolutionary and Swarm Intelligence Algorithms, 11-23.

    Breiman, L. (1996). Bagging predictors. Machine Learning, 24(2), 123-140.

    Breiman, L. (2001). Random forests. Machine Learning, 45(1), 5-32.

    Bryll, R., Gutierrez-Osuna, R., & Quek, F. (2003). Attribute bagging: improving accuracy of classifier ensembles by using random feature subsets. Pattern Recognition, 36(6), 1291-1302.

    Cavalcanti, G. D. C., Oliveira, L. S., Moura, T. J. M., & Carvalho, G. V. (2016). Combining diversity measures for ensemble pruning. Pattern Recognition Letters, 74, 38-45.

    Chaudhary, A., Thakur, R., Kolhe, S., & Kamal, R. (2020). A particle swarm optimization based ensemble for vegetable crop disease recognition. Computers and Electronics in Agriculture, 178, 105747.

    Dong, X. B., Yu, Z. W., Cao, W. M., Shi, Y. F., & Ma, Q. L. (2020). A survey on ensemble learning. Frontiers of Computer Science, 14(2), 241-258.

    Eberhart, R. C., & Shi, Y. (2000). Comparing inertia weights and constriction factors in particle swarm optimization. Proceedings of the 2000 Congress on Evolutionary Computation, 1, 84-88.

    Flach, P. A., & Lachiche, N. (2004). Naive Bayesian classification of structured data. Machine Learning, 57, 233-269.

    Freund, Y., & Schapire, R. E. (1997). A decision-theoretic generalization of on-line learning and an application to boosting. Journal of Computer and System Sciences, 55(1), 119-139.

    Friedman, J. H. (2001). Greedy function approximation: A gradient boosting machine. Annals of Statistics, 29(5), 1189-1232.

    Grefenstette, J. J. (1986). Optimization of control parameters for genetic algorithms. IEEE Transactions on Systems, Man, and Cybernetics, 16(1), 122-128.

    Hasanpour, H., Meibodi, R. G., & Navi, K. (2019). Optimal selection of ensemble classifiers using particle swarm optimization and diversity measures. Intelligent Decision Technologies, 13(1), 131-137.

    Ho, T. K. (1995). Random decision forests. Proceedings of 3rd International Conference on Document Analysis and Recognition, 1, 278-282.

    Johnson, P., Vandewater, L., Wilson, W., Maruff, P., Savage, G., Graham, P., Macaulay, L. S., Ellis, K. A., Szoeke, C., & Martins, R. N. (2014). Genetic algorithm with logistic regression for prediction of progression to Alzheimer's disease. BMC Bioinformatics, 15, 1-14.

    Katoch, S., Chauhan, S. S., & Kumar, V. (2021). A review on genetic algorithm: past, present, and future. Multimedia Tools and Applications, 80, 8091-8126.

    Kennedy, J., & Eberhart, R. (1995). Particle swarm optimization. Proceedings of ICNN'95-International Conference on Neural Networks, 4, 1942-1948.

    Kleinbaum, D. G., & Klein, M. (2010). Introduction to logistic regression. Logistic Regression: A Self-Learning Text, 1-39.

    Kokash, N. (2005). An introduction to heuristic algorithms. Department of Informatics and Telecommunications, 1-8.

    Krawczyk, B., Minku, L. L., Gama, J., Stefanowski, J., & Wozniak, M. (2017). Ensemble learning for data stream analysis: A survey. Information Fusion, 37, 132-156.

    Kuncheva, L. I. (2014). Combining Pattern Classifiers: Methods and Algorithms. John Wiley & Sons.

    Mienye, I. D., & Sun, Y. X. (2022). A survey of ensemble learning: Concepts, algorithms, applications, and prospects. IEEE Access, 10, 99129-99149.

    Mishra, S., Shaw, K., Mishra, D., Patil, S., Kotecha, K., Kumar, S., & Bajaj, S. (2022). Improving the accuracy of ensemble machine learning classification models using a novel bit-fusion algorithm for healthcare AI systems. Frontiers in Public Health, 10, 8588282.

    Polikar, R. (2012). Ensemble learning. Ensemble Machine Learning: Methods and Applications, 1-34.

    Qasim, O. S., & Algamal, Z. Y. (2018). Feature selection using particle swarm optimization-based logistic regression model. Chemometrics and Intelligent Laboratory Systems, 182, 41-46.

    Rokach, L. (2009). Collective-agreement-based pruning of ensembles. Computational Statistics & Data Analysis, 53(4), 1015-1026.

    Sagi, O., & Rokach, L. (2018). Ensemble learning: A survey. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, 8(4).

    Schapire, R. E. (1990). The strength of weak learnability. Machine Learning, 5(2), 197-227.

    Shi, Y., & Eberhart, R. (1998). A modified particle swarm optimizer. 1998 IEEE International Conference on Evolutionary Computation Proceedings, 69-73.

    Stoltzfus, J. C. (2011). Logistic regression: A brief primer. Academic Emergency Medicine, 18(10), 1099-1104.

    Subramanian, R. S., & Prabha, D. (2022). Ensemble variable selection for naive Bayes to improve customer behaviour analysis. Computer Systems Science & Engineering, 41(1), 339-355.

    Taha, A., & Barukab, O. (2022). Android malware classification using optimized ensemble learning based on genetic algorithms. Sustainability, 14(21), 14406.

    Tang, E. K., Suganthan, P. N., & Yao, X. (2006). An analysis of diversity measures. Machine Learning, 65(1), 247-271.

    Tsoumakas, G., Partalas, I., & Vlahavas, I. (2009). An ensemble pruning primer. Applications of Supervised and Unsupervised Ensemble Methods, 1-13.

    Wagner, D. B. (1995). Dynamic programming. The Mathematica Journal, 5(4), 42-51.

    Wang, H., Xu, Q. S., & Zhou, L. F. (2015). Large unbalanced credit scoring using lasso-logistic regression ensemble. PLoS ONE, 10(2).

    Wolpert, D. H. (1992). Stacked generalization. Neural Networks, 5(2), 241-259.

    Xie, X., Zhang, W., & Yang, L. (2003). Particle swarm optimization. Control and Decision, 18, 129-134.

    Zhang, H. P., & Wang, M. H. (2009). Search for the smallest random forest. Statistics and Its Interface, 2(3), 381-388.

    Zhang, Y., Burer, S., & Street, W. N. (2006). Ensemble pruning via semi-definite programming. Journal of Machine Learning Research, 7, 1315-1338.

    Zhou, Z.-H., Wu, J., & Tang, W. (2002). Ensembling neural networks: many could be better than all. Artificial Intelligence, 137(1-2), 239-263.

    Zhu, B., Qian, C., vanden Broucke, S., Xiao, J., & Li, Y. Y. (2023). A bagging-based selective ensemble model for churn prediction on imbalanced data. Expert Systems with Applications, 227, 120223.

    Full-text availability: on campus: immediate open access; off campus: immediate open access.