
Graduate Student: 陳毓潔 (Chen, Yu-Chieh)
Thesis Title: 適用於二類別資料之基於正確率最佳化的集成挑選法
(An Ensemble Selection Method Based on Accuracy Optimization for Bi-class Data)
Advisor: 翁慈宗 (Wong, Tzu-Tsung)
Degree: Master
Department: College of Management - Department of Industrial and Information Management
Year of Publication: 2025
Graduation Academic Year: 113
Language: Chinese
Pages: 61
Keywords (Chinese): 集成學習、集成挑選、簡易貝氏、二元整數規劃
Keywords (English): ensemble learning, ensemble selection, naïve Bayes classifier, binary integer programming
    Ensemble learning uses an ensemble algorithm to generate multiple base models and integrates their individual predictions into a single final prediction. Because it is desirable to maintain strong predictive performance with only a small number of base models, ensemble selection has attracted considerable attention. Previous optimization-based ensemble selection methods used objective functions that considered both accuracy and diversity, since the candidate base models were correlated with one another. With methods that generate base models randomly, the base models are mutually independent and diversity is satisfied automatically. This study therefore proposes an ensemble selection method formulated as an accuracy-only optimization model for bi-class data sets, which selects the best combination of base models from those generated by bagging and by a random generation algorithm. Since the number of constraints in the optimization model is determined by the number of training instances, a data filtering mechanism is built to improve the efficiency of solving the model.
    Results on twenty bi-class data sets show that, under bagging, random generation, and the hybrid of bagging and random generation, the data filtering mechanism removes at least 85% of the constraints of the optimization model. In terms of classification performance, under random generation and the hybrid, the average classification accuracy achieved by the proposed optimization model exceeds that of the compared method from the literature, which considers both accuracy and diversity. The hybrid of bagging and random generation achieves the best average classification accuracy, confirming that mixing base models before ensemble selection can effectively improve the classification performance of the ensemble. Even when the number of base models increases, the proposed optimization model retains its accuracy advantage under random generation and the hybrid, demonstrating the scalability of the proposed ensemble selection method when the base models are fully or partially independent. As for execution time, the time required to solve the optimization model depends on the numbers of decision variables and constraints, which leads to longer average execution times and relatively unstable computational efficiency; heuristic algorithms could be adopted in the future to solve the model more efficiently.
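
    This record does not reproduce the thesis's optimization model, but a minimal sketch of one plausible binary integer program matching the abstract's description (an accuracy-only objective and one constraint per training instance) is given below; the notation a_ij, x_j, y_i and the big-M linearization are illustrative assumptions, not the thesis's exact formulation. Let $a_{ij} = 1$ if base model $j$ classifies training instance $i$ correctly and $a_{ij} = 0$ otherwise, let $x_j \in \{0,1\}$ indicate whether model $j$ is selected, and let $y_i \in \{0,1\}$ indicate whether the selected ensemble classifies instance $i$ correctly by majority vote:

    \begin{align*}
    \max\ & \sum_{i=1}^{n} y_i \\
    \text{s.t.}\ & \sum_{j=1}^{m} (2a_{ij} - 1)\, x_j \ge 1 - M(1 - y_i), \quad i = 1, \dots, n, \\
    & \sum_{j=1}^{m} x_j \ge 1, \qquad x_j,\, y_i \in \{0,1\}.
    \end{align*}

    Each selected correct vote contributes +1 to the left-hand sum and each selected incorrect vote contributes -1, so $y_i$ can equal 1 only when the correct votes form a strict majority; taking $M = m + 1$ suffices.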

    Ensemble learning integrates multiple base models to improve classification performance. Ensemble selection is a popular technique for identifying a better subset of base models to reduce model redundancy. Since the base models produced by current ensemble algorithms are dependent, ensemble selection methods need to consider both prediction accuracy and the diversity among base models. Base models generated randomly are independent, and considering diversity in ensemble selection is unnecessary for such base models. For data sets with only two class values, this study proposes an optimization model that considers only accuracy for ensemble selection. Since the number of constraints in the optimization model depends on the number of training instances, a data filtering mechanism is established to reduce model complexity and improve computational efficiency. The experimental results on twenty bi-class data sets show that when independent base models for the naïve Bayes classifier are involved in ensemble selection, the proposed method outperforms the optimization method of a previous study that considers both accuracy and diversity. The filtering mechanism removes at least 85% of the constraints from the optimization model regardless of the method used to generate the base models.
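
    As a concrete illustration, the sketch below builds this kind of accuracy-only selection model with NumPy and the PuLP modeling library. The pre-solve filter that drops instances on which every candidate model agrees (their constraints hold, or fail, for any selection) is an assumption about how a data filtering mechanism of this kind could operate, not the thesis's exact rule.

    # A minimal sketch (not the thesis's exact formulation) of accuracy-only
    # ensemble selection as a binary integer program, solved with PuLP.
    import numpy as np
    import pulp

    def select_ensemble(correct):
        """correct: (n_instances, n_models) 0/1 matrix where correct[i, j] == 1
        iff base model j classifies training instance i correctly."""
        n, m = correct.shape
        big_m = m + 1  # large enough: the vote margin is bounded below by -m

        # Hypothetical filtering rule: an instance on which all models agree
        # yields a constraint that holds (or fails) for every selection, so
        # it can be dropped before the model is built.
        keep = [i for i in range(n) if 0 < correct[i].sum() < m]

        prob = pulp.LpProblem("ensemble_selection", pulp.LpMaximize)
        x = [pulp.LpVariable(f"x_{j}", cat="Binary") for j in range(m)]
        y = {i: pulp.LpVariable(f"y_{i}", cat="Binary") for i in keep}

        # Objective: number of surviving instances the majority vote gets right.
        prob += pulp.lpSum(y.values())
        for i in keep:
            # Selected correct votes count +1, selected incorrect votes -1;
            # y_i may be 1 only if the correct votes form a strict majority.
            prob += (pulp.lpSum((2 * int(correct[i, j]) - 1) * x[j]
                                for j in range(m))
                     >= 1 - big_m * (1 - y[i]))
        prob += pulp.lpSum(x) >= 1  # select at least one base model

        prob.solve(pulp.PULP_CBC_CMD(msg=False))
        return [j for j in range(m) if x[j].value() == 1]

    # Usage example with a synthetic correctness matrix:
    rng = np.random.default_rng(0)
    print(select_ensemble((rng.random((200, 15)) < 0.7).astype(int)))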

    Abstract; Acknowledgements; Table of Contents; List of Tables; List of Figures
    Chapter 1: Introduction
      1.1 Research Background and Motivation
      1.2 Research Objectives
      1.3 Research Framework
    Chapter 2: Literature Review
      2.1 Ensemble Learning
        2.1.1 Bagging
        2.1.2 Boosting
        2.1.3 Algorithms for Randomly Generating Base Models
      2.2 Naïve Bayes Ensemble Models
        2.2.1 The Naïve Bayes Classifier
        2.2.2 Ensemble Applications of the Naïve Bayes Classifier
      2.3 Ensemble Selection
        2.3.1 Optimization-Based Selection
        2.3.2 Clustering-Based Selection
        2.3.3 Ordering-Based Selection
      2.4 Solving and Applying Binary Integer Programming
        2.4.1 Methods for Solving Binary Integer Programs
        2.4.2 Applications of Binary Integer Programming
      2.5 Summary
    Chapter 3: Research Methodology
      3.1 Research Process
      3.2 Generating Base Models
      3.3 Data Filtering
      3.4 Constructing the Mathematical Model
      3.5 Evaluating Experimental Results
    Chapter 4: Empirical Results and Analysis
      4.1 Description of the Data Sets
      4.2 Proportion of Data Filtered
      4.3 Comparison of Classification Accuracy after Ensemble Selection
      4.4 Comparison of Execution Time for Ensemble Selection
      4.5 Effects of Increasing the Number of Base Models on the Filtering Proportion, Classification Accuracy, and Execution Time
        4.5.1 Effect on the Proportion of Data Filtered
        4.5.2 Effect on Classification Accuracy
        4.5.3 Effect on Execution Time
      4.6 Summary
    Chapter 5: Conclusions and Future Work
      5.1 Conclusions
      5.2 Future Work
    References

    徐心縈 (2023). 用羅吉斯迴歸建構隨機分類模型之集成方法 [An ensemble method for constructing random classification models with logistic regression]. Master's thesis, Institute of Information Management, National Cheng Kung University.
    黃中立 (2023). 以簡易貝氏分類器隨機生成基本模型之集成方法 [An ensemble method for randomly generating base models with the naïve Bayes classifier]. Master's thesis, Institute of Information Management, National Cheng Kung University.
    廖家德 (2024). 袋裝法和提升法分類準確率的差異分析 [An analysis of the difference in classification accuracy between bagging and boosting]. Master's thesis, Institute of Information Management, National Cheng Kung University.
    何政賢 (2024). 以粒子群最佳化方法優化應用於二類別資料之隨機集成演算法 [Optimizing a random ensemble algorithm for bi-class data with particle swarm optimization]. Master's thesis, Institute of Information Management, National Cheng Kung University.
    Adnan, M. N. & Islam, M. Z. (2016). Optimizing the number of trees in a decision forest to discover a subforest with high ensemble accuracy using a genetic algorithm. Knowledge-Based Systems, 110, 86-97.
    Ali, M. A., Üçüncü, D., Ataş, P. K., & Özöğür-Akyüz, S. (2019). Classification of motor imagery task by using novel ensemble pruning approach. IEEE Transactions on Fuzzy Systems, 28(1), 85-91.
    Balas, E., Glover, F., & Zionts, S. (1965). An additive algorithm for solving linear programs with zero-one variables. Operations Research, 13(4), 517-549.
    Bian, Y., Wang, Y., Yao, Y., & Chen, H. (2019). Ensemble pruning based on objection maximization with a general distributed framework. IEEE Transactions on Neural Networks and Learning Systems, 31(9), 3766-3774.
    Bozuyla, M. (2021). AdaBoost ensemble learning on top of naive Bayes algorithm to discriminate fake and genuine news from social media. European Journal of Science and Technology, (28), 459-462.
    Breiman, L. (1996). Bagging predictors. Machine Learning, 24(2), 123-140.
    Breiman, L. (2001). Random forests. Machine Learning, 45, 5-32.
    Cover, T. M. (1999). Elements of Information Theory. John Wiley & Sons.
    Cuenca, J. J., Vanin, M., Hashmi, M. U., Koirala, A., Ergun, H., & Hayes, B. P. (2024). Event-informed identification and allocation of distribution network planning candidates with influence scores and binary linear programming. IEEE Transactions on Power Systems, 1-12.
    Dai, Q. & Han, X. (2016). An efficient ordering-based ensemble pruning algorithm via dynamic programming. Applied Intelligence, 44, 816-830.
    De Boer, P. T., Kroese, D. P., Mannor, S., & Rubinstein, R. Y. (2005). A tutorial on the cross-entropy method. Annals of Operations Research, 134, 19-67.
    Eberhart, R. C., & Shi, Y. (2000). Comparing inertia weights and constriction factors in particle swarm optimization. Proceedings of the 2000 Congress on Evolutionary Computation, 1, 84-88.
    Freund, Y. & Schapire, R. E. (1997). A decision-theoretic generalization of on-line learning and an application to boosting. Journal of Computer and System Sciences, 55(1), 119-139.
    Friedman, J. H. (2001). Greedy function approximation: A gradient boosting machine. Annals of Statistics, 29(5), 1189-1232.
    Golalipour, K., Akbari, E., & Motameni, H. (2024). Cluster ensemble selection based on maximum quality-maximum diversity. Engineering Applications of Artificial Intelligence, 131, 107873.
    Guo, H., Liu, H., Li, R., Wu, C., Guo, Y., & Xu, M. (2018). Margin & diversity based ordering ensemble pruning. Neurocomputing, 275, 237-246.
    Hazra, A., Mandal, S. K., & Gupta, A. (2016). Study and analysis of breast cancer cell detection using naïve Bayes, SVM and ensemble algorithms. International Journal of Computer Applications, 145(2), 39-45.
    Ho, T. K. (1995, August). Random decision forests. Proceedings of 3rd International Conference on Document Analysis and Recognition, 1, 278-282.
    Kumari, S., Kumar, D., & Mittal, M. (2021). An ensemble approach for classification and prediction of diabetes mellitus using soft voting classifier. International Journal of Cognitive Computing in Engineering, 2, 40-46.
    Lysiak, R., Kurzynski, M., & Woloszynski, T. (2014). Optimal selection of ensemble classifiers using measures of competence and diversity of base classifiers. Neurocomputing, 126, 29-35.
    Manousakis, N. M., Psomopoulos, C. S., Ioannidis, G. C., & Kaminaris, S. D. (2021). A binary integer programming method for optimal wind turbines allocation. Clean Technologies, 3(2), 462-473.
    Margineantu, D. D. & Dietterich, T. G. (1997, July). Pruning adaptive boosting. Proceedings of the 14th International Conference on Machine Learning, 97, 211-218.
    Mienye, I. D. & Sun, Y. (2022). A survey of ensemble learning: Concepts, algorithms, applications, and prospects. IEEE Access, 10, 99129-99149.
    Mohammed, A. M., Onieva, E., & Woźniak, M. (2022). Selective ensemble of classifiers trained on selective samples. Neurocomputing, 482, 197-211.
    Mousavi, R. & Eftekhari, M. (2015). A new ensemble learning methodology based on hybridization of classifier ensemble selection approaches. Applied Soft Computing, 37, 652-666.
    Nguyen, T. T., Luong, A. V., Dang, M. T., Liew, A. W. C., & McCall, J. (2020). Ensemble selection based on classifier prediction confidence. Pattern Recognition, 100, 107104.
    Patel, N. & Trivedi, S. (2020). Choosing optimal locations for temporary health care facilities during health crisis using binary integer programming. Sage Science Review of Applied Machine Learning, 3(2), 1-20.
    Schapire, R. E. (1990). The strength of weak learnability. Machine Learning, 5(2), 197-227.
    Thockchom, N., Singh, M. M., & Nandi, U. (2023). A novel ensemble learning-based model for network intrusion detection. Complex & Intelligent Systems, 9(5), 5693-5714.
    Tsoumakas, G., Partalas, I., & Vlahavas, I. (2008). A taxonomy and short review of ensemble selection. Proceedings of the Workshop on Supervised and Unsupervised Ensemble Methods and Their Applications, 41-46.
    Wu, Y. C., He, Y. X., Qian, C., & Zhou, Z. H. (2022). Multi-objective evolutionary ensemble pruning guided by margin distribution. Proceedings of the International Conference on Parallel Problem Solving from Nature, 427-441.
    Xue, Y., Tang, T., Pang, W., & Liu, A. X. (2020). Self-adaptive parameter and strategy based particle swarm optimization for large-scale feature selection problems with multiple classifiers. Applied Soft Computing, 88, 106031.
    Yaiprasert, C. & Hidayanto, A. N. (2023). AI-driven ensemble three machine learning to enhance digital marketing strategies in the food delivery business. Intelligent Systems with Applications, 18, 200235.
    Zadeh, S., Ghadiri, M., Mirrokni, V., & Zadimoghaddam, M. (2017). Scalable feature selection via distributed diversity maximization. Proceedings of the AAAI Conference on Artificial Intelligence, 31(1), 2876-2883.
    Zhang, S. & Liu, S. (2019). A discrete improved artificial bee colony algorithm for 0–1 knapsack problem. IEEE Access, 7, 104982-104991.
    Zhou, W., Xia, J., Zhou, F., Fan, L., Lei, X., Nallanathan, A., & Karagiannidis, G. K. (2023). Profit maximization for cache-enabled vehicular mobile edge computing networks. IEEE Transactions on Vehicular Technology, 72(10), 13793-13798.
    Zhou, Z. H. (2012). Ensemble Methods: Foundations and Algorithms. CRC Press.
    Zyblewski, P. & Woźniak, M. (2020). Novel clustering-based pruning algorithms. Pattern Analysis and Applications, 23(3), 1049-1058.

    Full-text access: on campus: immediately available; off campus: immediately available.