| Graduate Student: | 黃中立 Huang, Chung-Li |
|---|---|
| Thesis Title: | 以簡易貝氏分類器隨機生成基本模型之集成方法 (Ensemble Algorithms with Randomly Generated Base Models Induced by Naïve Bayesian Classifier) |
| Advisor: | 翁慈宗 Wong, Tzu-Tsung |
| Degree: | Master |
| Department: | Institute of Information Management, College of Management |
| Year of Publication: | 2023 |
| Academic Year of Graduation: | 111 |
| Language: | Chinese |
| Pages: | 62 |
| Keywords (Chinese): | 簡易貝氏分類器 (naïve Bayesian classifier), 集成學習 (ensemble learning), 集成演算法 (ensemble algorithm), 集成挑選 (ensemble selection) |
| Keywords (English): | classification algorithm, ensemble learning, ensemble selection, naïve Bayesian classifier |
| Access Count: | Views: 189; Downloads: 21 |
Abstract (Chinese):

Previous studies have shown that naïve Bayesian ensemble models built with the mainstream ensemble algorithms do not achieve a significant improvement in classification performance over a single naïve Bayesian classifier. The cause lies in the learning mechanism of the naïve Bayesian algorithm, which makes it a relatively stable classification method; because of this stability, the training of a naïve Bayesian classifier is constrained by the distribution of the original data set, so the resulting ensemble reaches only the same level of classification performance as a single model. To free the naïve Bayesian classifier from the distribution of the original data set, this study generates base models at random and filters them by classification accuracy, retaining only the random naïve Bayes models with an appropriate level of performance, from which an ensemble model is built. The study then examines whether this way of building a homogeneous naïve Bayesian ensemble improves classification accuracy relative to the ensembles produced by the mainstream ensemble algorithms in the literature. On 20 binary-class data sets, the proposed approach achieved the best classification accuracy on 15; on 10 multi-class data sets, it was the best on 8. Although generating naïve Bayes models at random costs more computation time than training-based generation, the generation steps are mutually independent, so parallel processing can greatly reduce this time. Finally, when the randomly generated models serve as an auxiliary component within a naïve Bayesian ensemble, the ensemble outperforms those built purely from trained models, and combined with parallel processing the resulting classification performance is promising.
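The core mechanism described above, drawing a naïve Bayes model with no reference to training instances and keeping it only if it passes an accuracy filter, can be sketched in a few lines. The sketch below is a minimal illustration under assumed choices: the abstract does not specify the sampling distribution, so uniform Dirichlet draws are used, and the function names (`random_nb_model`, `generate_filtered_models`) and the validation-set filter are hypothetical.

```python
# Minimal sketch, assuming uniform Dirichlet sampling of the parameters and
# integer-coded categorical features; the thesis's exact scheme may differ.
import numpy as np

rng = np.random.default_rng(0)

def random_nb_model(n_classes, n_features, n_values):
    """Draw class priors and per-feature conditional probability tables
    at random, with no reference to any training instance."""
    priors = rng.dirichlet(np.ones(n_classes))
    # cond[f][c, v] = P(feature f takes value v | class c); each row of each
    # table is an independently drawn probability distribution.
    cond = [rng.dirichlet(np.ones(n_values), size=n_classes)
            for _ in range(n_features)]
    return priors, cond

def predict(model, X):
    """Standard naïve Bayes prediction, done in log space to avoid underflow."""
    priors, cond = model
    log_post = np.log(priors) + sum(np.log(cond[f][:, X[:, f]]).T
                                    for f in range(X.shape[1]))
    return log_post.argmax(axis=1)

def generate_filtered_models(X_val, y_val, n_models, threshold, **shape):
    """Accuracy filter: keep a random model only if it classifies the
    validation instances at or above `threshold`. Keep the threshold modest
    (slightly above chance), or the loop may run for a very long time."""
    kept = []
    while len(kept) < n_models:
        m = random_nb_model(**shape)
        if (predict(m, X_val) == y_val).mean() >= threshold:
            kept.append(m)
    return kept
```

Because each candidate model is drawn independently of all the others, the trials inside the loop can be farmed out to parallel workers, which is the point the abstract makes about reducing computation time through parallel processing.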
Abstract (English):

The naïve Bayesian classifier is a well-known classification algorithm that has been widely used in numerous studies because of its easy implementation, computational efficiency, and competitive classification performance. Since the naïve Bayesian classifier is a stable algorithm, it is generally not the first choice in ensemble learning: ensemble models built with it show only limited performance improvement over a single model induced by the same algorithm.

In this study, a novel approach is first proposed for constructing a classification model for the naïve Bayes classifier without training instances. The aim is to build an ensemble model composed of randomly generated base models that are not confined to the instances in a data set. This approach for generating independent classification models is employed to design two ensemble algorithms: Totally Random Ensemble Naïve Bayes (TR-ENB) and Bagging Random Ensemble Naïve Bayes (BR-ENB). The two algorithms are tested on 30 data sets to explore whether classification accuracy can be improved. The experimental results show that TR-ENB and BR-ENB achieve the highest classification accuracy on 15 out of 20 binary-class data sets and 8 out of 10 multi-class data sets. Performance improvement can generally be achieved when the randomly generated base models are combined with base models induced by the bagging approach.
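The two algorithms are only named in the abstract, so the sketch below is a guess at their outline rather than the thesis's actual procedure: a TR-ENB-style ensemble would consist entirely of randomly generated members, while a BR-ENB-style ensemble would mix them with conventionally bagged naïve Bayes models (here scikit-learn's `CategoricalNB`), combining predictions by majority vote. The mixing proportion and the voting rule are assumptions.

```python
# Rough sketch of majority-vote combination; assumes non-negative integer
# class labels and integer-coded categorical features. TR-ENB/BR-ENB
# internals beyond what the abstract states are assumptions.
import numpy as np
from sklearn.naive_bayes import CategoricalNB
from sklearn.utils import resample

def majority_vote(predictors, X):
    """predictors: callables mapping a feature matrix to a label vector."""
    votes = np.stack([p(X) for p in predictors])   # (n_models, n_samples)
    return np.apply_along_axis(lambda v: np.bincount(v).argmax(), 0, votes)

def bagged_nb(X, y, n_models):
    """Bagging: one CategoricalNB trained per bootstrap resample."""
    return [CategoricalNB().fit(*resample(X, y, random_state=s)).predict
            for s in range(n_models)]

# TR-ENB-style: every member is randomly generated (see the previous sketch):
#   tr_enb = [lambda X, m=m: predict(m, X) for m in random_members]
# BR-ENB-style: bagged members plus randomly generated "auxiliary" members:
#   br_enb = bagged_nb(X_train, y_train, 25) + tr_enb
#   y_hat = majority_vote(br_enb, X_test)
```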