| Graduate Student: | 莊宜珊 Chuang, Yi-Shan |
|---|---|
| Thesis Title: | 以特徵選取為基礎的混合分類方法 (Hybrid Classification Methods Based on Feature Selection) |
| Advisor: | 翁慈宗 Wong, Tzu-Tsung |
| Degree: | Master |
| Department: | College of Management - Institute of Information Management |
| Year of Publication: | 2020 |
| Graduation Academic Year: | 108 |
| Language: | Chinese |
| Number of Pages: | 61 |
| Keywords (Chinese): | 混合式分類方法, 支持向量機, 羅吉斯回歸, 特徵挑選, 包裝法 |
| Keywords (English): | Feature selection, hybrid classification, logistic regression, support vector machine, wrapper |
Basic classification methods induce a single model; common examples are the naïve Bayesian classifier and the support vector machine. When classifying a testing instance, the prediction is made directly from the instance's attribute values to obtain the most probable class, so the result is easy to interpret, but the stability of the classification results is poor. Ensemble classification methods, such as bagging and boosting, were later proposed: they induce multiple prediction models and determine the final class by majority vote. Their results are more stable, but a prediction made by many models is hard to interpret. Hybrid classification methods extend the basic methods: multiple models are induced, but only one of them is chosen for each prediction, which alleviates the instability of a single model while avoiding the loss of interpretability that comes with group decisions. This study therefore takes the characteristics of the classification methods into account: a basic classification method is first used for feature selection, multiple models are then induced, and a mechanism is designed to choose a single model for each prediction, so the result remains interpretable while the instability of relying on a single model is reduced. In the empirical study, 15 data sets from the UCI repository are used. Compared with the basic classification methods, the proposed methods improve the accuracy on more than half of the data sets; compared with the basic methods combined with feature selection, they also improve more than half of the data sets, with relatively larger improvements, although the accuracy is lower on some data sets. Whether the classification methods in the first and second stages are the same or different makes little difference in accuracy. The performance on some data sets cannot be improved because most of the testing instances are classified by the model induced from the full attribute set, which means that the accuracy improvement of hybrid classification methods depends heavily on the characteristics of the data sets.
Basic classification algorithms, such as the naïve Bayesian classifier and the support vector machine, induce a single model from training data. Every prediction made by the model is easy to interpret, but its accuracy is generally not competitive. Ensemble algorithms induce several models from training data to make group decisions for class prediction. An ensemble algorithm can thus achieve a relatively high accuracy, but it is difficult to interpret a prediction made by multiple models. Hybrid classification algorithms induce several models from training data, and only one of the models is chosen to classify a testing instance. Such predictions remain easy to interpret, and the prediction accuracy is likely to be improved. This study proposes a scheme based on feature selection to build hybrid classification algorithms. The attribute set of a data set is divided into disjoint primary and secondary attribute sets by a wrapper that employs a basic classification algorithm to perform feature selection. The full, primary, and secondary attribute sets are then used to train a basic algorithm that can be the same as or different from the one used for feature selection. Four hybrid algorithms composed of logistic regression and support vector machine are tested on 15 data sets downloaded from the UCI data repository. The experimental results show that the hybrid algorithms outperform the corresponding basic ones on approximately half of the data sets. The basic algorithms chosen for feature selection and model induction appear to have little impact on performance. The performance on some data sets cannot be improved because most of the testing instances are classified by the model induced from the full attribute set. This implies that the performance improvement of hybrid algorithms depends heavily on the characteristics of the data sets.
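To make the wrapper-based hybrid scheme described above concrete, the following is a minimal sketch in Python. It assumes scikit-learn's `SequentialFeatureSelector` as the wrapper, `load_breast_cancer` as a stand-in for one of the UCI data sets, and a hypothetical selection rule (pick the model with the highest predicted class probability); the actual selection mechanism designed in the thesis is not specified in this record, so this illustrates only the overall structure, not the author's method.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Stand-in for one of the 15 UCI data sets used in the thesis.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Stage 1: a wrapper built on a basic classifier (logistic regression here)
# selects attributes; the selected ones form the primary attribute set and
# the remaining ones form the disjoint secondary attribute set.
wrapper = SequentialFeatureSelector(
    LogisticRegression(max_iter=5000), n_features_to_select=0.5, cv=5
)
wrapper.fit(X_train, y_train)
primary = wrapper.get_support()            # mask of selected attributes
secondary = ~primary                       # mask of the remaining attributes
full = np.ones_like(primary, dtype=bool)   # mask of all attributes

# Stage 2: train one basic classifier (an SVM here) on each attribute set.
models = {
    name: (SVC(probability=True).fit(X_train[:, mask], y_train), mask)
    for name, mask in {"full": full, "primary": primary, "secondary": secondary}.items()
}

# Stage 3: classify each testing instance with exactly one of the three models.
# The rule below -- choose the model with the highest predicted class
# probability -- is a hypothetical placeholder for the selection mechanism
# designed in the thesis.
predictions = []
for x in X_test:
    clf, mask = max(
        models.values(),
        key=lambda m: m[0].predict_proba(x[m[1]].reshape(1, -1)).max(),
    )
    predictions.append(clf.predict(x[mask].reshape(1, -1))[0])

print("hybrid accuracy:", np.mean(np.array(predictions) == y_test))
```

Swapping `LogisticRegression` and `SVC` between the feature-selection stage and the model-induction stage would give the four logistic-regression/SVM combinations mentioned in the abstract.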
On-campus access: released 2025-06-01.