Graduate student: | 陳誌瑋 Chen, Jhih-Wei |
---|---|
Thesis title: | 藉由盒鬚圖產生之虛擬樣本提升拔靴集成法的分類正確率 Improving the Accuracies of Bootstrap Aggregating with Virtual Samples generated by Box Plot |
Advisor: | 利德江 Li, Der-Chiang |
Degree: | Master |
Department: | College of Management - Department of Industrial and Information Management |
Year of publication: | 2014 |
Academic year of graduation: | 102 (ROC calendar) |
Language: | Chinese |
Number of pages: | 77 |
Chinese keywords: | 虛擬樣本, 盒鬚圖, 拔靴集成法 (virtual sample, box-and-whisker plot, bootstrap aggregating) |
Foreign-language keywords: | Virtual Sample, Box-Whisker Plot, Bagging |
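Statistical theory has long played an important role in turning data into meaningful information. Constrained by its underlying assumptions, however, it can no longer cope with the many different kinds of data found in the real world, and machine learning methods such as artificial neural networks and data mining have therefore developed rapidly over the past two decades. For classification problems, ensemble methods such as bootstrap aggregating (bagging) and boosting can effectively reduce overfitting compared with training a single classifier. They use the bootstrap to generate multiple training subsets, build a sub-classifier on each subset, and combine the results. Although this improves on the accuracy of a single classifier, the improvement is limited, because the sub-classifiers repeatedly learn from subsets whose attribute values are identical to those of the original training samples. To let the sub-classifiers learn attribute values beyond those in the bootstrap samples, this study uses box-and-whisker plots to estimate the value range of the training samples and generates virtual samples from that range to enrich the training subsets. The method is tested on data sets from the public UCI repository. The experiments confirm that it effectively improves the classification accuracy of bagging and also makes the classification more stable.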
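The enrichment step described above can be illustrated with a minimal sketch. The record does not give the thesis's actual procedure, so everything here is an assumption made for illustration: the value range is taken to be the Tukey fences (quartiles extended by 1.5 times the interquartile range), virtual values are drawn uniformly and independently per attribute, and the function names `box_plot_range`, `virtual_samples`, and `enriched_bootstrap` are hypothetical.

```python
import random
import statistics


def box_plot_range(values, whisker=1.5):
    """Estimate a value range from box-plot fences (assumed: Tukey's 1.5 * IQR rule)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - whisker * iqr, q3 + whisker * iqr


def virtual_samples(rows, n_virtual, rng):
    """Draw n_virtual rows, each attribute uniform within its estimated range (assumption)."""
    columns = list(zip(*rows))
    ranges = [box_plot_range(col) for col in columns]
    return [tuple(rng.uniform(lo, hi) for lo, hi in ranges) for _ in range(n_virtual)]


def enriched_bootstrap(rows, n_virtual, rng):
    """Return one bootstrap replicate of rows followed by virtual samples."""
    replicate = [rng.choice(rows) for _ in rows]  # sampling with replacement
    return replicate + virtual_samples(rows, n_virtual, rng)


rng = random.Random(0)
# Hypothetical two-attribute training rows that all belong to one class,
# so the virtual rows can inherit that class label.
train = [(5.1, 3.5), (4.9, 3.0), (4.7, 3.2), (5.0, 3.6), (5.4, 3.9), (4.6, 3.4)]
subset = enriched_bootstrap(train, n_virtual=4, rng=rng)
print(len(subset))
```

In this sketch each sub-classifier would be trained on its own enriched subset, so it sees attribute values that do not occur in the original training rows while staying inside the estimated range.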
Data mining is a technology that blends traditional data analysis methods with sophisticated algorithms for processing large volumes of data. When applied to classification, it can still produce errors, for example through overfitting or underfitting. Ensemble methods such as Bagging (Bootstrap Aggregating) and Boosting manipulate the training set to reduce overfitting. Bagging does not focus on any particular instance of the training data and is therefore less susceptible to overfitting when applied to noisy data. Bagging uses the bootstrap to draw samples repeatedly, but it does not generate samples outside the underlying space of the training samples. Because the sampling is done with replacement, some instances may appear several times in the same training set, while others may be omitted from it. This study uses box-and-whisker plots to generate virtual samples for the training data as a substitute for the bootstrap approach. Experiments on data sets from the public UCI repository show that the proposed method can improve the accuracy of Bagging.
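The effect of sampling with replacement can be checked with a short simulation. This is only an illustrative sketch: the sample size, the seed, and the helper name `bootstrap_indices` are assumptions, not values from the thesis.

```python
import random
from collections import Counter


def bootstrap_indices(n, rng):
    """Draw n indices from range(n) uniformly with replacement."""
    return [rng.randrange(n) for _ in range(n)]


rng = random.Random(42)
n = 1000  # hypothetical training-set size
counts = Counter(bootstrap_indices(n, rng))

repeated = sum(1 for c in counts.values() if c > 1)  # instances drawn more than once
omitted = n - len(counts)                            # instances never drawn
print(f"repeated: {repeated}, omitted: {omitted}, omitted share: {omitted / n:.3f}")
# The expected omitted share is (1 - 1/n)**n, which approaches exp(-1), about 0.368.
```

Every value in a bootstrap replicate is copied from the original training set, which is why the replicate cannot contain attribute values outside the observed ones.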