
Graduate Student: 林泓暘 (Lin, Hong-Yang)
Thesis Title: 藉由整合權重的生成以改善單一數值預測模式整合法對小樣本之預測準確率
(Generating Aggregated Weights to Improve the Predictive Accuracy of Single-Model Ensemble Numerical Predicting Method in Small Datasets)
Advisor: 利德江 (Li, Der-Chiang)
Degree: Master
Department: College of Management - Department of Industrial and Information Management
Year of Publication: 2016
Graduation Academic Year: 105 (2016-2017)
Language: Chinese
Number of Pages: 45
Chinese Keywords: 拔靴整合法, 數值預測, 小樣本學習, 整合權重
Foreign Keywords: bagging, numerical prediction, small sample learning, aggregated weight
    In this age of information explosion, information has become far more accessible, so how to search limited data and distill the useful information we actually want is a crucial topic in small-sample learning. In recent years, research on ensemble methods has concentrated mainly on the model-combination process, with comparatively little work on how to aggregate the resulting outputs. Data-mining methods divide into classification and prediction: for classification problems, ensembles usually decide by majority voting, whereas for numerical prediction the final aggregation stage is usually a simple average. The simple average, however, is easily distorted by extreme values, and the distortion is especially pronounced when the sample is small. To address this problem, this study proposes an aggregated-weight computation method based on fuzzy theory and the box-and-whisker plot for single-model ensemble methods applied to numerical prediction.
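    The sensitivity of the simple average to a single extreme value is easy to see with a small numeric sketch (hypothetical numbers, not taken from the thesis): with only five model predictions, one outlier pulls the mean well away from the consensus of the other four.

    # Hypothetical predictions from five bootstrap models; 18.0 is an outlier.
    predictions = [10.2, 9.8, 10.1, 10.0, 18.0]
    simple_average = sum(predictions) / len(predictions)
    median = sorted(predictions)[len(predictions) // 2]
    print(f"simple average: {simple_average:.2f}")  # 11.62, dragged toward 18.0
    print(f"median:         {median:.2f}")          # 10.10, close to the consensus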
    This study improves Bootstrap Aggregating (Bagging) by using support vector regression (SVR) as the prediction model and computing a weight from each prediction model's bias, so that every predicted value receives a corresponding weight; a compromise prediction that minimizes the overall error across models is then computed to make the system robust. To validate the method, this study compares it against the simple average and the individual model predictions on a practical case obtained from a panel factory, in the hope of improving the predictive accuracy of the single-model ensemble method.
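    To make the overall flow concrete, below is a minimal sketch of error-weighted bagging with SVR base learners. It is not the thesis's exact procedure: the thesis derives its aggregated weights from a fuzzy membership function built on a box-and-whisker plot, whereas this sketch substitutes simple inverse-error weights computed on out-of-bag samples, and the data, names, and parameters are all illustrative assumptions.

    import numpy as np
    from sklearn.svm import SVR

    rng = np.random.default_rng(0)

    # Hypothetical small data set: 20 samples, 3 features, linear signal + noise.
    X = rng.normal(size=(20, 3))
    y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.1, size=20)

    models, weights = [], []
    for _ in range(10):
        idx = rng.integers(0, len(X), size=len(X))    # bootstrap resample
        oob = np.setdiff1d(np.arange(len(X)), idx)    # out-of-bag indices
        m = SVR(kernel="rbf", C=10.0).fit(X[idx], y[idx])
        # Weight each model by inverse out-of-bag error (a stand-in for the
        # thesis's fuzzy/box-and-whisker weighting scheme).
        err = np.mean(np.abs(m.predict(X[oob]) - y[oob])) if len(oob) else 1.0
        models.append(m)
        weights.append(1.0 / (err + 1e-8))

    w = np.array(weights) / np.sum(weights)

    x_new = rng.normal(size=(1, 3))
    preds = np.array([m.predict(x_new)[0] for m in models])
    print("simple average  :", preds.mean())
    print("weighted average:", float(w @ preds))

    The weighted average here plays the role of the compromise prediction: models with larger out-of-bag errors contribute less, which damps the influence of extreme individual predictions.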

    In the age of information explosion it is easier than ever to access information, so how to explore and extract useful information from limited data is an important problem in small-data learning. Recent studies of ensemble methods mostly focus on the combination process rather than on how the results are aggregated. Data-mining methods can be divided into classification and prediction: in ensemble methods, voting is the most common way to handle classification, while in numerical prediction the simple average is the most common way to combine results. The simple average, however, is easily affected by extreme values, especially with small data sets.
    We improve Bagging by using SVR as the prediction model and computing each model's error to obtain a corresponding weight for every predicted value. We then compute the compromise prediction value that minimizes the error, stabilizing the system. We compare our method with the simple average to examine its effect, and we apply it to a practical case from a panel factory to demonstrate the improvement in the single-model ensemble method.

    Table of Contents V
    List of Figures VI
    List of Tables VII
    Chapter 1 Introduction 1
      1.1 Research Background 1
      1.2 Research Motivation 3
      1.3 Research Objectives 4
      1.4 Research Process 5
    Chapter 2 Literature Review 7
      2.1 Ensemble Methods 7
        2.1.1 Single-Model Ensemble Methods 8
        2.1.2 Multi-Model Ensemble Methods 12
      2.2 Prediction Model: Support Vector Regression 14
      2.3 The Information Diffusion Concept 16
      2.4 Box-and-Whisker Plots 19
    Chapter 3 Research Method 21
      3.1 Single-Model Ensemble Method 21
      3.2 Prediction Model: Support Vector Regression 22
      3.3 Box-and-Whisker-Plot-Based Range Estimation 24
        3.3.1 Range Estimation 24
        3.3.2 Constructing Triangular Membership Functions 24
    Chapter 4 Empirical Validation 29
      4.1 Experimental Environment 29
        4.1.1 Experimental Procedure 29
        4.1.2 Prediction Error Metrics 29
        4.1.3 Hypothesis Testing 30
        4.1.4 Modeling Software 30
      4.2 Case Description 31
      4.3 Experimental Results 33
    Chapter 5 Conclusions and Suggestions 38
      5.1 Conclusions 38
      5.2 Suggestions 39
    References 40

    (1) Chinese-language references
    陳惠昭 (2014). Using the mega-trend-diffusion technique to improve the predictive accuracy of multi-model ensemble methods (in Chinese). Master's thesis, Department of Industrial and Information Management, National Cheng Kung University.
    (2) English-language references
    [1] Q. Yu, "Weighted bagging: a modification of AdaBoost from the perspective of importance sampling," Journal of Applied Statistics, vol. 38, no. 3, pp. 451-463, 2011.
    [2] D.-C. Li, C.-J. Chang, C.-C. Chen, and W.-C. Chen, "A grey-based fitting coefficient to build a hybrid forecasting model for small data sets," Applied Mathematical Modelling, vol. 36, no. 10, pp. 5101-5108, 2012.
    [3] T. G. Dietterich, "Ensemble methods in machine learning," in Proceedings of the First International Workshop on Multiple Classifier Systems, 2000.
    [4] L. I. Kuncheva, Combining Pattern Classifiers: Methods and Algorithms. Wiley-Interscience, 2004.
    [5] D.-C. Li, C.-W. Liu, and W.-C. Chen, "A multi-model approach to determine early manufacturing parameters for small-data-set prediction," International Journal of Production Research, vol. 50, no. 23, pp. 6679-6690, 2012.
    [6] L. Todorovski and S. Džeroski, "Combining multiple models with meta decision trees," in Principles of Data Mining and Knowledge Discovery (Lecture Notes in Computer Science, vol. 1910), D. Zighed, J. Komorowski, and J. Żytkow, Eds. Springer Berlin Heidelberg, 2000, pp. 54-64.
    [7] L. Breiman, "Bagging predictors," Machine Learning, vol. 24, no. 2, pp. 123-140, 1996.
    [8] B. Efron and R. J. Tibshirani, An Introduction to the Bootstrap. New York: Chapman & Hall, 1993.
    [9] R. Bryll, R. Gutierrez-Osuna, and F. Quek, "Attribute bagging: improving accuracy of classifier ensembles by using random feature subsets," Pattern Recognition, vol. 36, no. 6, pp. 1291-1302, 2003.
    [10] T. Osawa, H. Mitsuhashi, Y. Uematsu, and A. Ushimaru, "Bagging GLM: improved generalized linear model for the analysis of zero-inflated data," Ecological Informatics, vol. 6, no. 5, pp. 270-275, 2011.
    [11] M. Browne, "Regularized tessellation density estimation with bootstrap aggregation and complexity penalization," Pattern Recognition, vol. 45, no. 4, pp. 1531-1539, 2012.
    [12] A. Jha, R. Chauhan, M. Mehra, H. R. Singh, and R. Shankar, "miR-BAG: bagging based identification of microRNA precursors," PLoS One, vol. 7, no. 9, p. e45782, 2012.
    [13] R. E. Schapire, "The strength of weak learnability," Machine Learning, vol. 5, no. 2, pp. 197-227, 1990.
    [14] Y. Freund and R. E. Schapire, "Experiments with a New Boosting Algorithm," presented at the International Conference on Machine Learning, 1996.
    [15] X. Li, L. Wang, and E. Sung, "AdaBoost with SVM-based component classifiers," Engineering Applications of Artificial Intelligence, vol. 21, no. 5, pp. 785-795, 2008.
    [16] Y. Gao and F. Gao, "Edited AdaBoost by weighted kNN," Neurocomputing, vol. 73, no. 16-18, pp. 3079-3088, 2010.
    [17] E. Song, D. Huang, G. Ma, and C.-C. Hung, "Semi-supervised multi-class Adaboost by exploiting unlabeled data," Expert Systems with Applications, vol. 38, no. 6, pp. 6720-6726, 2011.
    [18] J. Cao, S. Kwong, and R. Wang, "A noise-detection based AdaBoost algorithm for mislabeled data," Pattern Recognition, vol. 45, no. 12, pp. 4451-4465, 2012.
    [19] J.-B. Wen, Y.-S. Xiong, and S.-L. Wang, "A novel two-stage weak classifier selection approach for adaptive boosting for cascade face detector," Neurocomputing, vol. 116, pp. 122-135, 2013.
    [20] L. Breiman, "Random forests," Machine Learning, vol. 45, no. 1, pp. 5-32, Oct 2001.
    [21] B. Fellinghauer, P. Bühlmann, M. Ryffel, M. von Rhein, and J. D. Reinhardt, "Stable graphical model estimation with Random Forests for discrete, continuous, and mixed variables," Computational Statistics & Data Analysis, vol. 64, pp. 132-152, 2013.
    [22] M. Carrasco Kind and R. J. Brunner, "TPZ: photometric redshift PDFs and ancillary information by using prediction trees and random forests," Monthly Notices of the Royal Astronomical Society, vol. 432, no. 2, pp. 1483-1501, 2013.
    [23] A. Philibert, C. Loyce, and D. Makowski, "Prediction of N2O emission from local information with Random Forest," Environmental Pollution, vol. 177, pp. 156-163, Jun 2013.
    [24] O. Kim, "Fire detection system using random forest classification for image sequences of complex background," Optical Engineering, vol. 52, no. 6, p. 067202, 2013.
    [25] H. M. Hsueh, D. W. Zhou, and C. A. Tsai, "Random forests-based differential analysis of gene sets for gene expression data," Gene, vol. 518, no. 1, pp. 179-186, Apr 2013.
    [26] R. Roiger and M. Geatz, Data Mining: A Tutorial-Based Primer. New York: Addison Wesley, 2003.
    [27] Y. Ma and V. Cherkassky, "Multiple model classification using SVM-based approach," in Proceedings of the International Joint Conference on Neural Networks, 2003, vol. 2, pp. 1581-1586.
    [28] M. Reformat and R. Yager, "Building ensemble classifiers using belief functions and OWA operators," Soft Computing, vol. 12, no. 6, pp. 543-558, 2008.
    [29] E. Byon, A. K. Shrivastava, and Y. Ding, "A classification procedure for highly imbalanced class sizes," IIE Transactions, vol. 42, no. 4, pp. 288-303, 2010.
    [30] Y. Chikamoto et al., "An overview of decadal climate predictability in a multi-model ensemble by climate model MIROC," Climate Dynamics, vol. 40, no. 5-6, pp. 1201-1222, Mar 2013.
    [31] N. Acharya, U. C. Mohanty, and L. Sahoo, "Probabilistic multi-model ensemble prediction of Indian summer monsoon rainfall using General Circulation Models: a non-parametric approach," Comptes Rendus Geoscience, vol. 345, no. 3, pp. 126-135, 2013.
    [32] C. Cortes and V. Vapnik, "Support-vector networks," Machine Learning, vol. 20, no. 3, pp. 273-297, 1995.
    [33] H. Drucker, C. J. Burges, L. Kaufman, A. Smola, and V. Vapnik, "Support vector regression machines," in Advances in Neural Information Processing Systems, 1997, pp. 155-161.
    [34] V. D. Sánchez A, "Advanced support vector machines and kernel methods," Neurocomputing, vol. 55, no. 1-2, pp. 5-20, 2003.
    [35] C. Huang, "Principle of information diffusion," Fuzzy Sets and Systems, vol. 91, no. 1, pp. 69-90, 1997.
    [36] C. Huang and C. Moraga, "A diffusion-neural-network for learning from small samples," International Journal of Approximate Reasoning, vol. 35, no. 2, pp. 137-161, 2004.
    [37] D.-C. Li, C. Wu, and F. M. Chang, "Using data-fuzzification technology in small data set learning to improve FMS scheduling accuracy," The International Journal of Advanced Manufacturing Technology, vol. 27, no. 3-4, pp. 321-328, 2005.
    [38] J. S. R. Jang, "ANFIS: adaptive-network-based fuzzy inference system," IEEE Transactions on Systems, Man, and Cybernetics, vol. 23, no. 3, pp. 665-685, 1993.
    [39] D.-C. Li, C.-S. Wu, T.-I. Tsai, and F. M. Chang, "Using mega-fuzzification and data trend estimation in small data set learning for early FMS scheduling knowledge," Computers & Operations Research, vol. 33, no. 6, pp. 1857-1869, 2006.
    [40] D.-C. Li, C.-S. Wu, T.-I. Tsai, and Y.-S. Lina, "Using mega-trend-diffusion and artificial samples in small data set learning for early flexible manufacturing system scheduling knowledge," Computers & Operations Research, vol. 34, no. 4, pp. 966-982, Apr 2007.
    [41] J. W. Tukey, Exploratory Data Analysis. Reading, MA: Addison-Wesley, 1977.
    [42] D.-C. Li, C.-C. Chen, C.-J. Chang, and W.-C. Chen, "Employing box-and-whisker plots for learning more knowledge in TFT-LCD pilot runs," International Journal of Production Research, vol. 50, no. 6, pp. 1539-1553, 2012.
