
Graduate Student: Chen, Hong-Ji
Thesis Title: Accomplishing Random Forest on the Basis of Virtual Sample Construction
Advisor: Li, Der-Chiang
Degree: Master
Department: College of Management - Institute of Information Management
Year of Publication: 2014
Academic Year of Graduation: 102
Language: Chinese
Number of Pages: 72
Chinese Keywords (translated): virtual samples; bootstrap method; mega-trend-diffusion technique; ensemble classification
Foreign Keywords: ensemble, bootstrap process, virtual samples, mega-trend-diffusion (MTD)
  • Converting data into meaningful information that can support business decision-makers has substantial practical value. For classification learning problems, ensemble procedures such as bagging, boosting, and the random forest were developed to improve the classification accuracy of a single model. However, when these methods construct their sub-models, the required training subsets are generated by the bootstrap process, so the sub-models can only train repeatedly on similar data and produce learning results that differ only slightly from one another; although this improves the accuracy of a single model, the gain remains limited. To allow the sub-models to learn beyond the range covered by the training samples, this study replaces the bootstrap process with virtual samples, using the mega-trend-diffusion (MTD) technique to generate the training subsets; research over the past decade has confirmed that this virtual sample generation method effectively improves the training stability and prediction accuracy of learning tools on small samples. For data, this study uses datasets obtained from the public UCI repository and tests the random forest ensemble with the bootstrap process replaced by MTD, aiming to improve the random forest's classification accuracy on test samples. The experimental results show that the proposed method effectively improves classification accuracy.
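The mega-trend-diffusion technique mentioned in the abstract diffuses a small sample's range outward before virtual samples are drawn. The sketch below is a minimal, illustrative Python implementation of the MTD lower/upper diffusion bounds as published by Li et al. (2007); the function names and the uniform sampling step are assumptions for illustration, not the thesis's actual code:

```python
import math
import random

def mtd_bounds(x, tiny=1e-20):
    """Estimate the lower/upper diffusion bounds of the mega-trend-diffusion
    (MTD) technique for a small one-dimensional sample (after Li et al., 2007)."""
    cl = (min(x) + max(x)) / 2.0           # central location of the data range
    n_l = sum(1 for v in x if v < cl)      # counts on each side of the centre
    n_u = sum(1 for v in x if v > cl)
    n_l, n_u = max(n_l, 1), max(n_u, 1)    # guard against an empty side
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / (len(x) - 1)  # sample variance
    skew_l = n_l / (n_l + n_u)             # skewness weights
    skew_u = n_u / (n_l + n_u)
    a = cl - skew_l * math.sqrt(-2.0 * (var / n_l) * math.log(tiny))
    b = cl + skew_u * math.sqrt(-2.0 * (var / n_u) * math.log(tiny))
    # The diffused domain must never shrink the observed data range.
    return min(a, min(x)), max(b, max(x))

def mtd_virtual_samples(x, n_virtual, seed=0):
    """Draw virtual samples uniformly within the MTD diffusion bounds."""
    a, b = mtd_bounds(x)
    rng = random.Random(seed)
    return [rng.uniform(a, b) for _ in range(n_virtual)]
```

Because the bounds extend beyond the observed minimum and maximum, the generated virtual samples can take values the original small sample never contained, which is exactly what the bootstrap process cannot do.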

    In order to improve the classification accuracy of a single model, ensemble approaches such as bagging, boosting, and the random forest were developed; they rely on the bootstrap process to generate the training subsets used to build their sub-models. However, because these subsets are created by sampling the same data with replacement, they differ only slightly from one another. Although integrating the sub-models built on such subsets does improve on the accuracy of a single model, further gains are possible by generating training subsets whose sample values genuinely differ. This study therefore employs an alternative sample generation approach, the mega-trend-diffusion (MTD) technique, as a substitute for the bootstrap process in the learning procedure of the random forest; over the past decade, this kind of sample generation approach has been shown to enhance the robustness and precision of learning tools when sample sizes are small. In the experiments, the results show that the classification accuracy of the random forest improves significantly when the training subsets are created by MTD rather than by the bootstrap process.
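The substitution at the heart of the study, generating training subsets with diffused virtual samples instead of bootstrap resamples, can be contrasted in a small sketch. The helper names and the simplified `widen` factor are hypothetical stand-ins; a faithful implementation would use the full MTD bound formulas and generate subsets per class before training each sub-tree:

```python
import random

def bootstrap_subset(data, rng):
    """Bootstrap: resample the original rows with replacement (the standard
    random-forest mechanism), so every subset row repeats an existing row."""
    return [rng.choice(data) for _ in range(len(data))]

def mtd_subset(data, rng, widen=0.1):
    """MTD-style stand-in: draw each attribute uniformly from a range diffused
    slightly beyond the observed min/max, so subsets contain unseen values.
    (`widen` is an illustrative placeholder for the full MTD bound formula.)"""
    n_attrs = len(data[0])
    lows = [min(row[j] for row in data) for j in range(n_attrs)]
    highs = [max(row[j] for row in data) for j in range(n_attrs)]
    spans = [h - l for l, h in zip(lows, highs)]
    return [[rng.uniform(l - widen * s, h + widen * s)
             for l, h, s in zip(lows, highs, spans)]
            for _ in range(len(data))]
```

Sub-models trained on `bootstrap_subset` outputs see only repetitions of the same rows, while those trained on `mtd_subset` outputs see attribute values outside the original sample, which is the source of the diversity the study exploits.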

    Table of Contents
    Abstract (Chinese); Abstract (English); Acknowledgements; Contents; List of Figures; List of Tables
    Chapter 1  Introduction: 1.1 Research Background; 1.2 Research Motivation; 1.3 Research Objectives; 1.4 Research Framework and Procedure
    Chapter 2  Literature Review: 2.1 Decision-Tree Classification Models; 2.2 Ensemble Classification (2.2.1 Principles of Ensemble Classification; 2.2.2 Out-of-Bag Error Estimation; 2.2.3 Common Ensemble Methods); 2.3 Sample Generation Methods (2.3.1 Bootstrap Sampling; 2.3.2 Information Diffusion Techniques); 2.4 Summary
    Chapter 3  Research Method: 3.1 Mega-Trend-Diffusion Technique (3.1.1 Reference Point and Diffusion-Coefficient Modification; 3.1.2 Skewness Setting; 3.1.3 Membership-Function Values); 3.2 Sample Generation Mechanism; 3.3 Constructing the Prediction Model; 3.4 Procedure and Steps of the Proposed Method
    Chapter 4  Empirical Validation: 4.1 Experimental Environment (4.1.1 Model-Construction Software; 4.1.2 Experimental Design and Evaluation Metrics); 4.2 Description of the Experimental Data; 4.3 Experimental Results; 4.4 Findings and Discussion
    Chapter 5  Conclusions and Suggestions: 5.1 Conclusions; 5.2 Suggestions for Future Research
    References

    Breiman, L. (1996). Bagging Predictors. Machine Learning, 24(2), 123-140.
    Breiman, L. (2001). Random forests. Machine Learning, 45(1), 5-32.
    Byon, E., Shrivastava, A. K., & Ding, Y. (2010). A classification procedure for highly imbalanced class sizes. IIE Transactions, 42(4), 288-303.
    Díaz-Uriarte, R., & De Andres, S. A. (2006). Gene selection and classification of microarray data using random forest. BMC Bioinformatics, 7(1), 3.
    Efron, B., & Tibshirani, R. J. (1994). An introduction to the bootstrap (Vol. 57). CRC Press.
    Freund, Y., & Schapire, R. E. (1997). A decision-theoretic generalization of on-line learning and an application to boosting. Journal of Computer and System Sciences, 55(1), 119-139.
    Friedman, M. (1980). Free to choose (1st ed.). New York: Harcourt Brace Jovanovich.
    Huang, C. F. (1997). Principle of information diffusion. Fuzzy Sets and Systems, 91(1), 69-90.
    Huang, C. F., & Moraga, C. (2004). A diffusion-neural-network for learning from small samples. International Journal of Approximate Reasoning, 35(2), 137-161.
    Huang, C. J., Wang, H. F., Chiu, H. J., Lan, T. H., Hu, T. M., & Loh, E. W. (2010). Prediction of the Period of Psychotic Episode in Individual Schizophrenics by Simulation-Data Construction Approach. Journal of Medical Systems, 34(5), 799-808.
    Kass, G. V. (1980). An exploratory technique for investigating large quantities of categorical data. Applied Statistics, 119-127.
    Kearns, M. (1988). Thoughts on hypothesis boosting. Unpublished manuscript, December.
    Li, D. C., Wu, C. S., & Chang, F. M. (2005). Using data-fuzzification technology in small data set learning to improve FMS scheduling accuracy. International Journal of Advanced Manufacturing Technology, 27(3-4), 321-328.
    Li, D. C., Wu, C. S., Tsai, T. I., & Chang, F. M. M. (2006). Using mega-fuzzification and data trend estimation in small data set learning for early FMS scheduling knowledge. Computers & Operations Research, 33(6), 1857-1869.
    Li, D. C., Wu, C. S., Tsai, T. I., & Lin, Y. S. (2007). Using mega-trend-diffusion and artificial samples in small data set learning for early flexible manufacturing system scheduling knowledge. Computers & Operations Research, 34(4), 966-982.
    Li, D. C., & Yeh, C. W. (2008). A non-parametric learning algorithm for small manufacturing data sets. Expert Systems with Applications, 34(1), 391-398.
    Li, D. C., Tsai, T. I., & Shi, S. (2009b). A prediction of the dielectric constant of multi-layer ceramic capacitors using the mega-trend-diffusion technique in powder pilot runs: case study. International Journal of Production Research, 47(1), 51-69.
    Li, D. C., Chen, C. C., Chen, W. C., & Chang, C. J. (2012c). Employing dependent virtual samples to obtain more manufacturing information in pilot runs. International Journal of Production Research, 50(23), 6886-6903.
    Li, D. C., Chang, C. C., & Liu, C. W. (2012). Using structure-based data transformation method to improve prediction accuracies for small data sets. Decision Support Systems, 52(3), 748-756.
    Li, D. C., Chen, C. C., Chang, C. J., & Chen, W. C. (2012). Employing Box-and-Whisker plots for learning more knowledge in TFT-LCD pilot runs. International Journal of Production Research, 50(6), 1539-1553.
    Li, D. C., & Liu, C. W. (2012). Extending Attribute Information for Small Data Set Classification. IEEE Transactions on Knowledge and Data Engineering, 24(3), 452-464.
    Lin, Y. S., & Li, D. C. (2010). The Generalized-Trend-Diffusion modeling algorithm for small data sets in the early stages of manufacturing systems. European Journal of Operational Research, 207(1), 121-130.
    Quinlan, J. R. (1986). Induction of Decision Trees. Machine Learning, 1(1), 81-106.
    Quinlan, J. R. (1993). C4.5: programs for machine learning (Vol. 1). Morgan Kaufmann.
    Schapire, R. E. (1999, January). Theoretical views of boosting and applications. In Algorithmic Learning Theory (pp. 13-25). Springer Berlin Heidelberg.
    Schwenk, H., & Bengio, Y. (2000). Boosting neural networks. Neural Computation, 12(8), 1869-1887.
    Tsai, T. I., & Li, D. C. (2008). Utilize bootstrap in small data set learning for pilot run modeling of manufacturing systems. Expert Systems with Applications, 35(3), 1293-1300.
    Zhen-Rong, L., & Chong-Fu, H. (1990). Information distribution method relevant in fuzzy information analysis. Fuzzy Sets and Systems, 36(1), 67-76.

    Full text: not available for download
    On campus: open access from 2016-06-27
    Off campus: not available
    The electronic thesis has not been authorized for public release; for the print copy, please consult the library catalog.