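Graduate student: Wu, Chin-An (吳晉安)
Thesis title: A weight-based data transformation function by using complete-linkage clustering to improve prediction ability in small data set (利用完全連結分群法建構權重式屬性延伸函數提升小樣本預測能力)
Advisor: Li, Der-Chang (利德江)
Degree: Master
Department: College of Management - Department of Industrial and Information Management
Year of publication: 2013
Academic year of graduation: 101 (ROC calendar)
Language: Chinese
Number of pages: 45
Keywords (translated from Chinese): small data set, complete-linkage clustering, mega-trend diffusion model, attribute extension
Keywords (English): Small data set, Complete-linkage, Mega-trend diffusion model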
  • In practice, samples are often hard to obtain, costly to collect, or generated by events that rarely occur, so predictions must be made from very little data. Extracting more information from a small data set to improve its predictive ability is therefore an important research issue. Because every observation in a small data set is hard to come by, even an outlier may carry important meaning. This study therefore deletes no observations and instead assigns each one a weight that redefines its importance.

    The proposed method has two parts. First, complete-linkage hierarchical clustering partitions the original data, using the internal structure of the data to identify important features. Second, the resulting clusters are used to build a weight-based mega-trend diffusion function, and the membership value of each original observation under that function becomes a new attribute. The original attributes and the new attributes are then combined into a new data set. Three widely used prediction models (linear regression, back-propagation neural network, and support vector regression) are used to compare the predictive ability of the original data set with that of the new data set. Two real cases are used for the experiments. The results show that the proposed method reduces both the total error and the mean squared error relative to the raw data, which indicates that it improves prediction on small data sets.
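    The two steps described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation. It assumes a single numeric attribute, the commonly cited form of the mega-trend diffusion bounds with a triangular membership function, and a placeholder weight equal to each cluster's share of the data, because the abstract does not state how the weights are defined. All function names (`complete_linkage`, `mtd_bounds`, `membership`, `extend_attributes`) and the sample values are hypothetical.

```python
import math
from itertools import combinations
from statistics import variance


def complete_linkage(points, k):
    """Merge the two closest clusters until k remain; the distance between
    two clusters is their LARGEST pairwise distance (complete linkage)."""
    clusters = [[i] for i in range(len(points))]

    def linkage(c1, c2):
        return max(math.dist(points[a], points[b]) for a in c1 for b in c2)

    while len(clusters) > k:
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda p: linkage(clusters[p[0]], clusters[p[1]]))
        clusters[i] += clusters[j]   # i < j, so deleting j keeps i valid
        del clusters[j]
    return clusters


def mtd_bounds(values):
    """Return (lower, center, upper) of an MTD-style triangular fuzzy set."""
    lo, hi = min(values), max(values)
    center = (lo + hi) / 2
    n_l = sum(v < center for v in values)
    n_u = sum(v > center for v in values)
    if n_l == 0 or n_u == 0:         # degenerate cluster (e.g. one value)
        return lo, center, hi
    var = variance(values)           # sample variance of the cluster
    skew_l = n_l / (n_l + n_u)
    skew_u = n_u / (n_l + n_u)
    spread = -2 * var * math.log(1e-20)
    lower = center - skew_l * math.sqrt(spread / n_l)
    upper = center + skew_u * math.sqrt(spread / n_u)
    # Never let the diffused range be narrower than the observed range.
    return min(lower, lo), center, max(upper, hi)


def membership(x, lower, center, upper):
    """Triangular membership value of x, peaking at 1.0 on the center."""
    if x == center:
        return 1.0
    if x <= lower or x >= upper:
        return 0.0
    if x < center:
        return (x - lower) / (center - lower)
    return (upper - x) / (upper - center)


def extend_attributes(xs, k):
    """Append one weighted membership value per cluster to each observation."""
    clusters = complete_linkage([(x,) for x in xs], k)
    rows = [[x] for x in xs]
    for members in clusters:
        bounds = mtd_bounds([xs[i] for i in members])
        weight = len(members) / len(xs)   # ASSUMPTION: placeholder weight
        for row in rows:
            row.append(weight * membership(row[0], *bounds))
    return clusters, rows


xs = [1.0, 1.2, 1.4, 5.0, 5.2, 5.4]       # made-up sample values
clusters, rows = extend_attributes(xs, k=2)
for row in rows:
    print([round(v, 3) for v in row])
```

    Each printed row holds the original value followed by one new attribute per cluster. No observation is removed: a value far from a cluster's center simply receives a low membership in that cluster. The extended rows could then be passed to any regression model in place of the raw attribute.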

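    Chinese Abstract
    Abstract
    Table of Contents
    List of Figures
    List of Tables
    1. Introduction
        1.1 Research Background and Motivation
        1.2 Research Objectives
        1.3 Research Framework and Procedure
    2. Literature Review
        2.1 Small Data Set Literature
        2.2 Cluster Analysis
            2.2.1 Agglomerative Hierarchical Clustering
        2.3 Prediction Models
            2.3.1 Linear Regression
            2.3.2 Back-Propagation Neural Network
            2.3.3 Support Vector Regression
    3. Experimental Method
        3.1 Complete-Linkage Clustering Method
        3.2 Mega-Trend Diffusion Technique
        3.3 Experimental Procedure
        3.4 Experimental Example
    4. Experimental Validation
        4.1 Case 1: Thin-Film Transistor Liquid Crystal Display Panels (TFT-LCD)
            4.1.1 Data Description
            4.1.2 Software Selection
            4.1.3 Parameter Settings
            4.1.4 TFT-LCD Experimental Results
        4.2 Case 2: Multilayer Ceramic Capacitors (MLCC)
            4.2.1 Data Description
            4.2.2 MLCC Experimental Results
    5. Conclusions and Suggestions
    References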


    Full text release: on campus 2018-06-11; off campus 2018-06-10