| Graduate student: | 陳建頲 Chen, Chien-Ting |
|---|---|
| Thesis title: | 整合以模糊分群擷取之事前知識的資訊擴散方法學習小樣本資料 Incorporating Prior Knowledge Abstracted by Fuzzy Clustering in Information Diffusion Methods for Learning Small Data Sets |
| Advisor: | 利德江 Li, De-Jiang |
| Degree: | 碩士 Master |
| Department: | 管理學院 - 工業與資訊管理學系碩士在職專班 Department of Industrial and Information Management (on the job class) |
| Year of publication: | 2013 |
| Academic year of graduation: | 101 |
| Language: | Chinese |
| Number of pages: | 75 |
| Chinese keywords: | 小樣本資料 (small-sample data), 虛擬樣本 (virtual sample), 模糊分群 (fuzzy clustering), 模糊側影係數 (fuzzy silhouette coefficient) |
| English keywords: | Small sample, virtual sample, fuzzy C-means, Silhouette Coefficient |
小樣本學習問題,常發生於系統建置初期資料取得困難或以及取得成本過高卻必須從事學習之情況,因此如何從中學習更多有意義的資訊,於近年已成為研究的課題。在過往學習方法中,虛擬樣本產生法已驗證為有效的方法之一,然其中如整體趨勢(mega-trend-diffusion, MTD)技術缺乏對資料進行事前分析,其於母體推論時雖考量其偏態但欠缺母體存在多峰之可能。因此本研究在MTD產生虛擬樣本前,先藉由模糊分群法Fuzzy C-means以及模糊側影係數學習小樣本分布狀況之事前資訊。在實驗階段,本研究以兩筆從電子製造業取得之真實案例進行驗證,結果顯示當訓練樣本加入本研究方法所產生之虛擬樣本後,確實較加入MTD所產生者在倒傳遞類神經網路上有更佳之預測準確度,亦表示此種基於事前資訊所建構的樣本分配,確能改善產生之虛擬樣本品質。
Small-dataset learning problems usually occur in the early stages of system construction, when samples are hard to obtain or collection costs are extremely high, yet learning tasks must still be carried out. Learning more meaningful information from small datasets has therefore become an important research topic in recent years. Generating virtual samples to enlarge the dataset has been shown to be an effective approach when learning from small datasets. However, the mega-trend-diffusion (MTD) technique performs no prior analysis of the data: although MTD takes skewness into account when inferring the population distribution, it does not consider the possibility that the population is multimodal. Accordingly, this research obtains prior information about the distribution of a small dataset by using fuzzy C-means clustering and the fuzzy silhouette coefficient before MTD generates virtual samples. In the experiments, two real cases taken from the electronics manufacturing industry are examined. The results show that when the training set is augmented with virtual samples created by the proposed method, the predictions of a back-propagation neural network (BPN) are more accurate than when it is augmented with samples created by MTD. This indicates that constructing the sample distribution from prior information does improve the quality of the generated virtual samples.
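The abstract does not spell out the prior-analysis step, so the following is only a minimal sketch of how fuzzy C-means and a fuzzy silhouette score could be combined to estimate how many modes a small sample has before any virtual samples are generated. It assumes one-dimensional data, a fuzzifier m = 2, and a silhouette weighting exponent alpha = 1; the silhouette variant used here weights each point's crisp silhouette by the gap between its two largest memberships. All function names are hypothetical and are not taken from the thesis, and the MTD generation step itself is omitted.

```python
def memberships(xs, centers, m=2.0):
    """Return U[j][i]: degree to which point j belongs to cluster i."""
    exp = 2.0 / (m - 1.0)
    U = []
    for x in xs:
        d = [abs(x - v) for v in centers]
        if min(d) == 0.0:  # point sits exactly on one or more centres
            hits = [1.0 if di == 0.0 else 0.0 for di in d]
            U.append([h / sum(hits) for h in hits])
        else:
            U.append([1.0 / sum((di / dk) ** exp for dk in d) for di in d])
    return U


def fuzzy_c_means(xs, c, m=2.0, iters=200, tol=1e-9):
    """1-D fuzzy C-means with a deterministic, quantile-based start (an assumption)."""
    s, n = sorted(xs), len(xs)
    centers = [s[int((i + 0.5) * n / c)] for i in range(c)]
    for _ in range(iters):
        U = memberships(xs, centers, m)
        # Each centre is the mean of the data weighted by membership**m.
        new = [
            sum(U[j][i] ** m * xs[j] for j in range(n))
            / sum(U[j][i] ** m for j in range(n))
            for i in range(c)
        ]
        shift = max(abs(a - b) for a, b in zip(new, centers))
        centers = new
        if shift < tol:
            break
    return centers, memberships(xs, centers, m)


def fuzzy_silhouette(xs, U, alpha=1.0):
    """Membership-weighted average of crisp silhouettes (requires >= 2 clusters)."""
    labels = [max(range(len(row)), key=row.__getitem__) for row in U]
    clusters = {k: [x for x, l in zip(xs, labels) if l == k] for k in set(labels)}
    if len(clusters) < 2:
        return 0.0
    num = den = 0.0
    for x, row, l in zip(xs, U, labels):
        own = clusters[l]
        if len(own) == 1:
            s = 0.0  # usual convention for singleton clusters
        else:
            a = sum(abs(x - y) for y in own) / (len(own) - 1)
            b = min(
                sum(abs(x - y) for y in pts) / len(pts)
                for k, pts in clusters.items()
                if k != l
            )
            s = (b - a) / max(a, b) if max(a, b) > 0 else 0.0
        top = sorted(row, reverse=True)
        w = (top[0] - top[1]) ** alpha  # confident points count more
        num += w * s
        den += w
    return num / den if den > 0 else 0.0


def choose_cluster_count(xs, candidates=(2, 3, 4)):
    """Score each candidate cluster count and return the best one with all scores."""
    scores = {}
    for c in candidates:
        _, U = fuzzy_c_means(xs, c)
        scores[c] = fuzzy_silhouette(xs, U)
    return max(scores, key=scores.get), scores
```

Calling `choose_cluster_count` on one attribute of a small sample returns the candidate count with the highest score together with the score of every candidate. In a pipeline like the one the abstract describes, each resulting cluster could then be handed separately to a diffusion method such as MTD, so that a multimodal attribute is not forced into a single-peaked estimate.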