Graduate student: | 凌偉珊 Ling, Wei-Shan |
---|---|
Thesis title: | 建立一個新的虛擬樣本產生技術學習小樣本資料 Constructing a new virtual sample generation technique for small dataset learning |
Advisor: | 利德江 Li, Der-Chiang |
Degree: | Master |
Department: | College of Management - Department of Industrial and Information Management (on the job class) |
Year of publication: | 2016 |
Graduation academic year: | 104 (ROC academic year) |
Language: | Chinese |
Number of pages: | 52 |
Keywords (Chinese): | 小樣本資料、虛擬樣本產生法、軟性DBSCAN、整體趨勢擴散法 |
Keywords (English): | small data, virtual sample generation, soft DBSCAN, mega-trend diffusion |
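With the rise of the internet generation, information spreads faster and in more varied forms, and big data has become the most discussed topic of recent years, with scholars approaching it from many angles. Alongside big data problems, small-sample problems caused by a lack of data also arise frequently in everyday settings, for example in the engineering phase of new product introduction, parameter setting for new machines and new processes, the spread of infectious diseases, the occurrence of catastrophic disasters, and climate change forecasting. These problems share common characteristics: the data are hard to obtain or too costly to acquire, which makes it difficult for experts to carry out further analysis and prediction. How to extract more meaningful information from hard-to-obtain data when data are scarce has therefore become a research topic in its own right in recent years.
Virtual sample generation has been shown to be an effective way to address the small-sample problem. The main technique is mega-trend diffusion (MTD), which assumes that the data follow a unimodal distribution and takes skewness into account. However, the true population distribution may be multimodal, and data distributions are not always simple. To address these issues, this study proposes a nonparametric multimodal virtual sample generation method. It first preprocesses the small dataset with soft DBSCAN clustering to extract as much useful preprocessing information as possible, then uses the MTD algorithm to estimate the range of each cluster, and generates virtual samples from these ranges for use in subsequent prediction.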
With the rise of the internet generation, big data has become one of the most discussed topics of recent years, and small-data problems have drawn attention as well. Further analysis and prediction are difficult because small data are hard to obtain and costly to collect. Virtual sample generation has proved to be an effective way to address the small-data problem. The main technique is mega-trend diffusion (MTD), which assumes that the data follow a unimodal distribution and accounts for skewness. This study proposes a nonparametric multimodal virtual sample generation method for multimodal populations. The small dataset is first preprocessed with the soft DBSCAN clustering method to capture as much useful information as possible. The MTD algorithm then estimates the data range of each cluster, and virtual samples are generated from these ranges for prediction.
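
The abstract describes the pipeline only at a high level, so the following is a minimal sketch of the MTD step under stated assumptions, not the thesis's actual procedure. It uses the range-estimation formula as it is commonly given in the MTD literature (midrange center, skewness-weighted diffusion with a 1e-20 constant, triangular membership function), which may differ from the variant used here. The soft DBSCAN step is not implemented: the cluster labels are assumed to be supplied by the caller, and the function names `mtd_bounds`, `membership`, and `generate_virtual_samples` are hypothetical.

```python
import math
import random
import statistics


def mtd_bounds(values):
    """Estimate (lower, center, upper) for one cluster of 1-D values."""
    lo, hi = min(values), max(values)
    center = (lo + hi) / 2.0
    n_low = sum(1 for v in values if v < center)
    n_up = sum(1 for v in values if v > center)
    if len(values) < 2 or n_low == 0 or n_up == 0:
        # Assumption: a degenerate cluster keeps its observed range.
        return lo, center, hi
    var = statistics.variance(values)  # sample variance
    skew_low = n_low / (n_low + n_up)
    skew_up = n_up / (n_low + n_up)
    spread = -2.0 * math.log(1e-20)
    lower = center - skew_low * math.sqrt(spread * var / n_low)
    upper = center + skew_up * math.sqrt(spread * var / n_up)
    # The estimated range must at least cover the observed data.
    return min(lower, lo), center, max(upper, hi)


def membership(x, lower, center, upper):
    """Triangular membership value of x, peaking at the center."""
    if x <= lower or x >= upper:
        return 0.0
    if x <= center:
        return (x - lower) / (center - lower)
    return (upper - x) / (upper - center)


def generate_virtual_samples(values, labels, per_cluster, seed=0):
    """Draw `per_cluster` virtual values for each cluster label.

    `labels` stands in for the output of a clustering step
    (soft DBSCAN in the thesis); it is an input here, not computed.
    """
    rng = random.Random(seed)
    clusters = {}
    for value, label in zip(values, labels):
        clusters.setdefault(label, []).append(value)

    virtual = {}
    for label, members in clusters.items():
        lower, center, upper = mtd_bounds(members)
        samples = []
        while len(samples) < per_cluster:
            if lower == upper:
                # Degenerate range: only one value is possible.
                samples.append(lower)
                continue
            x = rng.uniform(lower, upper)
            # Accept x with probability equal to its membership value.
            if membership(x, lower, center, upper) > rng.random():
                samples.append(x)
        virtual[label] = samples
    return virtual
```

For example, `generate_virtual_samples([1.0, 1.2, 1.1, 0.9, 5.0, 5.3, 4.8, 5.1], [0, 0, 0, 0, 1, 1, 1, 1], per_cluster=5)` returns five virtual values for each of the two modes. Estimating a separate range per cluster is what lets the generated values follow a bimodal shape rather than filling the gap between the modes.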