Graduate student: | 吳則澍 Wu, Tse-Shu |
---|---|
Thesis title: | 基於名目屬性之虛擬樣本產生法 Virtual Sample Generation Based on Nominal Attributes |
Advisor: | 利德江 Li, Der-Chiang |
Degree: | Master |
Department: | College of Management - Department of Industrial and Information Management |
Year of publication: | 2016 |
Graduation academic year: | 104 |
Language: | Chinese |
Number of pages: | 45 |
Keywords (Chinese): | 小樣本學習、虛擬樣本產生、名目屬性 |
Keywords (English): | small dataset learning, virtual sample generation, nominal attributes |
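As global competition intensifies, the life cycles of technology products have generally shortened. Reducing the time and cost of the pilot-run stage is one way to increase a company's competitiveness, but it also gives rise to the small dataset learning problem. Small dataset learning has a critical influence on the early stages of a manufacturing system, yet conventional statistical methods cannot directly provide effective analysis and interpretation when the number of samples is too small. Virtual sample generation methods emerged to address this problem and have been shown to overcome it effectively; they can be found both in the machine learning field and in industrial practice. This thesis likewise builds on the concept of virtual sample generation and proposes a new virtual sample generation method for nominal attributes, which observes the number of occurrences of each nominal attribute value and uses a fuzzy membership function to estimate the population domain. Earlier virtual sample generation methods must assume that attributes are mutually independent and can handle only numerical attributes; the proposed method is free of these restrictions, which underscores its generality. The study evaluates mean absolute error and classification accuracy on purely nominal data sets and on mixed-attribute data sets. The experimental results show that the method effectively reduces the error in numerical prediction problems and raises the accuracy in classification problems, with statistically significant differences, indicating that the proposed method does perform better in small dataset learning.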
As global competition grows more intense, product life cycles are becoming shorter. Reducing the time and cost of pilot runs can effectively enhance the competitiveness of enterprises, but it also gives rise to small dataset learning problems. No appropriate statistical tool exists for evaluating the population when the sample size is too small, but the problem can be addressed through virtual sample generation methods, which are widely used in industry and in machine learning. Very few studies deal with nominal attributes because of the limitations of domain estimation methods. This thesis therefore proposes a method that generates virtual samples based on the degree of discreteness of nominal attributes and then estimates the population domain with a fuzzy membership function. Two learning models, a backpropagation neural network and support vector regression, are used to test the effectiveness of the proposed method, and the Wilcoxon signed-rank test is used to test the difference from the raw dataset. The results show that the proposed method reduces the mean absolute error (MAE) and enhances classification accuracy by generating nominal virtual samples.
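The abstract does not give the method's formulas, so the following is only a minimal Python sketch of the general idea it describes: count how often each nominal value occurs, turn the counts into membership degrees, and draw virtual values according to those degrees. The membership rule used here (count divided by the largest count), the function names, and the sample data are all illustrative assumptions, not the thesis's actual procedure. A small helper for the mean absolute error (MAE) mentioned in the abstract is included as well.

```python
import random
from collections import Counter


def membership_from_counts(values):
    """Map each observed nominal value to a membership degree in (0, 1].

    Assumption: membership = count / largest count, so the most frequent
    value gets 1.0. The thesis's real membership function is not shown here.
    """
    counts = Counter(values)
    peak = max(counts.values())
    return {value: count / peak for value, count in counts.items()}


def generate_virtual_values(values, n, seed=0):
    """Draw n virtual nominal values, weighted by membership degree."""
    membership = membership_from_counts(values)
    categories = sorted(membership)
    weights = [membership[c] for c in categories]
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    return rng.choices(categories, weights=weights, k=n)


def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted values."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical small sample of one nominal attribute.
observed = ["red", "red", "blue", "green", "red", "blue"]
membership = membership_from_counts(observed)
virtual = generate_virtual_values(observed, n=10)
```

In this sketch a value that was never observed can never be generated; a method that estimates the population domain, as the abstract describes, would need an additional rule for that case.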