| Author: | 陳怡文 Chen, Yi-Wen |
|---|---|
| Thesis title: | 使用相依合成樣本輔助小樣本學習 Employing Dependently Synthesized Samples to Facilitate Small Data Learning |
| Advisor: | 利德江 Li, Der-Chiang |
| Degree: | 碩士 Master |
| Department: | College of Management - Department of Industrial and Information Management (管理學院 - 工業與資訊管理學系) |
| Year of publication: | 2018 |
| Academic year of graduation: | 106 |
| Language: | Chinese |
| Number of pages: | 38 |
| Chinese keywords: | small samples (小樣本), virtual samples (虛擬樣本), nominal attributes (名目屬性), possibility distribution (可能性分配) |
| English keywords: | Small dataset learning, Virtual sample generation, Nominal attributes |
With rapid technological progress and globalization, controlling manufacturing systems effectively has become an issue that companies must confront. In the early stage of new product development, it is particularly important to identify the key parameters from a limited number of samples, so that the number of pilot runs, and with it the cost, can be reduced. Virtual sample generation (VSG) is the approach most commonly used in past research on small-sample learning. However, most existing VSG methods target samples with numerical attributes only, and few handle samples that mix nominal and numerical attributes.
This study proposes a VSG method that dependently synthesizes the nominal and numerical attributes of virtual samples. First, the numerical part of each virtual sample is generated dependently by considering the trend similarity among the numerical attributes of the small dataset. Second, fuzzy relation functions between each numerical attribute and each nominal category value are constructed in sequence, and all category combinations of the nominal attributes are enumerated together with their fuzzy relations to each numerical attribute. Finally, given the numerical attribute values, the possibility value of every category combination is calculated, and the combinations with higher possibility values are selected as the nominal part of the virtual samples.
Five public datasets are used to validate the proposed method, with two existing virtual sample generation methods as the control group: bootstrap aggregating (Bagging) and the synthetic minority over-sampling technique (SMOTE). Two learning algorithms, support vector regression (SVR) and the back-propagation neural network (BPN), are used to build the models. The experimental results show that, for small datasets with nominal input attributes, learning with the proposed method outperforms learning with Bagging or SMOTE.
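The nominal-attribute step described in the abstract can be illustrated with a short Python sketch. This is not the thesis's implementation: the abstract does not state the form of the fuzzy relation functions, how they are combined across attributes, or the selection criterion. The triangular membership function (built from each category's minimum, mean, and maximum), the `min` combination rule, and the fixed `threshold` are therefore assumptions, as are all function names and the toy data.

```python
from itertools import product


def triangular(x, lo, mode, hi):
    """Triangular membership of x with support [lo, hi] and peak at mode (assumed form)."""
    if x == mode:
        return 1.0
    if x <= lo or x >= hi:
        return 0.0
    if x < mode:
        return (x - lo) / (mode - lo)
    return (hi - x) / (hi - mode)


def fit_memberships(samples, numeric_keys, nominal_keys):
    """Map (nominal attribute, category, numeric attribute) to (min, mean, max)
    of the numeric values observed for that category."""
    groups = {}
    for s in samples:
        for a in nominal_keys:
            for n in numeric_keys:
                groups.setdefault((a, s[a], n), []).append(s[n])
    return {k: (min(v), sum(v) / len(v), max(v)) for k, v in groups.items()}


def combination_possibilities(params, numeric_values, nominal_keys):
    """Possibility of every category combination given numeric values.
    Assumption: a combination's possibility is the minimum membership
    over all its categories and all numeric attributes."""
    categories = {
        a: sorted({k[1] for k in params if k[0] == a}) for a in nominal_keys
    }
    result = {}
    for combo in product(*(categories[a] for a in nominal_keys)):
        degrees = [
            triangular(x, *params[(a, c, n)])
            for a, c in zip(nominal_keys, combo)
            for n, x in numeric_values.items()
        ]
        result[combo] = min(degrees)
    return result


def select_combinations(possibilities, threshold):
    """Keep combinations whose possibility reaches the (hypothetical) threshold."""
    return [c for c, p in possibilities.items() if p >= threshold]


# Toy small dataset: two numeric and two nominal attributes (illustrative only).
samples = [
    {"temp": 10.0, "speed": 1.0, "material": "A", "line": "X"},
    {"temp": 12.0, "speed": 2.0, "material": "A", "line": "Y"},
    {"temp": 20.0, "speed": 5.0, "material": "B", "line": "X"},
    {"temp": 22.0, "speed": 6.0, "material": "B", "line": "Y"},
]
numeric_keys = ["temp", "speed"]
nominal_keys = ["material", "line"]

params = fit_memberships(samples, numeric_keys, nominal_keys)
# Numeric part of one virtual sample, assumed to have been generated already.
virtual_numeric = {"temp": 11.0, "speed": 1.5}
poss = combination_possibilities(params, virtual_numeric, nominal_keys)
chosen = select_combinations(poss, threshold=0.1)
print(poss)
print(chosen)
```

The sketch enumerates every category combination, which is feasible only when the nominal attributes have few categories. With many attributes or categories the number of combinations grows multiplicatively and some pruning would be needed.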