Graduate student: 江裕群
Chiang, Yu-Chun
Thesis title: 產生基於模糊規則之屬性改善小樣本學習
Generating fuzzy-rule based attributes to improve small dataset learning
Advisor: 利德江
Li, Der-Chiang
Degree: Master
Department: College of Management - Department of Industrial and Information Management
Year of publication: 2015
Academic year of graduation: 103 (ROC calendar)
Language: Chinese
Number of pages: 48
Keywords (Chinese): 小樣本學習, 延伸樣本屬性, 模糊規則
Keywords (English): small dataset learning, extended sample attributes, fuzzy rules
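  • Because of the impact of globalization over the past century, making decisions quickly has become a challenge: reliable evidence is hard to produce because the data available for modeling are too scarce, which is the common small-dataset learning problem. Small-dataset learning aims to extract more information from a small number of samples in order to improve learning accuracy. In recent years, most approaches have increased the number of records by generating virtual samples; however, the performance of a learning model is affected not only by the number of records but also by the dimensionality of the data. Some earlier studies generated new attributes solely from the membership values of the data in each individual attribute, but this can produce too many attributes and end up degrading learning performance. This study therefore proposes a procedure that integrates and considers multiple attributes together. Based on a fuzzy-rule framework, fuzzy linguistic terms are constructed for each attribute and each data point is treated as a fuzzy rule, so that fuzzy set operations can be applied to obtain the membership value of the rule's antecedent, which is then taken as a new attribute of the sample. Finally, the generated extended attributes are combined with the attributes of the original data to form a new dataset, which is fed to learning models, including a back-propagation neural network and support vector regression, to train prediction models. The experiments use two case datasets from a company, and a paired t-test is used to verify how much the proposed method improves small-dataset learning. The experimental results show that the proposed method reduces the prediction error of small-dataset learning more effectively.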
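
The attribute-extension idea in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's actual procedure: the abstract does not say which fuzzy operator is used or how the linguistic terms are parameterized, so the sketch assumes triangular membership functions (the table of contents mentions them) and uses `min` as the fuzzy AND. The function names and the example triangles are hypothetical.

```python
from typing import List, Sequence, Tuple

# A triangular fuzzy set given as (left foot, peak, right foot), with left < peak < right.
Triangle = Tuple[float, float, float]


def triangular_membership(x: float, tri: Triangle) -> float:
    """Membership of x in a triangular fuzzy set; 0 outside the feet, 1 at the peak."""
    a, b, c = tri
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


def antecedent_membership(sample: Sequence[float], rule: Sequence[Triangle]) -> float:
    """Degree to which a sample satisfies a rule antecedent.

    Assumption: the antecedent is an AND over one fuzzy set per attribute,
    and AND is modeled with the minimum operator.
    """
    return min(triangular_membership(x, tri) for x, tri in zip(sample, rule))


def extend_sample(sample: Sequence[float], rules: Sequence[Sequence[Triangle]]) -> List[float]:
    """Append one antecedent membership value per rule to the original attributes."""
    return list(sample) + [antecedent_membership(sample, rule) for rule in rules]


# Hypothetical rules over two attributes (values chosen only for illustration).
rule_low = [(0.0, 2.0, 4.0), (10.0, 20.0, 30.0)]
rule_high = [(2.0, 4.0, 6.0), (20.0, 30.0, 40.0)]

extended = extend_sample([1.0, 20.0], [rule_low, rule_high])
print(extended)  # original two attributes followed by two new membership attributes
```

Because each rule contributes a single value that summarizes all attributes at once, the number of added attributes grows with the number of rules rather than with the number of attribute-term pairs, which is the motivation the abstract gives for combining attributes.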

    In today's global market, companies need to make decisions quickly and accurately. However, it is difficult to collect sufficient samples in a short period of time, so extracting more information from a small dataset and building robust models is of great value to an enterprise. Because the performance of predictive models is not influenced by sample size alone, increasing data dimensionality can also improve learning. Based on this concept, this study proposes a method that creates new data attributes with fuzzy operations to address small dataset learning problems. Using the concept of fuzzy rules, the membership value of the antecedent of each rule can be extracted from the data points. These membership values are treated as new data features, which extends the data dimensionality. To test the effectiveness of the proposed method, we compare its performance not only with that of the raw data but also with that of a method used in past studies. Two commonly used models are built: a backpropagation neural network and support vector regression. A paired t-test is carried out to assess the improvement. The experimental results show that the proposed method lowers the prediction error.
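
The paired t-test mentioned in the abstract compares two methods on the same experimental runs by testing whether the mean of the per-run error differences is zero. The sketch below computes only the test statistic and degrees of freedom with the Python standard library. The error values are made-up placeholders for illustration, not results from the thesis, and the function name is hypothetical.

```python
import math
import statistics
from typing import Sequence, Tuple


def paired_t_statistic(errors_a: Sequence[float], errors_b: Sequence[float]) -> Tuple[float, int]:
    """Return (t, degrees of freedom) for paired observations.

    t = mean(d) / (stdev(d) / sqrt(n)), where d holds the per-run differences a - b.
    """
    if len(errors_a) != len(errors_b):
        raise ValueError("paired samples must have the same length")
    diffs = [a - b for a, b in zip(errors_a, errors_b)]
    n = len(diffs)
    if n < 2:
        raise ValueError("at least two pairs are required")
    sd = statistics.stdev(diffs)  # sample standard deviation (n - 1 in the denominator)
    if sd == 0:
        raise ValueError("all differences are identical; t is undefined")
    t = statistics.mean(diffs) / (sd / math.sqrt(n))
    return t, n - 1


# Placeholder prediction errors for the same four runs under two methods.
baseline_errors = [4.0, 5.0, 6.0, 7.0]
extended_errors = [3.0, 3.0, 5.0, 5.0]

t_value, dof = paired_t_statistic(baseline_errors, extended_errors)
print(t_value, dof)
```

A positive t here means the first method's errors are larger on average. Turning t into a p-value requires the Student t distribution with n - 1 degrees of freedom, for example through `scipy.stats.ttest_rel` if SciPy is available.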

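    Abstract (Chinese)
    Abstract
    Acknowledgements
    Table of Contents
    List of Tables
    List of Figures
    Chapter 1 Introduction
        1.1 Research Background
        1.2 Research Motivation
        1.3 Research Objectives
        1.4 Research Process and Structure
    Chapter 2 Literature Review
        2.1 Small Dataset Learning
            2.1.1 The Concept of Information Diffusion
            2.1.2 Range Estimation with Mega-Trend-Diffusion
            2.1.3 Other Small Dataset Learning Methods
        2.2 Preprocessing of Data Attributes
        2.3 Modeling Tools
            2.3.1 Back-Propagation Neural Network
            2.3.2 Support Vector Machine and Support Vector Regression
    Chapter 3 Methodology
        3.1 Dataset Explanation
        3.2 Determining Data Groups
            3.2.1 Fuzzy c-means
            3.2.2 Fuzzy Silhouette Coefficient
        3.3 Constructing Fuzzy Linguistic Terms
            3.3.1 Range Estimation Based on Box-and-Whisker Plots
            3.3.2 Constructing Triangular Membership Functions
        3.4 Extended Sample Attributes and Data Partitioning
            3.4.1 Extended Sample Attributes
            3.4.2 Data Partitioning
        3.5 Overall Procedure
    Chapter 4 Experimental Results
        4.1 Dataset Description
            4.1.1 Case 1: Cell Process Shift Problem
            4.1.2 Case 2: CF Process Spacer Height Problem
        4.2 Experimental Settings
            4.2.1 Experimental Procedure
            4.2.2 Prediction Error Measures and Hypothesis Testing
            4.2.3 Modeling Software and Model Parameters
        4.3 Experimental Results
            4.3.1 Results for Case 1
            4.3.2 Results for Case 2
        4.4 Summary
    Chapter 5 Conclusions and Future Research
        5.1 Conclusions
        5.2 Future Research
    References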

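    On-campus access: public from 2025-12-31
    Off-campus access: not public
    The electronic thesis has not yet been authorized for public release; for the print copy, please consult the library catalog.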