
Graduate Student: 郭皇志 (Guo, Huang-Jhih)
Thesis Title: Classification of the Imbalanced Data Sets Collected from Wafer Sawing (晶圓切割之不平衡資料集的分類)
Advisor: 翁慈宗 (Wong, Tzu-Tsung)
Degree: Master
Department: Department of Industrial and Information Management (in-service master's program), College of Management
Year of Publication: 2018
Academic Year of Graduation: 106
Language: Chinese
Number of Pages: 50
Keywords (Chinese): 破刀, 資料分類法, F度量值, 不平衡類別資料
Keywords (English): broken blade, classification, F-measure, imbalanced data set
    Dies are the key output of the wafer sawing station. The purpose of this study is to use the optical inspection results from the downstream process to build a prediction model for the upstream sawing station and to identify the major factors behind the "broken blade" defect mode that arises during wafer sawing. An examination of two months of process data shows a population of 52,651 wafer samples, of which only 342 are broken-blade cases; the majority class is roughly 154 times the size of the minority class, making this a typical imbalanced data set. This study builds and compares classification models with data mining methods: three algorithms, the decision tree, k-nearest neighbors, and the RIPPER rule learner, are applied to the imbalanced data set, and the resulting recall, precision, and F-measure are used to evaluate each algorithm's ability to classify the minority class correctly.
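
    As a rough illustration of the evaluation measures mentioned above (this sketch is not part of the thesis, and the confusion-matrix counts in it are hypothetical), recall, precision, and the F-measure for the minority broken-blade class can be computed as follows:

        # Illustrative sketch only, not code from the thesis: the evaluation
        # measures named in the abstract, computed from confusion-matrix counts
        # for the minority "broken blade" class. tp, fp, fn are hypothetical.
        def precision(tp, fp):
            return tp / (tp + fp)

        def recall(tp, fn):
            return tp / (tp + fn)

        def f_measure(tp, fp, fn):
            p, r = precision(tp, fp), recall(tp, fn)
            return 2 * p * r / (p + r)

        # Hypothetical counts: 30 predicted broken-blade cases, 22 of them real,
        # and 12 real broken-blade cases missed.
        print(precision(22, 8), recall(22, 12), f_measure(22, 8, 12))

    The F-measure is the harmonic mean of precision and recall, so it stays low unless both are reasonably high, which is why it is a common summary measure for minority-class performance.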

    The decision tree produces classification rules that are easy to organize and highly readable, but on an imbalanced data set it is easily biased toward the majority class. The SMOTE sampling method is therefore applied to bring the majority and minority class proportions closer to balance; although sampling alters the data set, the resulting classification rules remain a useful reference for the process. The RIPPER algorithm derives its classification rules from the original data structure and, when facing an imbalanced data set, learns the minority class first; its rules are likewise easy to organize and highly readable. In this study RIPPER achieves a precision of 0.7174 on the minority class, meaning that 71.74% of its "broken blade" predictions are correct. The k-nearest neighbors method classifies according to distances in the feature space of the original data and thus better reflects the true characteristics of the data; its F-measure serves as a baseline for comparing the algorithms and for assessing how well this imbalanced data set can be classified. A sketch of this workflow is given after this paragraph.
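
    The paragraph above describes the modelling workflow only in outline, and the thesis does not name the software it used. The following is a minimal sketch of that workflow, assuming the scikit-learn and imbalanced-learn Python libraries and a synthetic placeholder data set rather than the thesis data. RIPPER is omitted because scikit-learn does not provide it (implementations such as JRip in Weka do).

        # Minimal sketch (assumed libraries: scikit-learn, imbalanced-learn):
        # grow a decision tree on SMOTE-balanced training data and fit k-nearest
        # neighbors on the original imbalanced data, then compare minority-class
        # precision, recall, and F-measure on a held-out test set.
        from imblearn.over_sampling import SMOTE
        from sklearn.datasets import make_classification
        from sklearn.metrics import f1_score, precision_score, recall_score
        from sklearn.model_selection import train_test_split
        from sklearn.neighbors import KNeighborsClassifier
        from sklearn.tree import DecisionTreeClassifier

        # Synthetic placeholder data with ~1% minority class (not the wafer data).
        X, y = make_classification(n_samples=5000, weights=[0.99], random_state=0)
        X_train, X_test, y_train, y_test = train_test_split(
            X, y, test_size=0.2, stratify=y, random_state=0)

        # Decision tree on SMOTE-balanced training data.
        X_bal, y_bal = SMOTE(random_state=0).fit_resample(X_train, y_train)
        tree = DecisionTreeClassifier(random_state=0).fit(X_bal, y_bal)

        # k-nearest neighbors on the original (imbalanced) training data.
        knn = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)

        for name, model in [("decision tree + SMOTE", tree), ("kNN", knn)]:
            pred = model.predict(X_test)
            print(name,
                  "precision", round(precision_score(y_test, pred), 4),
                  "recall", round(recall_score(y_test, pred), 4),
                  "F-measure", round(f1_score(y_test, pred), 4))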

    From the classification rules produced by the RIPPER and decision tree models, the important features for identifying the minority "broken blade" class can be grouped into three categories: features related to machine components and signals, features related to the blade, and features related to the wafer material. These findings can be applied to real-time monitoring of process parameters to correctly predict when blade breakage will occur, improving die quality in the process and reducing wasted cost.

    The cutting of a wafer into dies is an important step for the yield of a semiconductor company. If the blade used for cutting is broken, the dies produced in the sawing stage can have relatively low yields. The purpose of this study is to identify the major factors for a broken blade. The data collected from a manufacturing process over two months contain 52,651 instances, and only 342 of them are broken-blade cases. The data set is highly imbalanced because the non-broken-blade class is approximately 154 times the size of the broken-blade class. The classification algorithms RIPPER and k-nearest neighbors are therefore chosen to build models for class prediction, and the F-measure is used for performance evaluation. The sampling method SMOTE is applied to the imbalanced data set to obtain a balanced one for inducing a decision tree. This decision tree can provide critical attributes for predicting the occurrence of a broken blade. The experimental results show that the RIPPER algorithm achieves a higher F-measure than k-nearest neighbors. The critical attributes identified by the RIPPER algorithm and decision tree induction for a broken blade can be divided into three categories: machine related, blade related, and wafer related. These critical attributes can help engineers correctly identify the occurrence of a broken blade so that the quality of dies can be improved and manufacturing cost reduced.
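
    The abstract compares classifiers by their F-measure, and the table of contents lists cross-validation as the evaluation procedure. A minimal sketch of such a comparison, again assuming scikit-learn and a synthetic placeholder data set rather than the thesis data, is shown below:

        # Minimal sketch (assumed library: scikit-learn): stratified 10-fold
        # cross-validation with the F-measure as the scoring function, one way
        # to compare classifiers on an imbalanced data set.
        from sklearn.datasets import make_classification
        from sklearn.model_selection import StratifiedKFold, cross_val_score
        from sklearn.neighbors import KNeighborsClassifier
        from sklearn.tree import DecisionTreeClassifier

        # Synthetic placeholder data with ~1% minority class (not the wafer data).
        X, y = make_classification(n_samples=5000, weights=[0.99], random_state=1)
        cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=1)

        models = [("kNN", KNeighborsClassifier(n_neighbors=5)),
                  ("decision tree", DecisionTreeClassifier(random_state=1))]
        for name, model in models:
            scores = cross_val_score(model, X, y, cv=cv, scoring="f1")
            print(name, "mean F-measure:", round(scores.mean(), 4))

    Stratified folds keep the minority-class proportion roughly constant across folds, which matters when the minority class is as rare as it is here.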

    Abstract (Chinese) i
    Abstract (English) ii
    Table of Contents iv
    List of Tables vi
    List of Figures vii
    Chapter 1  Introduction 1
      1-1 Research Background 1
      1-2 Research Motivation 2
      1-3 Research Objectives 2
      1-4 Research Process and Framework 3
    Chapter 2  Literature Review 5
      2-1 Effects of Semiconductor Wafer Sawing Parameters on Die Quality 5
      2-2 Class-Imbalanced Data 7
        2-2-1 Classification Problems with Imbalanced Data 7
        2-2-2 Classification Techniques for Imbalanced Data 8
        2-2-3 Applications of Imbalanced Data Classification 10
      2-3 Classification Methods 10
        2-3-1 Decision Tree 10
        2-3-2 RIPPER Algorithm 12
        2-3-3 K-Nearest Neighbors 13
      2-4 Cross-Validation 15
      2-5 Classification Performance Evaluation 15
      2-6 Summary 17
    Chapter 3  Research Methodology 18
      3-1 Research Framework 18
      3-2 Data Preprocessing 21
      3-3 Feature Selection Method 21
      3-4 SMOTE Sampling Method 22
      3-5 Decision Tree 23
      3-6 RIPPER Algorithm 25
      3-7 K-Nearest Neighbors 26
      3-8 Classification Model Evaluation 27
    Chapter 4  Validation and Analysis of the Classification Models 28
      4-1 Data Collection and Organization 28
      4-2 Data Preprocessing 29
      4-3 Feature Selection Method 30
      4-4 Decision Tree Classification Model 30
      4-5 RIPPER Classification Rule Model 33
      4-6 K-Nearest Neighbors Classification Model 35
      4-7 Summary 36
    Chapter 5  Conclusions and Future Work 40
      5-1 Conclusions 40
      5-2 Research Suggestions 42
    References 43
    Appendix 46


    Full text available: on campus 2023-07-01; off campus 2023-07-01