
Graduate student: Shih, Yu-Ming (施又銘)
Thesis title: Investigate the Ensemble Classification Tree Models for Feature Selection on High Dimension Dataset - Case of an E-Beam Inspection Equipment (探討集成式分類樹模型於高維度資料特徵之選取 -以電子束檢測設備為例)
Advisor: Lyu, Jr-Jung (呂執中)
Degree: Master
Department: College of Management - Department of Industrial and Information Management (on the job class)
Year of publication: 2020
Academic year of graduation: 108 (ROC calendar)
Language: Chinese
Number of pages: 53
Chinese keywords: electron beam inspection equipment (電子束檢測設備), key feature selection (重要特徵選取), random forest (隨機森林), gradient boosting machine (梯度提升機)
English keywords: Semiconductor electron beam inspection equipment, feature selection, random forest, gradient boost machine
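    As semiconductor chips advance toward higher performance and smaller dimensions, and especially for process nodes below the 15 nm generation, optical inspection has reached its detection limit and is being replaced by semiconductor electron beam inspection. The inspection equipment plays an important role in the supply chain, so how to provide customers with higher-quality machines has become a pressing research question. However, an electron beam inspection machine has many functional modules and complex feature dimensions; monitoring all of them as critical quality characteristics would consume substantial resources. This study applies data mining and machine learning to identify the important features that carry reference value within the high-dimensional data sets of the semiconductor electron beam inspection industry.
    This study takes a manufacturer of semiconductor electron beam inspection equipment as its subject and examines the three main functional modules of the equipment it produces. Product defect records and post-assembly test data are first collected and organized, and statistical methods are then applied for data preprocessing. Two ensemble decision tree classification methods from machine learning, random forest and gradient boosting machine, are run on 209 samples with 188 features. The models are evaluated with a confusion matrix, and feature selection is carried out on the gradient boosting machine classification model, whose accuracy exceeds 90%. The result is 43 key features across the three modules (22.8% of all features), which can effectively monitor process quality in real time and detect quality risks early. The data show that 38% of abnormal modules could indeed be detected in advance, effectively reducing the time spent analyzing anomalies in finished machines at the back end.
    The results show that, for high-dimensional, low-volume, high-variety data, a data mining framework combined with machine learning models can achieve excellent classification performance and identify features with high contribution and interpretability, substantially reducing the data dimension and helping engineers define the important features in the module assembly process. With limited resources and manual monitoring of process variation, the research framework can effectively reduce the quality problems that arise after functional modules are installed in the machine.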

    As semiconductor chips move toward higher performance and smaller size, optical inspection has reached its limit, especially for processes below the 15 nm generation. Electron beam inspection is so far the only solution, and it plays an important role in wafer yield. How to provide customers with higher-quality electron beam inspection machines has become an urgent issue for the semiconductor industry. However, the inspection machines contain many functional modules and complex feature dimensions. The main purpose of this study is to identify the key quality characteristics in a manufacturing data set that is high-dimensional, small in sample size, and high in variety.

    This study applies a data mining architecture and compares two ensemble machine learning methods for identifying key characteristics. First, product defects and test data are collected from historical records. Second, statistical methods are applied for data pre-processing, and random forests and gradient boosting machines are used to build the classification model. Finally, features with high contribution are selected. The study finds 43 key features across the three modules, accounting for only 22.8% of the total features, which can effectively monitor process quality in real time and detect quality risk in advance. The results also show that 38% of abnormal modules could be detected in advance, avoiding further anomaly analysis of the finished machine.

    This work shows that the data mining architecture and machine learning methods can support engineers in defining key quality characteristics for process control, and can establish a framework for products with high-dimensional features that reduces system quality risk under limited resources. Because the findings are based on a case study, further extensive research is needed to demonstrate their generality.
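
    The abstract evaluates the classifiers with a confusion matrix and then keeps only the highest-contribution features. A minimal Python sketch of those two steps follows, under loud assumptions: the labels, the feature names (`f1` to `f5`), the importance scores, and the helper functions `confusion_matrix`, `accuracy`, and `select_top_features` are all hypothetical and are not taken from the thesis, whose own data, model settings, and selection criterion are not given in this record.

```python
from collections import Counter


def confusion_matrix(y_true, y_pred, labels):
    """Count (actual, predicted) pairs; rows are actual classes, columns are predicted."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(actual, predicted)] for predicted in labels] for actual in labels]


def accuracy(matrix):
    """Share of all samples that lie on the diagonal of the confusion matrix."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total if total else 0.0


def select_top_features(importances, keep):
    """Return the names of the `keep` features with the largest importance scores."""
    ranked = sorted(importances.items(), key=lambda item: item[1], reverse=True)
    return [name for name, _ in ranked[:keep]]


# Hypothetical module outcomes and predictions (assumption, not thesis data).
y_true = ["ok", "ok", "abnormal", "abnormal", "ok", "abnormal", "ok", "ok"]
y_pred = ["ok", "ok", "abnormal", "ok", "ok", "abnormal", "ok", "abnormal"]
matrix = confusion_matrix(y_true, y_pred, labels=["ok", "abnormal"])
acc = accuracy(matrix)

# Hypothetical importance scores, such as a fitted tree ensemble might report.
importances = {"f1": 0.05, "f2": 0.40, "f3": 0.10, "f4": 0.30, "f5": 0.15}
selected = select_top_features(importances, keep=2)

print(matrix, acc, selected)
```

    With these toy inputs, six of the eight predictions fall on the diagonal. In the study itself the importance scores would come from the fitted gradient boosting machine, and the cut would keep 43 of 188 features rather than 2 of 5.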

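    Abstract (Chinese)
    Abstract (English)
    Acknowledgements
    Contents
    List of Tables
    List of Figures
    Chapter 1 Introduction
      1.1 Research Background
      1.2 Research Motivation
      1.3 Research Objectives
      1.4 Research Scope and Limitations
      1.5 Research Process and Structure
    Chapter 2 Literature Review
      2.1 Semiconductor Electron Beam Inspection Equipment
        2.1.1 Principles of Electron Beam Inspection
        2.1.2 Signal-to-Noise Ratio in Electron Beam Inspection
        2.1.3 Multi-Gun Electron Beam Inspection
      2.2 Data Mining
        2.2.1 Data Mining Process
        2.2.2 Data Cleaning
      2.3 Machine Learning Classifiers
        2.3.1 Random Forest
        2.3.2 Gradient Boost Machines
      2.4 Feature Selection
        2.4.1 Filter Model
        2.4.2 Wrapper Model
        2.4.3 Embedded Method
        2.4.4 Feature Correlation Analysis
      2.5 Summary of the Literature
    Chapter 3 Research Method
      3.1 Research Framework
      3.2 Process Problem Definition
      3.3 Data Preprocessing
      3.4 Model Construction
        3.4.1 Building the Ensemble Tree Models
        3.4.2 Model Evaluation and Interpretation of Results
        3.4.3 Feature Analysis and Selection
      3.5 Summary
    Chapter 4 Results
      4.1 Case Company Introduction and Problem Definition
      4.2 Data Collection and Preprocessing
      4.3 Model Construction and Performance Evaluation
        4.3.1 Model Parameter Settings
        4.3.2 Model Performance Evaluation
      4.4 Feature Evaluation and Selection
      4.5 Managerial Implications
    Chapter 5 Conclusions and Future Outlook
      5.1 Conclusions
      5.2 Future Research Directions
    References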


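    Full-text availability: on campus, open from 2025-09-04; off campus, not open.
    The electronic thesis has not yet been authorized for public release; for the print copy, please consult the library catalog.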