
Graduate Student: 劉家瑄
Liu, Jia-Xuan
Thesis Title: 研發適用於高維度巨量資料之快速良率改善機制
Development of Fast Yield Improvement Scheme for High-Dimension Big Data
Advisor: 陳朝鈞
Chen, Chao-Chun
Degree: Master
Department: College of Electrical Engineering and Computer Science - Institute of Manufacturing Information and Systems
Year of Publication: 2017
Academic Year of Graduation: 105 (ROC calendar, 2016-2017)
Language: Chinese
Number of Pages: 50
Chinese Keywords: 高維度巨量資料, 製造分析, 良率管理, 巨量資料快速關鍵參數搜尋
English Keywords: High-Dimensional Big Data, Manufacturing Analysis, Yield Analysis, Fast Key-variable Search for Big Data
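  • In the current Industry 4.0 era, advanced manufacturing plants often have very large and complex production lines. Data accumulate quickly and are high-dimensional, which creates problems for storing, processing, and analyzing big data. Yield analysis, a critical part of manufacturing analysis, requires finding the key factors that affect yield among a large volume of process data. Faced with big data, traditional high-dimensional variable search algorithms need a great deal of time to complete the search, which lowers the efficiency of yield analysis. This thesis therefore builds on the concept of the triple phase greedy algorithm (TPOGA) and proposes the Fast Key-Variable Search Algorithm for Big Data Scheme (FKSABD Scheme) to solve the key-variable search problem for big data. The FKSABD Scheme performs a two-phase analysis: it uses a data digesting technique to convert most of the data into feature digests, then applies the TPOGA analysis concept to search for variables, with Apache Spark as the computing framework. On about 380 GB of data, the original TPOGA method took about 62 hours of computation, whereas the FKSABD Scheme finished in about 8 hours, making it roughly 8 times faster. The FKSABD Scheme can therefore effectively improve computing performance in yield problem analysis.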

    In the Industry 4.0 era, the production lines of advanced manufacturing factories are large and highly complex. Because production data accumulate rapidly and are high-dimensional, manufacturing analysis faces big data storage and processing problems. Yield analysis, one of the most important manufacturing analysis problems, must identify the key factors that cause yield loss from diverse production data. In a big data setting, traditional high-dimensional key-variable search algorithms take a long time to complete the search, which makes yield improvement inefficient. Building on the triple phase orthogonal greedy algorithm (TPOGA), this thesis proposes a fast key-variable search algorithm for big data, called the FKSABD scheme, to solve the key-variable search problem for big production data. The FKSABD scheme is a two-phase search scheme. In phase 1, it uses a data digesting technique to extract features from a large portion of the production data. In phase 2, the feature data and the remaining production data are used together to search for key variables. The scheme is implemented on Apache Spark to process big production data. Case-study results show that the search results of the FKSABD scheme are close to those of TPOGA. On 380 GB of data, TPOGA takes 62 hours, whereas the FKSABD scheme takes only 8 hours, about 8 times faster (62/8 ≈ 7.75). The FKSABD scheme is therefore a promising way to substantially speed up yield analysis on big production data.
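The two-phase idea in the abstract (digest most of the data first, then search on the digests) can be illustrated with a minimal sketch. This is not the thesis's implementation: the `Digest` structure, the function names `digest_chunk` and `rank_variables`, and the sample data are all hypothetical. A plain correlation ranking stands in for TPOGA, and ordinary Python lists stand in for Apache Spark partitions. The sketch only shows why mergeable per-chunk summaries let phase 2 avoid rereading the raw rows.

```python
import math
from dataclasses import dataclass


@dataclass
class Digest:
    """Mergeable summary of one variable x against yield y (hypothetical)."""
    n: int = 0
    sx: float = 0.0   # sum of x
    sy: float = 0.0   # sum of y
    sxx: float = 0.0  # sum of x*x
    syy: float = 0.0  # sum of y*y
    sxy: float = 0.0  # sum of x*y

    def add(self, x, y):
        self.n += 1
        self.sx += x
        self.sy += y
        self.sxx += x * x
        self.syy += y * y
        self.sxy += x * y

    def merge(self, other):
        # Sums are additive, so two digests combine without the raw rows.
        return Digest(self.n + other.n, self.sx + other.sx, self.sy + other.sy,
                      self.sxx + other.sxx, self.syy + other.syy,
                      self.sxy + other.sxy)

    def correlation(self):
        # Pearson correlation computed from the summed statistics only.
        cov = self.n * self.sxy - self.sx * self.sy
        vx = self.n * self.sxx - self.sx ** 2
        vy = self.n * self.syy - self.sy ** 2
        if vx <= 0 or vy <= 0:
            return 0.0  # constant column: treat as uncorrelated
        return cov / math.sqrt(vx * vy)


def digest_chunk(rows, yields):
    """Phase 1 (assumed form): reduce one chunk to one Digest per variable."""
    digests = [Digest() for _ in rows[0]]
    for row, y in zip(rows, yields):
        for d, x in zip(digests, row):
            d.add(x, y)
    return digests


def rank_variables(chunk_digests, top_k):
    """Phase 2 stand-in: merge digests, rank by |correlation| with yield."""
    merged = chunk_digests[0]
    for digests in chunk_digests[1:]:
        merged = [a.merge(b) for a, b in zip(merged, digests)]
    scores = [abs(d.correlation()) for d in merged]
    order = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return order[:top_k]


if __name__ == "__main__":
    # Made-up data: 3 process variables per row, yield per row.
    chunk_a = [[3.0, 10.0, 7.0], [1.0, 20.0, 7.0], [2.0, 30.0, 7.0]]
    chunk_b = [[2.0, 40.0, 7.0], [3.0, 50.0, 7.0], [1.0, 60.0, 7.0]]
    digests = [digest_chunk(chunk_a, [0.90, 0.80, 0.70]),
               digest_chunk(chunk_b, [0.60, 0.50, 0.40])]
    print(rank_variables(digests, top_k=2))
```

Because each digest holds only sums, merging is associative and could in principle be done in a tree across workers. Whether the thesis's Tree-based Aggregator or SDD Generator works this way is not stated in this record.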

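    Abstract
    Acknowledgments
    Chapter 1 Introduction
        1.1 Research Background
        1.2 Research Motivation and Objectives
        1.3 Research Procedure
        1.4 Thesis Organization
    Chapter 2 Literature Review and Theoretical Foundations
        2.1 Literature Review
            2.1.1 Requirements and Challenges of Manufacturing Analysis in Big Data Scenarios
            2.1.2 Root Cause Detection and Yield Improvement Methods
            2.1.3 Existing High-Dimensional Key-Variable Selection Methods
            2.1.4 Existing Big Data Processing Technologies
        2.2 Related Theoretical Foundations
            2.2.1 Two-Phase Search Strategy
            2.2.2 TPOGA-Based Key-Variable Search Method
    Chapter 3 Fast Key-Variable Search Algorithm for Big Data Scheme
        3.1 Data Digesting Phase
            3.1.1 Tree-based Aggregator
            3.1.2 SDD Generator
        3.2 Fast Key-Variable Search Phase
            3.2.1 LRA Module
            3.2.2 Tuple Key Generator Module
            3.2.3 SDD-Based Correlation Module
            3.2.4 TPOGABD Module
    Chapter 4 Case Studies and Test Results
        4.1 Experimental Design
            4.1.1 Correctness Evaluation Metrics
            4.1.2 Similarity Evaluation Metrics
        4.2 Correctness Evaluation Experiment: Dataset 1
        4.3 Correctness Evaluation Experiment: Dataset 2
        4.4 Effect of LRA on Correctness
        4.5 Effect of Data Size on Performance
        4.6 Effect of Data Digest Ratio on Performance
        4.7 Effectiveness of the Tree-based Aggregator in Data Processing
    Chapter 5 Conclusion
        5.1 Summary
        5.2 Future Research Directions
    References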


    Made public on 2024-09-01