
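Graduate student: Hsieh, Tong-Yu (謝東佑)
Thesis title: Theory and Applications of Error-Tolerance Techniques for Effective Yield Improvement (可有效提升良率之容誤技術理論與應用)
Advisor: Lee, Kuen-Jong (李昆忠)
Degree: Doctor
Department: College of Electrical Engineering and Computer Science, Department of Electrical Engineering
Year of publication: 2009
Academic year of graduation: 97 (ROC calendar)
Language: English
Number of pages: 147
Keywords (translated from Chinese): product grading, acceptability, chip yield improvement, error-tolerance, error rate
Keywords (English): acceptability, yield improvement, product grading, error-tolerance, error rate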
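  • In the nanometer era, factors such as process variation and quantum effects in electronic devices lead to excessively low chip yield. Error-tolerance is a novel concept that can effectively improve yield. It focuses on identifying circuits under test that may produce erroneous results because they contain faults, yet still deliver performance that is acceptable for certain applications. By grading these circuits according to the acceptability of their performance and continuing to use them in different applications, yield can be effectively improved. Because humans are highly insensitive to small variations in sound, color, and images, the errors caused by faults are very likely to go unnoticed, so error-tolerance can be widely applied in multimedia signal processing systems. Error-tolerance can also be applied to many circuits dedicated to enhancing system performance, such as the branch predictor in a processor. Although faults may cause these circuits themselves to produce erroneous results, such circuits usually have an error resilience mechanism, so the overall system does not produce any functional errors and suffers only a limited degradation in performance, for example a decrease in the number of instructions executed per clock cycle. We call this special kind of fault a performance degrading fault.
    In this dissertation we propose several test techniques to identify acceptable circuits under test. The first is a fault-oriented test methodology. It estimates the error rate of each fault, classifies the faults as acceptable or unacceptable according to this information, and then tests only for the unacceptable faults. In this way the acceptability of a circuit under test can be determined effectively (a circuit that passes the test is an acceptable circuit), and the number of required test patterns is also reduced, which lowers the test cost.
    It is worth noting that the relationships among the faults in a circuit can be highly intricate. As a result, even when a conventional test pattern generation method takes into account the need to avoid detecting acceptable faults, the patterns it generates are still very likely to detect a large number of acceptable faults, which greatly reduces the yield improvement that error-tolerance can achieve. In this dissertation we propose a multi-phase test technique that completely and very efficiently avoids this problem. Compared with related techniques in the literature, it requires far fewer test patterns. We also develop the theoretical foundation that proves the effectiveness of this technique.
    We further propose an error-oriented test methodology. Unlike the fault-oriented methodology above, it does not consider faults at all; instead it determines the acceptability of a circuit under test by analyzing the results the circuit produces. An important feature of this methodology is that it provides a product grading mechanism, in which circuits under test are graded and sold according to their acceptability so as to maximize profit. We propose a highly efficient chip grading technique that analyzes the error rate of a circuit under test in a short time and grades all circuits under test according to this information.
    We then propose a systematic methodology for applying error-tolerance. It provides an easy-to-follow flow that lets users efficiently analyze the error-tolerance characteristics of a circuit or system, evaluate the acceptability of chips containing faults, and predict the achievable yield improvement. We demonstrate the effectiveness of this methodology on a discrete cosine transform circuit.
    We also develop the concept of improving yield by tolerating performance degrading faults, and examine its practicality using a high-performance processor design. In particular, we analyze in detail the faults in a typical branch predictor to demonstrate the effectiveness of this concept for yield improvement. The results show that every fault in this circuit is a performance degrading fault, and that as many as 97% of the faults cause almost no degradation in processor performance. This shows that handling these faults with the error-tolerance concept can substantially increase the effective yield of the product.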

    Error-tolerance is an innovative technique for addressing the problem of low yield in nanometer very large scale integrated (VLSI) circuits, which are the backbone of the system-on-a-chip (SOC) revolution. The basic principle of error-tolerance is that some faulty circuits may occasionally produce erroneous outputs but still provide acceptable performance when used in certain systems. Using these circuits increases the effective yield. By taking advantage of human insensitivity to small changes in sounds, colors, or images, this concept can be widely applied in multimedia signal processing systems. Error-tolerance can also be applied to the many modules in a chip that are dedicated to enhancing system performance, such as branch predictors in CPU designs. Although faults in these modules may cause errors at the module outputs, the modules usually have inherent error resilience, so the faults cannot cause a functional error at the system outputs; they can only degrade system performance within some limits, e.g., by decreasing the number of instructions executed per cycle. We refer to such faults as performance degrading faults.
    In this dissertation, we first present a number of error-tolerance test techniques to identify acceptable chips. The first is a fault-oriented test methodology that estimates the error rate of each fault in the circuit under test, identifies a maximum set of acceptable faults accordingly, and ignores these faults during manufacturing test. Because the acceptable faults are not tested, the acceptability of the target circuit can be determined effectively: a circuit that passes the test is deemed acceptable. The total number of required test patterns also becomes smaller, which reduces the test cost.
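The fault-oriented idea can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the dissertation: the three-input toy circuit, the net names, the single stuck-at fault model, exhaustive simulation in place of an error rate estimator, and the hypothetical helpers `circuit`, `error_rate`, and `classify`. It also treats each fault on its own, which is simpler than identifying a maximum set of acceptable faults.

```python
from itertools import product

def circuit(a, b, c, fault=None):
    """Toy circuit y = (a AND b) OR c (assumed example, not from the thesis).

    `fault` is None or a (net_name, stuck_value) pair for a single stuck-at fault.
    """
    def net(name, value):
        # Force the net to its stuck value if it is the faulty one.
        if fault is not None and fault[0] == name:
            return fault[1]
        return value

    a, b, c = net("a", a), net("b", b), net("c", c)
    n1 = net("n1", a & b)
    return net("y", n1 | c)

def error_rate(fault):
    """Fraction of all input vectors whose output differs from the fault-free output."""
    vectors = list(product((0, 1), repeat=3))
    wrong = sum(circuit(*v) != circuit(*v, fault=fault) for v in vectors)
    return wrong / len(vectors)

def classify(threshold):
    """Split every single stuck-at fault into acceptable / unacceptable by error rate."""
    faults = [(name, stuck) for name in ("a", "b", "c", "n1", "y") for stuck in (0, 1)]
    rates = {f: error_rate(f) for f in faults}
    acceptable = sorted(f for f, r in rates.items() if r <= threshold)
    unacceptable = sorted(f for f, r in rates.items() if r > threshold)
    return rates, acceptable, unacceptable

rates, acceptable, unacceptable = classify(threshold=0.125)
print("acceptable:", acceptable)
print("test only for:", unacceptable)
```

Under these assumptions, only the faults in the second list would be targeted during test pattern generation; the threshold itself would come from the application's tolerance for errors.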
    However, it can be shown that, without careful consideration, test patterns generated by a conventional automatic test pattern generation (ATPG) procedure that targets only unacceptable faults can also detect many acceptable faults, which drastically reduces the achievable yield improvement. In this dissertation, a multi-phase test technique is presented that perfectly and efficiently eliminates this over-detection problem. Compared with previous work addressing the same problem, it requires far fewer test patterns. Solid theoretical derivations are also provided to validate the effectiveness of this technique.
    Orthogonal to the proposed fault-oriented test methodology, we also present an error-oriented test methodology. This methodology supports product grading, i.e., chips can be classified and priced according to their acceptability as quantified by various attributes. Here we use error rate as the attribute to be measured, and present a sampling-based method to estimate the error rate of a chip. We also propose an efficient chip classification procedure based on an iterative and adaptive error rate estimation technique that effectively deals with chip misclassification caused by an insufficient number of patterns. Experimental results confirm that the proposed error rate estimation method is accurate, and that the proposed classification procedure classifies all target circuits appropriately after only a few iterations.
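One way to picture sampling-based error rate estimation with iterative, adaptive classification is sketched below. This is not the dissertation's procedure: the batch size, the normal-approximation confidence interval, the stopping rule, and the names `classify_chip` and `make_chip` are all assumptions chosen for illustration. The chip is modeled as a hypothetical callable that reports whether one random pattern produced an erroneous output.

```python
import math
import random

def classify_chip(is_erroneous, threshold, batch=200, max_batches=50, z=1.96, rng=None):
    """Apply random patterns in batches until the estimate clearly falls on one
    side of `threshold` (assumed stopping rule, normal approximation)."""
    rng = rng or random.Random(0)
    errors = applied = 0
    p = 0.0
    for _ in range(max_batches):
        for _ in range(batch):
            errors += bool(is_erroneous(rng))  # one pattern, one observed outcome
            applied += 1
        p = errors / applied                                # sampled error rate
        half = z * math.sqrt(p * (1.0 - p) / applied)       # interval half-width
        if p + half < threshold:
            return "acceptable", p, applied
        if p - half > threshold:
            return "unacceptable", p, applied
        # Otherwise the sample is too small to decide: apply another batch.
    return "undecided", p, applied

def make_chip(true_rate):
    """Hypothetical chip model: each pattern is erroneous with probability true_rate."""
    return lambda rng: rng.random() < true_rate

print(classify_chip(make_chip(0.005), threshold=0.05))
print(classify_chip(make_chip(0.20), threshold=0.05))
```

The point of the adaptive loop is that chips whose error rate is far from the threshold are classified after very few patterns, while borderline chips receive more patterns instead of being misclassified. The normal approximation is crude when very few errors are observed, so a real procedure would likely use a more careful bound.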
    We then present a systematic methodology that helps users efficiently employ error-tolerance in any given target application, a problem that has not been addressed in the literature. A step-by-step flow is provided to explore the error-tolerance features of target designs, evaluate the acceptability of chips under test, and predict the achievable yield improvement. A case study on a discrete cosine transform circuit is carried out to illustrate and validate the proposed methodology.
    We also develop the notion of tolerating performance degrading faults (pdfs) for effective yield improvement. We demonstrate that this notion is applicable to a large fraction of the modules in modern processors. In particular, we analyze the faults in a typical branch prediction unit of the kind widely used in high-performance processors to illustrate the potential benefits of this notion for yield improvement. Experimental results show that every stuck-at fault in this unit is a pdf and that 97% of these faults induce almost no performance degradation, which clearly demonstrates the effectiveness of this notion for yield improvement.
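A small simulation can show why a stuck-at fault in a branch predictor degrades performance without causing functional errors. The sketch below assumes a table of 2-bit saturating counters, a synthetic loop branch, and a hypothetical `accuracy` helper; the predictor analyzed in the dissertation is not specified here and is likely more elaborate. Branch outcomes are inputs to the simulation, so a fault can change only how often the prediction is right, never what the program computes.

```python
def accuracy(pcs, outcomes, table_bits=4, fault=None):
    """Prediction accuracy of a table of 2-bit saturating counters.

    `fault` is None or (entry_index, bit, stuck_value): one counter bit that
    always reads as stuck_value (assumed single stuck-at fault model).
    """
    size = 1 << table_bits
    table = [1] * size  # every counter starts at "weakly not taken"

    def read(index):
        value = table[index]
        if fault is not None and fault[0] == index:
            _, bit, stuck = fault
            value = (value | (1 << bit)) if stuck else (value & ~(1 << bit))
        return value

    correct = 0
    for pc, taken in zip(pcs, outcomes):
        index = pc % size
        counter = read(index)
        predicted_taken = counter >= 2
        correct += (predicted_taken == taken)
        # The real outcome always drives the update; the program result is unaffected.
        table[index] = min(3, counter + 1) if taken else max(0, counter - 1)
    return correct / len(outcomes)

# Synthetic workload: one loop branch at pc 0, taken 9 times then not taken, 20 times over.
outcomes = ([True] * 9 + [False]) * 20
pcs = [0] * len(outcomes)

print("fault-free:              ", accuracy(pcs, outcomes))
print("unused entry, stuck-at-0:", accuracy(pcs, outcomes, fault=(5, 1, 0)))
print("entry 0 low bit stuck-1: ", accuracy(pcs, outcomes, fault=(0, 0, 1)))
print("entry 0 high bit stuck-0:", accuracy(pcs, outcomes, fault=(0, 1, 0)))
```

In this toy workload the first two faults leave accuracy unchanged, while the last forces a permanent not-taken prediction and sharply lowers accuracy. In a processor, lower accuracy means more pipeline flushes and fewer instructions executed per cycle, not wrong results.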

    CHAPTER 1 INTRODUCTION
      1.1. ERROR-TOLERANCE FOR IMPROVING CHIP YIELD
      1.2. OVERVIEW OF THIS DISSERTATION
        1.2.1. A Fault-Oriented Test Methodology to Support Error-Tolerance
        1.2.2. An Efficient Multi-Phase Test Technique to Perfectly Prevent Over-Detection of Acceptable Faults for Optimal Yield Improvement via Error-Tolerance
        1.2.3. An Error-Oriented Test Methodology to Support Product Grading Based on Error-Tolerance
        1.2.4. A Systematic Methodology to Employ Error-Tolerance
        1.2.5. An Illustrated Technique to Analyze Acceptability of Performance Degrading Faults for Yield Improvement
      1.3. ORGANIZATION OF THIS DISSERTATION
    CHAPTER 2 PREVIOUS WORK ON ERROR-TOLERANCE TEST TECHNIQUES
      2.1. PREVIOUS WORK ON FAULT-ORIENTED TEST TECHNIQUES
      2.2. PREVIOUS WORK ON ERROR-ORIENTED TEST TECHNIQUES
    CHAPTER 3 A FAULT-ORIENTED TEST METHODOLOGY TO SUPPORT ERROR-TOLERANCE
      3.1. PROPOSED TEST METHODOLOGY
      3.2. ERROR RATE ESTIMATION
      3.3. ACCEPTABLE FAULTS IDENTIFICATION
      3.4. EXPERIMENTAL RESULTS
        3.4.1. Experimental Results for Error Rate Estimation
        3.4.2. Experimental Results for Acceptable Fault Identification
        3.4.3. Discussion
      3.5. SUMMARY
    CHAPTER 4 AN EFFICIENT MULTI-PHASE TEST TECHNIQUE TO PERFECTLY PREVENT OVER-DETECTION OF ACCEPTABLE FAULTS FOR OPTIMAL YIELD IMPROVEMENT VIA ERROR-TOLERANCE
      4.1. OVER-DETECTION PROBLEM OF ACCEPTABLE FAULTS
      4.2. BASIC MULTI-PHASE TEST (MPTEST) TECHNIQUE
        4.2.1. Test Set Generation Procedure and Test Application Scheme of MPTest
        4.2.2. Theoretical Background of MPTest
      4.3. MORE EFFICIENT MULTI-PHASE TEST TECHNIQUE (MPTEST+)
        4.3.1. Test Set Generation Procedure and Test Application Scheme of MPTest+
        4.3.2. Theoretical Background of MPTest+
      4.4. TEST PATTERN SELECTION & OUTPUT MASKING TECHNIQUES EMPLOYED IN MPTEST AND MPTEST+
        4.4.1. Test Pattern Selection Technique
        4.4.2. Output Masking Technique
      4.5. EXPERIMENTAL RESULTS
        4.5.1. Total Number of Required Patterns for MPTest
        4.5.2. Total Number of Required Patterns for MPTest+
        4.5.3. Comparison on Total Numbers of Required Patterns for MPTest and MPTest+
      4.6. SUMMARY
    CHAPTER 5 AN ERROR-ORIENTED TEST METHODOLOGY TO SUPPORT PRODUCT GRADING BASED ON ERROR-TOLERANCE
      5.1. PROPOSED TEST METHODOLOGY
      5.2. ERROR RATE ESTIMATION METHOD
      5.3. ANALYSIS ON CHIP MISCLASSIFICATION
      5.4. PROPOSED CHIP CLASSIFICATION PROCEDURE
      5.5. PREDICTION OF ACHIEVABLE YIELD IMPROVEMENT
      5.6. EXPERIMENTAL RESULTS
        5.6.1. Experimental Results for Error Rate Estimation
        5.6.2. Experimental Results for Chip Classification
        5.6.3. Experimental Results for Yield Improvement Prediction
      5.7. SUMMARY
    CHAPTER 6 A SYSTEMATIC METHODOLOGY TO EMPLOY ERROR-TOLERANCE
      6.1. PROPOSED METHODOLOGY
      6.2. CASE STUDY
      6.3. SUMMARY
    CHAPTER 7 AN ILLUSTRATED TECHNIQUE TO ANALYZE ACCEPTABILITY OF PERFORMANCE DEGRADING FAULTS FOR YIELD IMPROVEMENT
      7.1. PERFORMANCE DEGRADING FAULT
      7.2. ASPECTS OF HIGH PERFORMANCE PROCESSORS
      7.3. CASE STUDY
        7.3.1. High-Level Description of Branch Predictor
        7.3.2. Logic Level Description of Branch Predictor
        7.3.3. Faults and Test Cases
        7.3.4. Experimental Results
          7.3.4.1 Branch Prediction Degradation
          7.3.4.2 Performance Degradation
      7.4. SUMMARY
    CHAPTER 8 CONCLUSIONS AND FUTURE WORK
      8.1. CONCLUSIONS
      8.2. FUTURE WORK
    REFERENCES
    AUTHOR'S BIOGRAPHY


    Full-text availability: on campus, public from 2011-08-17; off campus, public from 2011-08-17