| Student: | 林元祥 Lin, Yuan-Hsiang |
|---|---|
| Thesis title: | Reliability Enhancement for Embedded Memories in a 3D Stacked Wafer-Scale High Performance Computing System |
| Advisor: | 謝明得 Shieh, Ming-Der |
| Co-advisor: | 吳誠文 Wu, Cheng-Wen |
| Degree: | Master's |
| Department: | College of Electrical Engineering and Computer Science - Department of Electrical Engineering |
| Year of publication: | 2022 |
| Academic year of graduation: | 110 (ROC calendar, i.e., 2021-2022) |
| Language: | English |
| Pages: | 34 |
| Keywords: | 3D built-in self-repair (3D-BISR), 3D redundancy allocation (3D-RA), 3D packaging, 3D peer-repair, heterogeneous integration, memory repair, reliability, supercomputing, wafer-scale integration, wafer-scale chip, wafer-on-wafer stacking, yield |
As AI models and algorithms grow increasingly complex, computing hardware designs are constantly being updated and improved. Besides raising the operating speed of AI computing chips through process technology advances, overall architecture design is also key to improving performance. This thesis assumes that a 3D stacked wafer-scale high performance computing system, built by stacking multiple wafer-scale chips into an extremely powerful AI computing system, is likely to be developed and introduced into exascale supercomputing facilities in the near future. For such a complex system to be feasible in practice, one critical yield problem must be solved: the one caused by the embedded memories. In this thesis, we focus on the memory quality and reliability of such systems and propose a novel system-level 3D built-in self-repair (3D-BISR) architecture built on top of our existing 2D BISR scheme. The key feature of our 3D-BISR is its 3D peer-repair capability. Our simulation results show that sharing spare resources among embedded memories in the 3D neighborhood improves the repair rate, and thus the yield and lifetime, of the embedded memories in such 3D stacked wafer-scale high performance computing systems. We also propose six 3D redundancy allocation (3D-RA) schemes that allocate spare memory resources effectively while balancing system reliability against memory repair cost. Our experimental results show that the proposed 3D-BISR with all six schemes achieves a memory repair rate 30% higher than that of conventional BISR, and up to 5% higher than that of 3D-BISR with a single scheme. This result is only 0.2% lower than the optimal solution.
In the lifetime experiments, the 3D-BISR with all six schemes achieves a lifetime 30% longer than that of conventional BISR, and up to 5% longer than that of 3D-BISR with a single scheme. This result is only 0.3% lower than the optimal solution.
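The central claim above, that sharing spare resources among 3D neighbors raises the repair rate, can be illustrated with a toy Monte Carlo model. This is a minimal sketch under simplifying assumptions of our own, not the thesis's 3D-BISR architecture or any of its six 3D-RA schemes: memories sit on a hypothetical X×Y×Z grid (Z being the stacked wafers), each memory has the same number of spare rows, rows fail independently, and a memory short of spares may borrow leftover spares from its six face-adjacent neighbors under a simple greedy rule. All names and parameters (`sample_faults`, `peer_repair_rate`, `spares`, `p_fault`, `dims`) are hypothetical.

```python
import random
from itertools import product

# Hypothetical model: six face-adjacent neighbors in a 3D grid,
# +/-x and +/-y on the same wafer, +/-z on the wafers above and below.
NEIGHBOR_OFFSETS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]


def sample_faults(dims, rows, p_fault, rng):
    """Draw a faulty-row count per memory; each row fails independently with p_fault."""
    return {
        pos: sum(rng.random() < p_fault for _ in range(rows))
        for pos in product(*(range(d) for d in dims))
    }


def local_repair_rate(faults, spares):
    """Conventional BISR (toy): a memory is repaired only if its own spares cover its faults."""
    repaired = sum(1 for f in faults.values() if f <= spares)
    return repaired / len(faults)


def peer_repair_rate(faults, spares):
    """Greedy peer repair (toy): a memory with a deficit borrows neighbors' unused spares."""
    surplus = {pos: max(spares - f, 0) for pos, f in faults.items()}
    repaired = sum(1 for f in faults.values() if f <= spares)
    for pos in sorted(faults):
        deficit = faults[pos] - spares
        if deficit <= 0:
            continue  # already repaired with its own spares
        x, y, z = pos
        neighbors = [(x + dx, y + dy, z + dz) for dx, dy, dz in NEIGHBOR_OFFSETS]
        neighbors = [n for n in neighbors if n in surplus]  # drop off-grid positions
        # Borrow only if the neighbors can cover the whole deficit; otherwise
        # leave their spares available for other memories.
        if sum(surplus[n] for n in neighbors) < deficit:
            continue
        for n in neighbors:
            take = min(surplus[n], deficit)
            surplus[n] -= take
            deficit -= take
        repaired += 1
    return repaired / len(faults)


def simulate(dims=(4, 4, 3), rows=128, spares=2, p_fault=0.01, trials=200, seed=1):
    """Average local-only and peer-repair rates over random fault maps."""
    rng = random.Random(seed)
    local = shared = 0.0
    for _ in range(trials):
        faults = sample_faults(dims, rows, p_fault, rng)
        local += local_repair_rate(faults, spares)
        shared += peer_repair_rate(faults, spares)
    return local / trials, shared / trials


if __name__ == "__main__":
    local, shared = simulate()
    print(f"local-only repair rate: {local:.3f}")
    print(f"3D peer repair rate:    {shared:.3f}")
```

Because a donor lends only spares it does not need itself, every memory repaired locally stays repaired, so in this model the peer-repair rate can never fall below the local-only rate. The greedy visiting order is not optimal; matching an optimal allocation would require an exact method (for example, integer linear programming), and the model ignores physical constraints such as memory geometry and inter-wafer wiring.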