
Author: 盛祖丞 (Sheng, Zu-Cheng)
Title: 一個採用雙路輸入架構與預先量化技巧之類比式記憶體內運算巨集
(An Analog Computing-In-Memory Macro with Twin-Path Input Architecture and Pre-quantization Technique)
Advisor: 張順志 (Chang, Soon-Jyh)
Degree: Master
Department: Department of Electrical Engineering, College of Electrical Engineering and Computer Science
Year of Publication: 2024
Academic Year of Graduation: 112 (2023-2024)
Language: Chinese
Number of Pages: 143
Chinese Keywords: 記憶體內運算、靜態隨機存取記憶體、類神經網路、高能源效率、稀疏性
English Keywords: Computing-in-Memory (CIM), static random-access memory (SRAM), neural network (NN), high energy efficiency, sparsity
  • This thesis presents a high-energy-efficiency, high-throughput analog computing-in-memory (CIM) macro for neural-network applications. The chip is based on static random-access memory (SRAM): six additional transistors and two capacitors are added to each bit-cell to form two analog computing elements that operate by charge redistribution. To raise the accelerator's throughput and energy efficiency, this thesis proposes one architecture and one technique: the twin-path input architecture and the pre-quantization technique. The first optimizes the design trade-off between throughput and energy efficiency: the twin-path input architecture delivers 1.57-0.53 times the energy efficiency at twice the throughput, or improves the energy efficiency by 1.05-4.27 times at the same throughput. The second exploits the known distribution of the trained neural-network weights to skip the analog-to-digital conversion of the first few bits, reducing the power consumed by the quantizer and thereby improving energy efficiency.
    The test chip was implemented in a TSMC 40-nm CMOS standard 1P9M process, with a total chip area of 2.732 mm². With the high-sparsity neural network of [55], post-layout simulation shows an energy efficiency of 35.64 TOPS/W and a throughput of 3.2 GOPs at a 0.9-V supply and a 100-MHz clock, and an energy efficiency of 50.92 TOPS/W and a throughput of 2.24 GOPs at a 0.7-V supply and a 70-MHz clock. In addition, FoM_3 (EF × R_IF × R_W × OR_bit) is 2271-3245 TOPS/W and FoM_4 (EF × R_IF × R_W × OR_level) is 2246-3209 TOPS/W.

    This thesis presents a high-throughput, high-energy-efficiency analog Computing-in-Memory (CIM) macro aimed at neural-network (NN) applications. The CIM macro is based on static random-access memory (SRAM) with six additional transistors and two capacitors in each bit-cell, which form two sets of analog computing elements based on charge redistribution. We propose two ideas, one architecture and one technique: the twin-path input architecture and the pre-quantization technique. The first aims at improving throughput: when the twin-path input architecture doubles the throughput, the energy efficiency becomes 1.57-0.53 times; at the same throughput, it improves the energy efficiency by 1.05-4.27 times. The second reduces power consumption by skipping the first few MSBs during quantization, according to the known distributions of the trained neural-network weights.
    The proof-of-concept prototype was fabricated in a TSMC 40-nm CMOS general-purpose 1P9M technology, and the chip occupies an area of 2.732 mm². In post-layout simulation with the high-sparsity neural network of [55], at a supply voltage of 0.9 V and a clock frequency of 100 MHz, the energy efficiency is 35.64 TOPS/W with a throughput of 3.2 GOPs. At a supply voltage of 0.7 V and a clock frequency of 70 MHz, the energy efficiency reaches 50.92 TOPS/W with a throughput of 2.23 GOPs. In addition, FoM_3 (EF × R_IF × R_W × OR_bit) is 2271-3245 TOPS/W and FoM_4 (EF × R_IF × R_W × OR_level) is 2246-3209 TOPS/W.
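    To make the pre-quantization idea above more concrete, the following is a minimal behavioral sketch in Python, not the circuit implemented in this thesis: an ideal SAR ADC model in which the first few MSB decisions are preset from the expected range of the MAC result (known from the trained weight distribution), so those conversion cycles are skipped. The names sar_adc, code_to_voltage, and the presets parameter are illustrative assumptions, not taken from the thesis.

        # Behavioral sketch of the pre-quantization idea (illustrative only, not the
        # thesis circuit): an ideal SAR ADC in which the first len(presets) MSB
        # decisions are assumed instead of resolved, so their comparison cycles are
        # skipped. This is only valid when the input is known to lie in the
        # sub-range implied by the presets; otherwise the output code clips.

        def code_to_voltage(code, n_bits, vref):
            # Ideal DAC: map an n_bits code to a voltage in [0, vref).
            return vref * code / (2 ** n_bits)

        def sar_adc(vin, n_bits=8, vref=1.0, presets=()):
            # Returns (output code, number of comparison cycles actually spent).
            code = 0
            cycles = 0
            for i in range(n_bits):
                bit_weight = vref / (2 ** (i + 1))        # weight of bit (n_bits - 1 - i)
                if i < len(presets):
                    bit = presets[i]                      # MSB preset: cycle skipped
                else:
                    trial = code_to_voltage(code, n_bits, vref) + bit_weight
                    bit = 1 if vin >= trial else 0        # normal SAR comparison
                    cycles += 1
                if bit:
                    code |= 1 << (n_bits - 1 - i)
            return code, cycles

        # Example: if the MAC results are known to stay below vref/4 (e.g., because
        # the weights are sparse and small in magnitude), the two MSBs can be preset
        # to 0 and an 8-bit code is produced with only 6 comparison cycles.
        code, cycles = sar_adc(0.19, n_bits=8, vref=1.0, presets=(0, 0))
        print(code, cycles)   # -> 48 6, i.e. 48/256 = 0.1875

    In hardware terms, each skipped decision removes one comparator firing and the associated DAC switching, which is the source of the quantizer power saving described in the abstract.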

    Table of Contents:
    摘要 (Abstract in Chinese)
    Abstract
    List of Tables
    List of Figures
    Chapter 1 Introduction
        1.1 Background and Motivation
        1.2 Computing-in-Memory
        1.3 Thesis Organization
    Chapter 2 Basics of Neural Networks
        2.1 Fundamentals of Neural Networks
        2.2 Exploring the Function of Neural Networks and Different Neural Network Architectures
            2.2.1 Basics of Convolutional Neural Networks
            2.2.2 Fully Connected Layer
            2.2.3 Non-Linear Activation Function
            2.2.4 Pooling Function
            2.2.5 Normalization Function
        2.3 Quantization of Neural Networks
            2.3.1 Uniform (Linear) Quantization
            2.3.2 Non-uniform (Non-linear) Quantization
    Chapter 3 Fundamentals of AI Accelerators and Computing-in-Memory
        3.1 Introduction of AI Accelerators
            3.1.1 Dataflow and Data Reuse
        3.2 Computing-in-Memory (CIM) Architecture
            3.2.1 Basic Concept of DAC
            3.2.2 Static Specifications of DAC
            3.2.3 DAC (MAC Operation) in CIM
            3.2.4 Analog-to-Digital Conversion
            3.2.5 The Concept and Operation of SAR ADC
            3.2.6 Static Specifications of ADC
        3.3 Figure of Merit (FoM) of CIM and Different Types of CIM
    Chapter 4 A High Energy Efficiency and High Throughput Analog CIM
        4.1 Introduction
        4.2 Proposed Twin-Path Input Architecture
        4.3 Proposed Pre-quantization Technique
        4.4 The CIM Architecture
            4.4.1 The Proposed 12T2C Bit-Cell
            4.4.2 SAR ADC with Pre-quantization Technique
        4.5 Circuits of CIM and SRAM
    Chapter 5 Simulation and Measurement Results
        5.1 Layout and Chip Floor Plan
        5.2 Simulation Results
        5.3 Die Micrograph and Measurement Setup
        5.4 Measurement Results
    Chapter 6 Conclusion and Future Works
    Bibliography

    [1] Xiaoling Xia, Cui Xu, and Bing Nan, “Inception-v3 for flower classification,” The 2nd IEEE international conference on image, vision and computing (ICIVC), 2017.
    [2] Yi Zhu and Shawn Newsam, “Densenet for dense flow,” 2017 IEEE International Conference on Image Processing (ICIP), 2017.
    [3] Xu Qin and Zhilin Wang, “Nasnet: A neuron attention stage-by-stage net for single image deraining,” arXiv preprint arXiv:1912.03151 (2019).
    [4] John von Neumann, “First Draft of a Report on the EDVAC,” IEEE Annals of the History of Computing, 15.4 (1993): 27-75.
    [5] Vivienne Sze, et al., “Efficient processing of deep neural networks: A tutorial and survey,” Proceedings of the IEEE, 105.12 (2017): 2295-2329.
    [6] Yu-Hsin Chen, et al., “Eyeriss: An energy-efficient reconfigurable accelerator for deep convolutional neural networks,” IEEE journal of solid-state circuits, 52.1 (2016): 127-138.
    [7] Yu-Hsin Chen, et al., “Eyeriss v2: A flexible accelerator for emerging deep neural networks on mobile devices,” IEEE Journal on Emerging and Selected Topics in Circuits and Systems, 9.2 (2019): 292-308.
    [8] Ji-Hoon Kim, et al., “Z-PIM: A sparsity-aware processing-in-memory architecture with fully variable weight bit-precision for energy-efficient deep neural networks,” IEEE Journal of Solid-State Circuits, 56.4 (2021): 1093-1104.
    [9] L. Deng, G. Li, S. Han, L. Shi and Y. Xie, “Model Compression and Hardware Acceleration for Neural Networks: A Comprehensive Survey,” in Proceedings of the IEEE, vol. 108, no. 4, pp. 485-532, April 2020.
    [10] Jie-Fang Zhang, et al., “Snap: An efficient sparse neural acceleration processor for unstructured sparse deep neural network inference,” IEEE Journal of Solid-State Circuits, 56.2 (2020): 636-647.
    [11] K. R. Chowdhary, “Natural language processing,” Fundamentals of Artificial Intelligence (2020): 603-649.
    [12] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton, “Imagenet classification with deep convolutional neural networks,” Communications of the ACM, 60.6 (2017): 84-90.
    [13] Mingxing Tan and Quoc Le, “Efficientnet: Rethinking model scaling for convolutional neural networks,” International conference on machine learning. PMLR, 2019.
    [14] Samuel Greydanus, Misko Dzamba, and Jason Yosinski, “Hamiltonian neural networks,” Advances in neural information processing systems, 32 (2019).
    [15] Warren S. McCulloch and Walter Pitts, “A logical calculus of the ideas immanent in nervous activity,” The bulletin of mathematical biophysics, 5.4 (1943): 115-133.
    [16] Sepp Hochreiter, “The vanishing gradient problem during learning recurrent neural nets and problem solutions,” International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems 6.02 (1998): 107-116.
    [17] Sergey Ioffe, and Christian Szegedy, “Batch normalization: Accelerating deep network training by reducing internal covariate shift,” International conference on machine learning. PMLR, 2015.
    [18] Haotong Qin, et al., “Binary neural networks: A survey,” Pattern Recognition 105 (2020): 107281.
    [19] Hande Alemdar, et al., “Ternary neural networks for resource-efficient AI applications,” IEEE international joint conference on neural networks (IJCNN), 2017.
    [20] Mark Horowitz, “1.1 computing's energy problem (and what we can do about it),” IEEE International Solid-State Circuits Conference (ISSCC) Digest of Technical Papers, 2014.
    [21] Vivienne Sze, et al., “Hardware for machine learning: Challenges and opportunities,” 2017 IEEE Custom Integrated Circuits Conference (CICC). IEEE, 2017.
    [22] Jintao Zhang, Zhuo Wang, and Naveen Verma, “In-memory computation of a machine-learning classifier in a standard 6T SRAM array,” IEEE Journal of Solid-State Circuits, 52.4 (2017): 915-924.
    [23] Sujan Kumar Gonugondla, Mingu Kang, and Naresh Shanbhag, “A 42pJ/decision 3.12 TOPS/W robust in-memory machine learning classifier with on-chip training,” IEEE International Solid-State Circuits Conference (ISSCC) Digest of Technical Papers, 2018.
    [24] Win-San Khwa, et al., “A 65nm 4Kb algorithm-dependent Computing-in-Memory SRAM unit-macro with 2.3 ns and 55.8 TOPS/W fully parallel product-sum operation for binary DNN edge processors,” IEEE International Solid-State Circuits Conference (ISSCC) Digest of Technical Papers, 2018.
    [25] Xin Si, et al., “A twin-8T SRAM computation-in-memory macro for multiple-bit CNN-based machine learning,” IEEE International Solid-State Circuits Conference (ISSCC) Digest of Technical Papers, 2019.
    [26] Shihui Yin, et al., “XNOR-SRAM: In-memory computing SRAM macro for binary/ternary deep neural networks,” IEEE Journal of Solid-State Circuits, 55.6 (2020): 1733-1743.
    [27] Hossein Valavi, et al., “A mixed-signal binarized convolutional-neural-network accelerator integrating dense weight storage and multiplication for reduced data movement,” IEEE Symposium on VLSI Circuits, 2018.
    [28] Hossein Valavi, et al., “A 64-tile 2.4-Mb in-memory-computing CNN accelerator employing charge-domain compute,” IEEE Journal of Solid-State Circuits, 54.6 (2019): 1789-1799.
    [29] Zhewei Jiang, et al., “C3SRAM: An in-memory-computing SRAM macro based on robust capacitive coupling computing mechanism,” IEEE Journal of Solid-State Circuits, 55.7 (2020): 1888-1897.
    [30] Jinseok Lee, et al., “Fully row/column-parallel in-memory computing SRAM macro employing capacitor-based mixed-signal computation with 5-b inputs,” IEEE Symposium on VLSI Circuits, 2021.
    [31] Xin Si, et al., “15.5 A 28nm 64Kb 6T SRAM Computing-in-Memory macro with 8b MAC operation for AI edge chips,” IEEE International Solid-State Circuits Conference (ISSCC) Digest of Technical Papers, 2020.
    [32] Jian-Wei Su, et al., “16.3 A 28nm 384kb 6T-SRAM Computation-in-Memory Macro with 8b Precision for AI Edge Chips,” IEEE International Solid-State Circuits Conference (ISSCC) Digest of Technical Papers, 2021.
    [33] Avishek Biswas and Anantha P. Chandrakasan, “Conv-RAM: An energy-efficient SRAM with embedded convolution computation for low-power CNN-based machine learning applications,” IEEE International Solid-State Circuits Conference (ISSCC) Digest of Technical Papers, 2018.
    [34] Qing Dong, et al., “15.3 A 351TOPS/W and 372.4 GOPS compute-in-memory SRAM macro in 7nm FinFET CMOS for machine-learning applications,” IEEE International Solid-State Circuits Conference (ISSCC) Digest of Technical Papers, 2020.
    [35] Ping-Chun Wu, et al., “A 28nm 1Mb Time-Domain Computing-in-Memory 6T-SRAM Macro with a 6.6 ns Latency, 1241GOPS and 37.01 TOPS/W for 8b-MAC Operations for Edge-AI Devices,” IEEE International Solid-State Circuits Conference (ISSCC) Digest of Technical Papers, 2022.
    [36] Jun Yang, et al., “24.4 sandwich-RAM: An energy-efficient in-memory BWN architecture with pulse-width modulation,” IEEE International Solid-State Circuits Conference (ISSCC) Digest of Technical Papers, 2019.
    [37] Ruiqi Guo, et al., “A 5.1 pJ/neuron 127.3 µs/inference RNN-based speech recognition processor using 16 Computing-in-Memory SRAM macros in 65nm CMOS,” IEEE Symposium on VLSI Circuits, 2019.
    [38] Yen-Cheng Chiu, et al., “A 4-Kb 1-to-8-bit configurable 6T SRAM-based computation-in-memory unit-macro for CNN-based AI edge processors,” IEEE Journal of Solid-State Circuits, 55.10 (2020): 2790-2801.
    [39] Zhiyu Chen, et al., “CAP-RAM: A charge-domain in-memory computing 6T-SRAM for accurate and precision-programmable CNN inference,” IEEE Journal of Solid-State Circuits, 56.6 (2021): 1924-1935.
    [40] Jinseok Lee, et al., “Fully row/column-parallel in-memory computing SRAM macro employing capacitor-based mixed-signal computation with 5-b inputs,” IEEE Symposium on VLSI Circuits, 2021.
    [41] Chengshuo Yu, et al., “A 65-nm 8T SRAM Compute-in-Memory Macro With Column ADCs for Processing Neural Networks,” IEEE Journal of Solid-State Circuits, (2022).
    [42] Guan-Ying Huang, et al., “A 1-µW 10-bit 200-kS/s SAR ADC with a bypass window for biomedical applications,” IEEE Journal of Solid-State Circuits, 47.11 (2012): 2783-2795.
    [43] Chun-Cheng Liu, et al., “A 1V 11fJ/conversion-step 10bit 10MS/s asynchronous SAR ADC in 0.18 µm CMOS,” IEEE Symposium on VLSI Circuits, 2010.
    [44] Jon Guerber, et al., “A 10-b ternary SAR ADC with quantization time information utilization,” IEEE Journal of Solid-State Circuits, 47.11 (2012): 2604-2613.
    [45] Yan Zhu, et al., “A 10-bit 100-MS/s reference-free SAR ADC in 90 nm CMOS,” IEEE Journal of Solid-state circuits, 45.6 (2010): 1111-1121.
    [46] Guan-Ying Huang, et al., “A 10b 200MS/s 0.82 mW SAR ADC in 40nm CMOS,” IEEE Asian Solid-State Circuits Conference (A-SSCC), 2013.
    [47] C.-H. Kuo, “A 10-bit 120-MS/s SAR ADC with compact architecture and noise suppression technique,” M.S. thesis, Dept. Elect. Eng., National Cheng Kung Univ., Tainan, Taiwan, 2014.
    [48] Randy W. Mann, and Benton H. Calhoun, “New category of ultra-thin notchless 6T SRAM cell layout topologies for sub-22nm,” The 12th IEEE International Symposium on Quality Electronic Design, 2011.
    [49] Chun-Cheng Liu, Che-Hsun Kuo, and Ying-Zu Lin, “A 10 bit 320 MS/s low-cost SAR ADC for IEEE 802.11ac applications in 20 nm CMOS,” IEEE Journal of Solid-State Circuits, 50.11 (2015): 2645-2654.
    [50] Hao Xu and Asad A. Abidi, “Analysis and design of regenerative comparators for low offset and noise,” IEEE Transactions on Circuits and Systems I: Regular Papers, 66.8 (2019): 2817-2830.
    [51] Bernhard Wicht, Thomas Nirschl, and Doris Schmitt-Landsiedel, “Yield and speed optimization of a latch-type voltage sense amplifier,” IEEE Journal of Solid-State Circuits, 39.7 (2004): 1148-1158.
    [52] Bo Wang, et al., “A 28nm Horizontal-Weight-Shift and Vertical-feature-Shift-Based Separate-WL 6T-SRAM Computation-in-Memory Unit-Macro for Edge Depthwise Neural-Networks,” IEEE International Solid-State Circuits Conference (ISSCC) Digest of Technical Papers, 2023.
    [53] Peiyu Chen, et al., “A 22nm Delta-Sigma Computing-in-Memory (Δ∑ CIM) SRAM Macro with Near-Zero-Mean Outputs and LSB-First ADCs Achieving 21.38 TOPS/W for 8b-MAC Edge AI Processing,” IEEE International Solid-State Circuits Conference (ISSCC) Digest of Technical Papers, 2023.
    [54] Sung-En Hsieh, et al., “A 70.85-86.27 TOPS/W PVT-Insensitive 8b Word-Wise ACIM with Post-Processing Relaxation,” IEEE International Solid-State Circuits Conference (ISSCC) Digest of Technical Papers, 2023.
    [55] Zih-Huang Cheng, Chih-Hung Kuo, “Bit-Wise Quantization-Aware Training for Sparsifying Neural Networks,” M.S. thesis, Dept. Elect. Eng., National Cheng Kung Univ., Tainan, Taiwan, 2023.

    Full-text availability: on campus from 2029-04-19; off campus from 2029-04-19.
    The electronic thesis has not yet been authorized for public release; please consult the library catalog for the print copy.