
Author: 盛祖丞 (Sheng, Zu-Cheng)
Title: 一個採用雙路輸入架構與預先量化技巧之類比式記憶體內運算巨集
(An Analog Computing-In-Memory Macro with Twin-Path Input Architecture and Pre-quantization Technique)
Advisor: 張順志 (Chang, Soon-Jyh)
Degree: Master
Department: Department of Electrical Engineering, College of Electrical Engineering and Computer Science
Year of Publication: 2024
Academic Year of Graduation: 112 (2023-2024)
Language: Chinese
Number of Pages: 143
Chinese Keywords: 記憶體內運算、靜態隨機存取記憶體、類神經網路、高能源效率、稀疏性
English Keywords: Computing-in-Memory (CIM), static random-access memory (SRAM), neural network (NN), high energy efficiency, sparsity
  • This thesis presents a high-energy-efficiency, high-throughput analog computing-in-memory (CIM) macro for neural-network applications. The chip is based on static random-access memory (SRAM): six additional transistors and two capacitors are added to each bit-cell to form two analog computing elements that operate by charge redistribution. To raise the accelerator's throughput and energy efficiency, this thesis proposes one architecture and one technique: the twin-path input architecture and the pre-quantization technique. The first optimizes the design trade-off between throughput and energy efficiency: the twin-path input architecture delivers 1.57-0.53 times the energy efficiency at twice the throughput, or improves the energy efficiency by 1.05-4.27 times at the same throughput. The second exploits the known distribution of the trained neural-network weights to skip the analog-to-digital conversion of the first few bits, reducing the power consumed by the quantizer and thereby improving energy efficiency.
    The test chip was implemented in a TSMC 40-nm CMOS standard 1P9M process, with a total chip area of 2.732 mm². With the high-sparsity neural network of [55], post-layout simulation shows an energy efficiency of 35.64 TOPS/W and a throughput of 3.2 GOPs at a 0.9-V supply and a 100-MHz clock, and an energy efficiency of 50.92 TOPS/W and a throughput of 2.24 GOPs at a 0.7-V supply and a 70-MHz clock. In addition, FoM_3 (EF × R_IF × R_W × OR_bit) is 2271-3245 TOPS/W and FoM_4 (EF × R_IF × R_W × OR_level) is 2246-3209 TOPS/W.

    This thesis presents a high-throughput, high-energy-efficiency analog Computing-in-Memory (CIM) macro aimed at neural-network (NN) applications. The CIM macro is based on static random-access memory (SRAM) with six additional transistors and two capacitors in each bit-cell, which form two sets of analog computing elements based on charge redistribution. We propose two ideas, one architecture and one technique: the twin-path input architecture and the pre-quantization technique. The first aims at improving throughput: when the twin-path input architecture doubles the throughput, the energy efficiency becomes 1.57-0.53 times; at the same throughput, it improves the energy efficiency by 1.05-4.27 times. The second reduces power consumption by skipping the first few MSBs during quantization, according to the known distributions of the trained neural-network weights.
    The proof-of-concept prototype was fabricated in a TSMC 40-nm CMOS general-purpose 1P9M technology, and the chip occupies an area of 2.732 mm². In post-layout simulation with the high-sparsity neural network of [55], at a supply voltage of 0.9 V and a clock frequency of 100 MHz, the energy efficiency is 35.64 TOPS/W with a throughput of 3.2 GOPs. At a supply voltage of 0.7 V and a clock frequency of 70 MHz, the energy efficiency reaches 50.92 TOPS/W with a throughput of 2.23 GOPs. In addition, FoM_3 (EF × R_IF × R_W × OR_bit) is 2271-3245 TOPS/W and FoM_4 (EF × R_IF × R_W × OR_level) is 2246-3209 TOPS/W.
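    To make the pre-quantization idea above more concrete, the following is a minimal behavioral sketch in Python, not the circuit implemented in this thesis: an ideal SAR ADC model in which the first few MSB decisions are preset from the expected range of the MAC result (known from the trained weight distribution), so those conversion cycles are skipped. The names sar_adc, code_to_voltage, and the presets parameter are illustrative assumptions, not taken from the thesis.

        # Behavioral sketch of the pre-quantization idea (illustrative only, not the
        # thesis circuit): an ideal SAR ADC in which the first len(presets) MSB
        # decisions are assumed instead of resolved, so their comparison cycles are
        # skipped. This is only valid when the input is known to lie in the
        # sub-range implied by the presets; otherwise the output code clips.

        def code_to_voltage(code, n_bits, vref):
            # Ideal DAC: map an n_bits code to a voltage in [0, vref).
            return vref * code / (2 ** n_bits)

        def sar_adc(vin, n_bits=8, vref=1.0, presets=()):
            # Returns (output code, number of comparison cycles actually spent).
            code = 0
            cycles = 0
            for i in range(n_bits):
                bit_weight = vref / (2 ** (i + 1))        # weight of bit (n_bits - 1 - i)
                if i < len(presets):
                    bit = presets[i]                      # MSB preset: cycle skipped
                else:
                    trial = code_to_voltage(code, n_bits, vref) + bit_weight
                    bit = 1 if vin >= trial else 0        # normal SAR comparison
                    cycles += 1
                if bit:
                    code |= 1 << (n_bits - 1 - i)
            return code, cycles

        # Example: if the MAC results are known to stay below vref/4 (e.g., because
        # the weights are sparse and small in magnitude), the two MSBs can be preset
        # to 0 and an 8-bit code is produced with only 6 comparison cycles.
        code, cycles = sar_adc(0.19, n_bits=8, vref=1.0, presets=(0, 0))
        print(code, cycles)   # -> 48 6, i.e. 48/256 = 0.1875

    In hardware terms, each skipped decision removes one comparator firing and the associated DAC switching, which is the source of the quantizer power saving described in the abstract.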

    Table of Contents:
    摘要 (Abstract in Chinese)
    Abstract
    List of Tables
    List of Figures
    Chapter 1 Introduction
        1.1 Background and Motivation
        1.2 Computing-in-Memory
        1.3 Thesis Organization
    Chapter 2 Basics of Neural Networks
        2.1 Fundamentals of Neural Networks
        2.2 Exploring the Function of Neural Networks and Different Neural Network Architectures
            2.2.1 Basics of Convolutional Neural Networks
            2.2.2 Fully Connected Layer
            2.2.3 Non-Linear Activation Function
            2.2.4 Pooling Function
            2.2.5 Normalization Function
        2.3 Quantization of Neural Networks
            2.3.1 Uniform (Linear) Quantization
            2.3.2 Non-uniform (Non-linear) Quantization
    Chapter 3 Fundamentals of AI Accelerators and Computing-in-Memory
        3.1 Introduction of AI Accelerators
            3.1.1 Dataflow and Data Reuse
        3.2 Computing-in-Memory (CIM) Architecture
            3.2.1 Basic Concept of DAC
            3.2.2 Static Specifications of DAC
            3.2.3 DAC (MAC Operation) in CIM
            3.2.4 Analog-to-Digital Conversion
            3.2.5 The Concept and Operation of SAR ADC
            3.2.6 Static Specifications of ADC
        3.3 Figure of Merit (FoM) of CIM and Different Types of CIM
    Chapter 4 A High Energy Efficiency and High Throughput Analog CIM
        4.1 Introduction
        4.2 Proposed Twin-Path Input Architecture
        4.3 Proposed Pre-quantization Technique
        4.4 The CIM Architecture
            4.4.1 The Proposed 12T2C Bit-Cell
            4.4.2 SAR ADC with Pre-quantization Technique
        4.5 Circuits of CIM and SRAM
    Chapter 5 Simulation and Measurement Results
        5.1 Layout and Chip Floor Plan
        5.2 Simulation Results
        5.3 Die Micrograph and Measurement Setup
        5.4 Measurement Results
    Chapter 6 Conclusion and Future Works
    Bibliography

    [1] Xiaoling Xia, Cui Xu, and Bing Nan, “Inception-v3 for flower classification,” The 2nd IEEE international conference on image, vision and computing (ICIVC), 2017.
    [2] Yi Zhu and Shawn Newsam, “Densenet for dense flow,” 2017 IEEE International Conference on Image Processing (ICIP), 2017.
    [3] Xu Qin and Zhilin Wang, “Nasnet: A neuron attention stage-by-stage net for single image deraining,” arXiv preprint arXiv:1912.03151 (2019).
    [4] John von Neumann, “First Draft of a Report on the EDVAC,” IEEE Annals of the History of Computing, 15.4 (1993): 27-75.
    [5] Vivienne Sze, et al., “Efficient processing of deep neural networks: A tutorial and survey,” Proceedings of the IEEE, 105.12 (2017): 2295-2329.
    [6] Yu-Hsin Chen, et al., “Eyeriss: An energy-efficient reconfigurable accelerator for deep convolutional neural networks,” IEEE journal of solid-state circuits, 52.1 (2016): 127-138.
    [7] Yu-Hsin Chen, et al., “Eyeriss v2: A flexible accelerator for emerging deep neural networks on mobile devices,” IEEE Journal on Emerging and Selected Topics in Circuits and Systems, 9.2 (2019): 292-308.
    [8] Ji-Hoon Kim, et al., “Z-PIM: A sparsity-aware processing-in-memory architecture with fully variable weight bit-precision for energy-efficient deep neural networks,” IEEE Journal of Solid-State Circuits, 56.4 (2021): 1093-1104.
    [9] L. Deng, G. Li, S. Han, L. Shi and Y. Xie, “Model Compression and Hardware Acceleration for Neural Networks: A Comprehensive Survey,” in Proceedings of the IEEE, vol. 108, no. 4, pp. 485-532, April 2020.
    [10] Jie-Fang Zhang, et al., “Snap: An efficient sparse neural acceleration processor for unstructured sparse deep neural network inference,” IEEE Journal of Solid-State Circuits, 56.2 (2020): 636-647.
    [11] K. R. Chowdhary, “Natural language processing,” Fundamentals of Artificial Intelligence (2020): 603-649.
    [12] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton, “Imagenet classification with deep convolutional neural networks,” Communications of the ACM, 60.6 (2017): 84-90.
    [13] Mingxing Tan and Quoc Le, “Efficientnet: Rethinking model scaling for convolutional neural networks,” International conference on machine learning. PMLR, 2019.
    [14] Samuel Greydanus, Misko Dzamba, and Jason Yosinski, “Hamiltonian neural networks,” Advances in neural information processing systems, 32 (2019).
    [15] Warren S. McCulloch and Walter Pitts, “A logical calculus of the ideas immanent in nervous activity,” The bulletin of mathematical biophysics, 5.4 (1943): 115-133.
    [16] Sepp Hochreiter, “The vanishing gradient problem during learning recurrent neural nets and problem solutions,” International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems 6.02 (1998): 107-116.
    [17] Sergey Ioffe, and Christian Szegedy, “Batch normalization: Accelerating deep network training by reducing internal covariate shift,” International conference on machine learning. PMLR, 2015.
    [18] Haotong Qin, et al., “Binary neural networks: A survey,” Pattern Recognition 105 (2020): 107281.
    [19] Hande Alemdar, et al., “Ternary neural networks for resource-efficient AI applications,” IEEE international joint conference on neural networks (IJCNN), 2017.
    [20] Mark Horowitz, “1.1 computing's energy problem (and what we can do about it),” IEEE International Solid-State Circuits Conference (ISSCC) Digest of Technical Papers, 2014.
    [21] Vivienne Sze, et al., “Hardware for machine learning: Challenges and opportunities,” 2017 IEEE Custom Integrated Circuits Conference (CICC). IEEE, 2017.
    [22] Jintao Zhang, Zhuo Wang, and Naveen Verma, “In-memory computation of a machine-learning classifier in a standard 6T SRAM array,” IEEE Journal of Solid-State Circuits, 52.4 (2017): 915-924.
    [23] Sujan Kumar Gonugondla, Mingu Kang, and Naresh Shanbhag, “A 42pJ/decision 3.12 TOPS/W robust in-memory machine learning classifier with on-chip training,” IEEE International Solid-State Circuits Conference (ISSCC) Digest of Technical Papers, 2018.
    [24] Win-San Khwa, et al., “A 65nm 4Kb algorithm-dependent Computing-in-Memory SRAM unit-macro with 2.3 ns and 55.8 TOPS/W fully parallel product-sum operation for binary DNN edge processors,” IEEE International Solid-State Circuits Conference (ISSCC) Digest of Technical Papers, 2018.
    [25] Xin Si, et al., “A twin-8T SRAM computation-in-memory macro for multiple-bit CNN-based machine learning,” IEEE International Solid-State Circuits Conference (ISSCC) Digest of Technical Papers, 2019.
    [26] Shihui Yin, et al., “XNOR-SRAM: In-memory computing SRAM macro for binary/ternary deep neural networks,” IEEE Journal of Solid-State Circuits, 55.6 (2020): 1733-1743.
    [27] Hossein Valavi, et al., “A mixed-signal binarized convolutional-neural-network accelerator integrating dense weight storage and multiplication for reduced data movement,” IEEE Symposium on VLSI Circuits, 2018.
    [28] Hossein Valavi, et al., “A 64-tile 2.4-Mb in-memory-computing CNN accelerator employing charge-domain compute,” IEEE Journal of Solid-State Circuits, 54.6 (2019): 1789-1799.
    [29] Zhewei Jiang, et al., “C3SRAM: An in-memory-computing SRAM macro based on robust capacitive coupling computing mechanism,” IEEE Journal of Solid-State Circuits, 55.7 (2020): 1888-1897.
    [30] Jinseok Lee, et al., “Fully row/column-parallel in-memory computing SRAM macro employing capacitor-based mixed-signal computation with 5-b inputs,” IEEE Symposium on VLSI Circuits, 2021.
    [31] Xin Si, et al., “15.5 A 28nm 64Kb 6T SRAM Computing-in-Memory macro with 8b MAC operation for AI edge chips,” IEEE International Solid-State Circuits Conference (ISSCC) Digest of Technical Papers, 2020.
    [32] Jian-Wei Su, et al., “16.3 A 28nm 384kb 6T-SRAM Computation-in-Memory Macro with 8b Precision for AI Edge Chips,” IEEE International Solid-State Circuits Conference (ISSCC) Digest of Technical Papers, 2021.
    [33] Avishek Biswas and Anantha P. Chandrakasan, “Conv-RAM: An energy-efficient SRAM with embedded convolution computation for low-power CNN-based machine learning applications,” IEEE International Solid-State Circuits Conference (ISSCC) Digest of Technical Papers, 2018.
    [34] Qing Dong, et al., “15.3 A 351TOPS/W and 372.4 GOPS compute-in-memory SRAM macro in 7nm FinFET CMOS for machine-learning applications,” IEEE International Solid-State Circuits Conference (ISSCC) Digest of Technical Papers, 2020.
    [35] Ping-Chun Wu, et al., “A 28nm 1Mb Time-Domain Computing-in-Memory 6T-SRAM Macro with a 6.6 ns Latency, 1241GOPS and 37.01 TOPS/W for 8b-MAC Operations for Edge-AI Devices,” IEEE International Solid-State Circuits Conference (ISSCC) Digest of Technical Papers, 2022.
    [36] Jun Yang, et al., “24.4 sandwich-RAM: An energy-efficient in-memory BWN architecture with pulse-width modulation,” IEEE International Solid-State Circuits Conference (ISSCC) Digest of Technical Papers, 2019.
    [37] Ruiqi Guo, et al., “A 5.1 pJ/neuron 127.3 µs/inference RNN-based speech recognition processor using 16 Computing-in-Memory SRAM macros in 65nm CMOS,” IEEE Symposium on VLSI Circuits, 2019.
    [38] Yen-Cheng Chiu, et al., “A 4-Kb 1-to-8-bit configurable 6T SRAM-based computation-in-memory unit-macro for CNN-based AI edge processors,” IEEE Journal of Solid-State Circuits, 55.10 (2020): 2790-2801.
    [39] Zhiyu Chen, et al., “CAP-RAM: A charge-domain in-memory computing 6T-SRAM for accurate and precision-programmable CNN inference,” IEEE Journal of Solid-State Circuits, 56.6 (2021): 1924-1935.
    [40] Jinseok Lee, et al., “Fully row/column-parallel in-memory computing SRAM macro employing capacitor-based mixed-signal computation with 5-b inputs,” IEEE Symposium on VLSI Circuits, 2021.
    [41] Chengshuo Yu, et al., “A 65-nm 8T SRAM Compute-in-Memory Macro With Column ADCs for Processing Neural Networks,” IEEE Journal of Solid-State Circuits, (2022).
    [42] Guan-Ying Huang, et al., “A 1-µW 10-bit 200-kS/s SAR ADC with a bypass window for biomedical applications,” IEEE Journal of Solid-State Circuits, 47.11 (2012): 2783-2795.
    [43] Chun-Cheng Liu, et al., “A 1V 11fJ/conversion-step 10bit 10MS/s asynchronous SAR ADC in 0.18 µm CMOS,” IEEE Symposium on VLSI Circuits, 2010.
    [44] Jon Guerber, et al., “A 10-b ternary SAR ADC with quantization time information utilization,” IEEE Journal of Solid-State Circuits, 47.11 (2012): 2604-2613.
    [45] Yan Zhu, et al., “A 10-bit 100-MS/s reference-free SAR ADC in 90 nm CMOS,” IEEE Journal of Solid-state circuits, 45.6 (2010): 1111-1121.
    [46] Guan-Ying Huang, et al., “A 10b 200MS/s 0.82 mW SAR ADC in 40nm CMOS,” IEEE Asian Solid-State Circuits Conference (A-SSCC), 2013.
    [47] C.-H. Kuo, “A 10-bit 120-MS/s SAR ADC with compact architecture and noise suppression technique,” M.S. thesis, Dept. Elect. Eng., National Cheng Kung Univ., Tainan, Taiwan, 2014.
    [48] Randy W. Mann, and Benton H. Calhoun, “New category of ultra-thin notchless 6T SRAM cell layout topologies for sub-22nm,” The 12th IEEE International Symposium on Quality Electronic Design, 2011.
    [49] Chun-Cheng Liu, Che-Hsun Kuo, and Ying-Zu Lin, “A 10 bit 320 MS/s low-cost SAR ADC for IEEE 802.11ac applications in 20 nm CMOS,” IEEE Journal of Solid-State Circuits, 50.11 (2015): 2645-2654.
    [50] Hao Xu and Asad A. Abidi, “Analysis and design of regenerative comparators for low offset and noise,” IEEE Transactions on Circuits and Systems I: Regular Papers, 66.8 (2019): 2817-2830.
    [51] Bernhard Wicht, Thomas Nirschl, and Doris Schmitt-Landsiedel, “Yield and speed optimization of a latch-type voltage sense amplifier,” IEEE Journal of Solid-State Circuits, 39.7 (2004): 1148-1158.
    [52] Bo Wang, et al., “A 28nm Horizontal-Weight-Shift and Vertical-feature-Shift-Based Separate-WL 6T-SRAM Computation-in-Memory Unit-Macro for Edge Depthwise Neural-Networks,” IEEE International Solid-State Circuits Conference (ISSCC) Digest of Technical Papers, 2023.
    [53] Peiyu Chen, et al., “A 22nm Delta-Sigma Computing-in-Memory (Δ∑ CIM) SRAM Macro with Near-Zero-Mean Outputs and LSB-First ADCs Achieving 21.38 TOPS/W for 8b-MAC Edge AI Processing,” IEEE International Solid-State Circuits Conference (ISSCC) Digest of Technical Papers, 2023.
    [54] Sung-En Hsieh, et al., “A 70.85-86.27 TOPS/W PVT-Insensitive 8b Word-Wise ACIM with Post-Processing Relaxation,” IEEE International Solid-State Circuits Conference (ISSCC) Digest of Technical Papers, 2023.
    [55] Zih-Huang Cheng, Chih-Hung Kuo, “Bit-Wise Quantization-Aware Training for Sparsifying Neural Networks,” M.S. thesis, Dept. Elect. Eng., National Cheng Kung Univ., Tainan, Taiwan, 2023.

    Full-text availability: on campus from 2029-04-19; off campus from 2029-04-19.
    The electronic thesis has not yet been authorized for public release; please consult the library catalog for the print copy.