| Graduate Student: | 錢信達 Chien, Hsin-Ta |
|---|---|
| Thesis Title: | 一個應用於神經網路之全類比式記憶體內運算加速器 / A Fully Analog Computing-in-Memory Accelerator for Neural Network |
| Advisor: | 張順志 Chang, Soon-Jyh |
| Degree: | Master |
| Department: | Department of Electrical Engineering, College of Electrical Engineering and Computer Science |
| Year of Publication: | 2023 |
| Academic Year of Graduation: | 111 (ROC calendar) |
| Language: | English |
| Pages: | 139 |
| Keywords: | computing-in-memory (CIM), analog AI accelerator, neural network (NN), charge redistribution, fully analog computation, static random access memory (SRAM) |
This thesis presents a computing-in-memory (CIM) artificial-intelligence accelerator prototype chip that processes neural-network workloads in a fully analog fashion. It proposes a new architecture for fully analog computation that abandons the analog-to-digital converters, digital-to-analog converters, and digital-code registers commonly used in CIM macros, integrating their functions into a switched-capacitor integrator. The CIM macro adopts a nine-transistor (9T) SRAM cell whose dedicated computing read port allows the stored weight to be read directly for multiply-and-accumulate (MAC) computation while avoiding erroneous data writes during operation. In addition, this thesis proposes three techniques. The first is a capacitive analog receiver unit that performs MAC operations through charge redistribution; this method offers better linearity than the common current charging/discharging approach and can directly receive analog voltages, eliminating the need for a digital-to-analog converter. The second is an analog accumulative adder based on charge sharing, which consumes less power than conventional digital storage circuits and adders. The third is a switched-capacitor integrator with a built-in two's-complement generation technique, which increases computing speed and reduces power consumption.

The proof-of-concept prototype was fabricated in TSMC's 180 nm CMOS standard 1P6M process. The total chip area is 25 mm², of which the core circuit occupies 90%. The chip achieves a Top-1 accuracy of 88% on MNIST. Measurement results show that, at a 1.8 V input voltage and an inference speed of 2.044 µs per image, the energy efficiency of the CIM macro is 457.0 TOPS/W and that of the neural-network computing system is 1.88 TOPS/W; at a 1.6 V input voltage, the CIM macro reaches 706.3 TOPS/W and the system reaches 3.03 TOPS/W.
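The charge-redistribution MAC described above can be illustrated with a minimal behavioral model: each input voltage is sampled onto a unit capacitor gated by its weight, and shorting the capacitors together lets charge redistribute so the shared node settles to a weighted average. This is only a sketch under idealized assumptions (identical unit capacitors, no parasitics, binary weights), not the thesis's actual circuit; the function name `charge_redistribution_mac` is hypothetical.

```python
def charge_redistribution_mac(v_in, weights, c_unit=1.0):
    """Ideal settled voltage of a capacitive charge-redistribution MAC.

    v_in    : list of analog input voltages (volts)
    weights : list of binary weights (0 or 1); a 0 samples 0 V instead
    c_unit  : unit capacitance (farads); cancels out in the ideal model
    """
    assert len(v_in) == len(weights)
    # Phase 1: sample w_i * v_i onto each unit capacitor.
    charges = [c_unit * (w * v) for v, w in zip(v_in, weights)]
    # Phase 2: short all capacitors together; total charge is conserved
    # and spreads over the total capacitance, so the node voltage is the
    # average of the selected samples.
    return sum(charges) / (c_unit * len(v_in))

# Example: four inputs, weights select the first and third.
v = charge_redistribution_mac([1.8, 0.9, 0.6, 1.2], [1, 0, 1, 0])
# Ideal result: (1.8 + 0.6) / 4 = 0.6 V
```

Because the result is a ratio of charges, it is insensitive to the absolute value of `c_unit`; in a real macro, linearity is instead limited by capacitor mismatch and parasitics, which this ideal model ignores.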