
Graduate Student: 林柏翰 (Lin, Po-Han)
Thesis Title: 一個基於八位元連續漸進逼近式類比數位轉換器且操作於一億赫茲之混訊神經網路加速器
An 8-bit 100-MHz SAR ADC-Based Mixed-Signal Accelerator for Neural Networks
Advisor: 張順志 (Chang, Soon-Jyh)
Degree: Master
Department: Department of Electrical Engineering, College of Electrical Engineering and Computer Science
Year of Publication: 2020
Graduation Academic Year: 108 (2019-2020)
Language: English
Number of Pages: 109
Keywords: mixed-signal accelerator, analog computation, multiply-accumulate (MAC), neural networks, successive-approximation register (SAR), analog-to-digital converter (ADC)
Access count: 118 views, 19 downloads
    This thesis presents a mixed-signal neural-network accelerator that is based on an 8-bit successive-approximation-register (SAR) analog-to-digital converter (ADC) and operates at 100 MHz. In addition to quantizing the activations and weights of the neural network, the accelerator adopts analog computation to further reduce the energy required by the arithmetic operations of the network. Furthermore, to improve the top-1 accuracy of the network, a 5-phase switching scheme that performs the multiply-accumulate (MAC) operation is proposed to reduce the dynamic offset. Finally, to quantize the analog MAC result into a digital output code, a SAR ADC is integrated into the accelerator.
    The design was implemented as a test chip in the TSMC 40-nm CMOS standard 1P9M process. The chip occupies 2.613 mm², of which the core circuits account for 25.8%. With a 0.9-V supply and a 100-MHz clock, the design achieves top-1 accuracies of 99.3% and 86% on the MNIST and CIFAR-10 datasets, respectively, with an energy efficiency of 3.3 TOPS/W; taking the energy per arithmetic operation divided by the number of ADC output levels as the figure of merit, the accelerator reaches 1.18 fJ/step under this operating condition. Supplying the accelerator from 0.7 V and clocking it at 80 MHz further improves both the energy efficiency and this figure of merit: the top-1 accuracies on MNIST and CIFAR-10 are maintained at 99.3% and 85.3%, respectively, while the energy efficiency and figure of merit become 6.34 TOPS/W and 0.62 fJ/step.
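    As a rough illustration of the signal chain summarized above, and not a description of the actual circuits, the following Python sketch models an idealized version of the quantize-multiply-accumulate-digitize flow: 8-bit activation and weight codes are multiplied and accumulated exactly (standing in for the charge-domain analog MAC), and the accumulated value is then re-quantized by an ideal 8-bit converter. The function name, code ranges, and scaling below are illustrative assumptions rather than details taken from the thesis.

```python
import numpy as np

def ideal_mixed_signal_mac(activation_codes, weight_codes, adc_bits=8):
    """Idealized behavioral model of a mixed-signal MAC followed by an ADC.

    Both operands are signed 8-bit integer codes in [-127, 127]. The
    "analog" accumulation is modeled as exact arithmetic, and the result is
    re-quantized by an ideal converter with `adc_bits` of resolution. This
    is an illustrative assumption only; the actual accelerator performs the
    MAC with passive charge redistribution and digitizes it with a SAR ADC.
    """
    a = np.asarray(activation_codes, dtype=np.int64)
    w = np.asarray(weight_codes, dtype=np.int64)

    # Exact ("analog") dot product of the quantized operands.
    analog_sum = int(np.dot(a, w))

    # Normalize to an assumed full-scale range of +/- (N * 127 * 127),
    # so the normalized value always lies in [-1, 1].
    full_scale = len(a) * 127 * 127
    normalized = analog_sum / full_scale

    # Ideal round-to-nearest quantization to a signed adc_bits-bit output code.
    max_code = 2 ** (adc_bits - 1) - 1            # 127 for 8 bits
    return int(np.clip(round(normalized * max_code), -max_code - 1, max_code))

# Example: a 16-input MAC (the switched-capacitor dot-product circuit of [15]
# also uses 16 inputs).
rng = np.random.default_rng(0)
act = rng.integers(-127, 128, size=16)
wgt = rng.integers(-127, 128, size=16)
print(ideal_mixed_signal_mac(act, wgt))
```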

    This thesis presents an 8-bit 100-MHz SAR ADC-based mixed-signal accelerator for neural networks. In addition to quantizing the activations and weights in neural networks, analog computation is adopted in the accelerator to further reduce the energy consumption per arithmetic operation. Moreover, to enhance the top-1 accuracies of neural networks, a 5-phase switching scheme, which performs the multiply-accumulate (MAC) operation, is proposed to mitigate the dynamic offset. Finally, a successive-approximation-register (SAR) analog-to-digital converter (ADC) is incorporated into the proposed accelerator to quantize the analog multiply-accumulate signal into a digital output code.
    The proof-of-concept prototype was fabricated in a TSMC 40-nm CMOS standard 1P9M process; the chip occupies 2.613 mm², and the core circuit accounts for 25.8% of the total area. With a 100-MHz clock frequency and a 0.9-V supply voltage, the design achieves top-1 accuracies of 99.3% and 87.3% on the MNIST and CIFAR-10 datasets, respectively. In addition, an energy efficiency of 3.3 TOPS/W is attained, and the figure of merit (FOM), i.e., the energy consumption per arithmetic operation normalized to the quantization steps of the ADC output, is 1.18 fJ/step. To achieve better energy efficiency and FOM, the prototype can also be operated with an 80-MHz clock frequency and a 0.7-V supply voltage. In this case, the top-1 accuracies on the MNIST and CIFAR-10 datasets are 99.3% and 86%, respectively, and the energy efficiency and FOM are 6.34 TOPS/W and 0.62 fJ/step.
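    The figure-of-merit values quoted above can be reproduced from the reported energy efficiencies, assuming the normalization uses the 2^8 = 256 quantization steps of the 8-bit ADC output: the energy per operation is the reciprocal of the efficiency, and dividing it by 256 gives the fJ/step number. A minimal check, with the step count treated as an assumption:

```python
def fom_fj_per_step(tops_per_watt, adc_bits=8):
    """Energy per operation (in fJ) divided by the 2**adc_bits ADC quantization steps."""
    energy_per_op_fj = 1e15 / (tops_per_watt * 1e12)  # reciprocal of ops/joule, expressed in fJ
    return energy_per_op_fj / (2 ** adc_bits)

print(fom_fj_per_step(3.3))    # ~1.18 fJ/step (0.9-V supply, 100-MHz clock)
print(fom_fj_per_step(6.34))   # ~0.62 fJ/step (0.7-V supply, 80-MHz clock)
```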

    Table of Contents
    摘要 (Abstract in Chinese)
    Abstract
    List of Tables
    List of Figures
    Chapter 1  Introduction
      1.1  Background and Motivation
      1.2  Thesis Organization
    Chapter 2  Basics of Neural Networks
      2.1  Basics of Deep Neural Network (DNN)
      2.2  Overview of CNN
        2.2.1  Basics of CNN
        2.2.2  Non-Linear Activation Function
        2.2.3  Pooling Function
        2.2.4  Normalization Function
      2.3  Quantization of Neural Networks
        2.3.1  Linear Quantization
        2.3.2  Non-Linear Quantization
        2.3.3  Quantization-Aware Training Technique [24]
    Chapter 3  Fundamentals of SAR ADC
      3.1  Building Blocks of SAR ADC
        3.1.1  Behavioral Operation of SAR ADC
        3.1.2  Circuit-Level Operation of SAR ADC
      3.2  Quantization Error
      3.3  Static Specifications
        3.3.1  Offset Error
        3.3.2  Gain Error
        3.3.3  Nonlinearity
          3.3.3.1  Differential Nonlinearity
          3.3.3.2  Integral Nonlinearity
      3.4  Dynamic Specifications
        3.4.1  Spurious-Free Dynamic Range
        3.4.2  Signal-to-Noise Ratio
        3.4.3  Signal-to-Noise and Distortion Ratio
        3.4.4  Total Harmonic Distortion
        3.4.5  Effective Number of Bits
        3.4.6  Effective Resolution Bandwidth
        3.4.7  Figure of Merit
    Chapter 4  An 8-bit 100-MHz SAR ADC-Based Accelerator for Neural Networks
      4.1  Introduction of Overall Architecture
      4.2  Multiply-Accumulate Unit
        4.2.1  Passive Digital-to-Analog Multiplier Circuit [15][30]
        4.2.2  Systematic Offsets from Parasitic Capacitances
        4.2.3  Proposed 5-Phase Switching Scheme
      4.3  Adopted Techniques of SAR ADC
        4.3.1  Merged Capacitor Switching Method [31]
        4.3.2  Direct Switching Technique [34] and Compact Combinational Timing Control [35]
      4.4  Circuit Realization
        4.4.1  Phase Generator
        4.4.2  MAC Control Logic
        4.4.3  Dynamic Comparator
        4.4.4  Capacitive DAC
    Chapter 5  Simulation and Measurement Results
      5.1  Layout and Chip Floor Plan
      5.2  Simulation Results
      5.3  Design Considerations for PCB
      5.4  Die Micrograph and Measurement Setup
      5.5  Measurement Results
    Chapter 6  Conclusions and Future Works
    Bibliography

    [1] D. Silver et al., "Mastering the game of Go with deep neural networks and tree search," Nature, vol. 529, no. 7587, pp. 484-489, 2016.
    [2] K. He et al., "Deep Residual Learning for Image Recognition," in CVPR, 2016 (arXiv preprint arXiv:1512.03385).
    [3] I. Hubara, M. Courbariaux, D. Soudry, R. El-Yaniv, and Y. Bengio, "Quantized neural networks: Training neural networks with low precision weights and activations," arXiv preprint arXiv:1609.07061, 2016.
    [4] S. Zhou, Y. Wu, Z. Ni, X. Zhou, H. Wen, and Y. Zou, "DoReFa-Net: Training low bitwidth convolutional neural networks with low bitwidth gradients," arXiv preprint arXiv:1606.06160, 2016.
    [5] M. Rastegari, V. Ordonez, J. Redmon, and A. Farhadi, "XNOR-Net: ImageNet classification using binary convolutional neural networks," arXiv preprint arXiv:1603.05279, 2016.
    [6] I. Hubara et al., "Binarized Neural Networks," in NIPS, 2016 (arXiv preprint arXiv:1602.02505).
    [7] Q. Dong et al., "15.3 A 351TOPS/W and 372.4GOPS Compute-in-Memory SRAM Macro in 7nm FinFET CMOS for Machine-Learning Applications," in 2020 IEEE International Solid- State Circuits Conference - (ISSCC), 2020, pp. 242-244.
    [8] E. A. Vittoz, "Future of analog in the VLSI environment," in IEEE International Symposium on Circuits and Systems, 1990, pp. 1372-1375 vol.2.
    [9] B. E. Boser, E. Sackinger, J. Bromley, Y. LeCun, R. E. Howard, and L. D. Jackel, "An analog neural network processor and its application to high-speed character recognition," in IJCNN-91-Seattle International Joint Conference on Neural Networks, 1991, vol. i, pp. 415-420 vol.1.
    [10] P. Masa et al., "10 mW CMOS retina and classifier for handheld, 1000 images/s optical character recognition system," in 1999 IEEE International Solid-State Circuits Conference. Digest of Technical Papers. ISSCC. First Edition (Cat. No.99CH36278), 1999, pp. 204-205.
    [11] J. Lu, S. Young, I. Arel, and J. Holleman, "30.10 A 1TOPS/W analog deep machine-learning engine with floating-gate storage in 0.13μm CMOS," in 2014 IEEE International Solid-State Circuits Conference Digest of Technical Papers (ISSCC), 2014, pp. 504-505.
    [12] C. Xue et al., "15.4 A 22nm 2Mb ReRAM Compute-in-Memory Macro with 121-28TOPS/W for Multibit MAC Computing for Tiny AI Edge Devices," in 2020 IEEE International Solid- State Circuits Conference - (ISSCC), 2020, pp. 244-246.
    [13] X. Si et al., "15.5 A 28nm 64Kb 6T SRAM Computing-in-Memory Macro with 8b MAC Operation for AI Edge Chips," in 2020 IEEE International Solid- State Circuits Conference - (ISSCC), 2020, pp. 246-248.
    [14] K. Watanabe and G. Temes, "A switched-capacitor multiplier/divider with digital and analog outputs," IEEE Transactions on Circuits and Systems, vol. 31, no. 9, pp. 796-800, 1984.
    [15] D. Bankman and B. Murmann, "An 8-bit, 16 input, 3.2 pJ/op switched-capacitor dot product circuit in 28-nm FDSOI CMOS," in 2016 IEEE Asian Solid-State Circuits Conference (A-SSCC), 2016, pp. 21-24.
    [16] X. Glorot, A. Bordes, and Y. Bengio, "Deep Sparse Rectifier Neural Networks," in Proceedings of the 14th International Conference on Artificial Intelligence and Statistics (AISTATS), Fort Lauderdale, FL, USA, 2011.
    [17] A. L. Maas, A. Y. Hannun, and A. Y. Ng, “Rectifier nonlinearities improve neural network acoustic models,” in ICML,2013.
    [18] D.-A. Clevert, T. Unterthiner, and S. Hochreiter, “Fast and Accurate Deep Network Learning by Exponential Linear Units (ELUs),” ICLR, 2016.
    [19] X. Zhang, J. Trmal, D. Povey, and S. Khudanpur, "Improving deep neural network acoustic models using generalized maxout networks," in 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2014, pp. 215-219.
    [20] S. Ioffe and C. Szegedy, “Batch normalization: Accelerating deep network training by reducing internal covariate shift,” in ICML, 2015.
    [21] Y. Ma, N. Suda, Y. Cao, J. Seo, and S. Vrudhula, "Scalable and modularized RTL compilation of Convolutional Neural Networks onto FPGA," in 2016 26th International Conference on Field Programmable Logic and Applications (FPL), 2016, pp. 1-8.
    [22] E. H. Lee, D. Miyashita, E. Chai, B. Murmann, and S. S. Wong, "LogNet: Energy-efficient neural networks using logarithmic computation," in 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2017, pp. 5900-5904.
    [23] S. Han, H. Mao, and W. J. Dally, "Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding," in ICLR, 2016.
    [24] R. Krishnamoorthi, "Quantizing deep convolutional networks for efficient inference: A whitepaper", CoRR, vol. abs/1806.08342, 2018.
    [25] A. Zhou, A. Yao, Y. Guo, L. Xu, and Y. Chen, "Incremental Network Quantization: Towards Lossless CNNs with Low-Precision Weights," in ICLR, 2017.
    [26] M. Horowitz, "1.1 Computing's energy problem (and what we can do about it)," in 2014 IEEE International Solid-State Circuits Conference Digest of Technical Papers (ISSCC), 2014, pp. 10-14.
    [27] V. Sze, Y. Chen, T. Yang, and J. S. Emer, "Efficient Processing of Deep Neural Networks: A Tutorial and Survey," Proceedings of the IEEE, vol. 105, no. 12, pp. 2295-2329, 2017.
    [28] B. Murmann, “ADC Performance Survey 1997-2018,” [Online]. Available: http://web.stanford.edu/~murmann/adcsurvey.html.
    [29] J. L. McCreary and P. R. Gray, "All-MOS charge redistribution analog-to-digital conversion techniques. I," IEEE Journal of Solid-State Circuits, vol. 10, no. 6, pp. 371-379, 1975.
    [30] D. Bankman and B. Murmann, "Passive charge redistribution digital-to-analogue multiplier," Electronics Letters, vol. 51, no. 5, pp. 386-388, 2015.
    [31] V. Hariprasath, J. Guerber, S. Lee, and U. Moon, "Merged capacitor switching based SAR ADC with highest switching energy-efficiency," Electronics Letters, vol. 46, no. 9, pp. 620-621, 2010.
    [32] Y. Zhu et al., "A 10-bit 100-MS/s Reference-Free SAR ADC in 90 nm CMOS," IEEE Journal of Solid-State Circuits, vol. 45, no. 6, pp. 1111-1121, 2010.
    [33] C. Liu, S. Chang, G. Huang, and Y. Lin, "A 10-bit 50-MS/s SAR ADC With a Monotonic Capacitor Switching Procedure," IEEE Journal of Solid-State Circuits, vol. 45, no. 4, pp. 731-740, 2010.
    [34] G. Huang, S. Chang, Y. Lin, C. Liu, and C. Huang, "A 10b 200MS/s 0.82mW SAR ADC in 40nm CMOS," in 2013 IEEE Asian Solid-State Circuits Conference (A-SSCC), 2013, pp. 289-292.
    [35] 郭哲勳, "A 10-bit 120-MS/s SAR ADC with Compact Architecture and Noise Suppression Technique," M.S. thesis, Aug. 2014.
    [36] C. Liu, C. Kuo, and Y. Lin, "A 10 bit 320 MS/s Low-Cost SAR ADC for IEEE 802.11ac Applications in 20 nm CMOS," IEEE Journal of Solid-State Circuits, vol. 50, no. 11, pp. 2645-2654, 2015.
    [37] X. Si et al., "24.5 A Twin-8T SRAM Computation-In-Memory Macro for Multiple-Bit CNN-Based Machine Learning," in 2019 IEEE International Solid- State Circuits Conference - (ISSCC), 2019, pp. 396-398.
    [38] E. H. Lee and S. S. Wong, "24.2 A 2.5GHz 7.7TOPS/W switched-capacitor matrix multiplier with co-designed local memory in 40nm," in 2016 IEEE International Solid-State Circuits Conference (ISSCC), 2016, pp. 418-419.
    [39] J. Su et al., "15.2 A 28nm 64Kb Inference-Training Two-Way Transpose Multibit 6T SRAM Compute-in-Memory Macro for AI Edge Chips," in 2020 IEEE International Solid- State Circuits Conference - (ISSCC), 2020, pp. 240-242.

    Full-text availability: on campus from 2021-08-31; off campus from 2021-08-31.