
Graduate Student: 林柏翰 (Lin, Po-Han)
Thesis Title: 一個基於八位元連續漸進逼近式類比數位轉換器且操作於一億赫茲之混訊神經網路加速器
An 8-bit 100-MHz SAR ADC-Based Mixed-Signal Accelerator for Neural Networks
Advisor: 張順志 (Chang, Soon-Jyh)
Degree: Master
Department: Department of Electrical Engineering, College of Electrical Engineering and Computer Science
Year of Publication: 2020
Graduation Academic Year: 108 (2019-2020)
Language: English
Number of Pages: 109
Keywords: mixed-signal accelerator, analog computation, multiply-accumulate (MAC), neural networks, successive-approximation register (SAR), analog-to-digital converter (ADC)
Access count: 118 views, 19 downloads
    This thesis presents a mixed-signal neural-network accelerator that is based on an 8-bit successive-approximation-register (SAR) analog-to-digital converter (ADC) and operates at 100 MHz. In addition to quantizing the activations and weights of the neural network, the accelerator adopts analog computation to further reduce the energy required by the arithmetic operations of the network. Furthermore, to improve the top-1 accuracy of the network, a 5-phase switching scheme that performs the multiply-accumulate (MAC) operation is proposed to reduce the dynamic offset. Finally, to quantize the analog MAC result into a digital output code, a SAR ADC is integrated into the accelerator.
    The design was implemented as a test chip in the TSMC 40-nm CMOS standard 1P9M process. The chip occupies 2.613 mm², of which the core circuits account for 25.8%. With a 0.9-V supply and a 100-MHz clock, the design achieves top-1 accuracies of 99.3% and 86% on the MNIST and CIFAR-10 datasets, respectively, with an energy efficiency of 3.3 TOPS/W; taking the energy per arithmetic operation divided by the number of ADC output levels as the figure of merit, the accelerator reaches 1.18 fJ/step under this operating condition. Supplying the accelerator from 0.7 V and clocking it at 80 MHz further improves both the energy efficiency and this figure of merit: the top-1 accuracies on MNIST and CIFAR-10 are maintained at 99.3% and 85.3%, respectively, while the energy efficiency and figure of merit become 6.34 TOPS/W and 0.62 fJ/step.
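    As a rough illustration of the signal chain summarized above, and not a description of the actual circuits, the following Python sketch models an idealized version of the quantize-multiply-accumulate-digitize flow: 8-bit activation and weight codes are multiplied and accumulated exactly (standing in for the charge-domain analog MAC), and the accumulated value is then re-quantized by an ideal 8-bit converter. The function name, code ranges, and scaling below are illustrative assumptions rather than details taken from the thesis.

```python
import numpy as np

def ideal_mixed_signal_mac(activation_codes, weight_codes, adc_bits=8):
    """Idealized behavioral model of a mixed-signal MAC followed by an ADC.

    Both operands are signed 8-bit integer codes in [-127, 127]. The
    "analog" accumulation is modeled as exact arithmetic, and the result is
    re-quantized by an ideal converter with `adc_bits` of resolution. This
    is an illustrative assumption only; the actual accelerator performs the
    MAC with passive charge redistribution and digitizes it with a SAR ADC.
    """
    a = np.asarray(activation_codes, dtype=np.int64)
    w = np.asarray(weight_codes, dtype=np.int64)

    # Exact ("analog") dot product of the quantized operands.
    analog_sum = int(np.dot(a, w))

    # Normalize to an assumed full-scale range of +/- (N * 127 * 127),
    # so the normalized value always lies in [-1, 1].
    full_scale = len(a) * 127 * 127
    normalized = analog_sum / full_scale

    # Ideal round-to-nearest quantization to a signed adc_bits-bit output code.
    max_code = 2 ** (adc_bits - 1) - 1            # 127 for 8 bits
    return int(np.clip(round(normalized * max_code), -max_code - 1, max_code))

# Example: a 16-input MAC (the switched-capacitor dot-product circuit of [15]
# also uses 16 inputs).
rng = np.random.default_rng(0)
act = rng.integers(-127, 128, size=16)
wgt = rng.integers(-127, 128, size=16)
print(ideal_mixed_signal_mac(act, wgt))
```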

    This thesis presents an 8-bit 100-MHz SAR ADC-based mixed-signal accelerator for neural networks. In addition to quantizing the activations and weights in neural networks, analog computation is adopted in the accelerator to further reduce the energy consumption per arithmetic operation. Moreover, to enhance the top-1 accuracies of neural networks, a 5-phase switching scheme, which performs the multiply-accumulate (MAC) operation, is proposed to mitigate the dynamic offset. Finally, a successive-approximation-register (SAR) analog-to-digital converter (ADC) is incorporated into the proposed accelerator to quantize the analog multiply-accumulate signal into a digital output code.
    The proof-of-concept prototype was fabricated in a TSMC 40-nm CMOS standard 1P9M process; the chip occupies 2.613 mm², and the core circuit accounts for 25.8% of the total area. With a 100-MHz clock frequency and a 0.9-V supply voltage, the design achieves top-1 accuracies of 99.3% and 87.3% on the MNIST and CIFAR-10 datasets, respectively. In addition, an energy efficiency of 3.3 TOPS/W is attained, and the figure of merit (FOM), i.e., the energy consumption per arithmetic operation normalized to the quantization steps of the ADC output, is 1.18 fJ/step. To achieve better energy efficiency and FOM, the prototype can also be operated with an 80-MHz clock frequency and a 0.7-V supply voltage. In this case, the top-1 accuracies on the MNIST and CIFAR-10 datasets are 99.3% and 86%, respectively, and the energy efficiency and FOM are 6.34 TOPS/W and 0.62 fJ/step.
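    The figure-of-merit values quoted above can be reproduced from the reported energy efficiencies, assuming the normalization uses the 2^8 = 256 quantization steps of the 8-bit ADC output: the energy per operation is the reciprocal of the efficiency, and dividing it by 256 gives the fJ/step number. A minimal check, with the step count treated as an assumption:

```python
def fom_fj_per_step(tops_per_watt, adc_bits=8):
    """Energy per operation (in fJ) divided by the 2**adc_bits ADC quantization steps."""
    energy_per_op_fj = 1e15 / (tops_per_watt * 1e12)  # reciprocal of ops/joule, expressed in fJ
    return energy_per_op_fj / (2 ** adc_bits)

print(fom_fj_per_step(3.3))    # ~1.18 fJ/step (0.9-V supply, 100-MHz clock)
print(fom_fj_per_step(6.34))   # ~0.62 fJ/step (0.7-V supply, 80-MHz clock)
```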

    Table of Contents
    摘要 (Abstract in Chinese)
    Abstract
    List of Tables
    List of Figures
    Chapter 1  Introduction
      1.1  Background and Motivation
      1.2  Thesis Organization
    Chapter 2  Basics of Neural Networks
      2.1  Basics of Deep Neural Network (DNN)
      2.2  Overview of CNN
        2.2.1  Basics of CNN
        2.2.2  Non-Linear Activation Function
        2.2.3  Pooling Function
        2.2.4  Normalization Function
      2.3  Quantization of Neural Networks
        2.3.1  Linear Quantization
        2.3.2  Non-Linear Quantization
        2.3.3  Quantization-Aware Training Technique [24]
    Chapter 3  Fundamentals of SAR ADC
      3.1  Building Blocks of SAR ADC
        3.1.1  Behavioral Operation of SAR ADC
        3.1.2  Circuit-Level Operation of SAR ADC
      3.2  Quantization Error
      3.3  Static Specifications
        3.3.1  Offset Error
        3.3.2  Gain Error
        3.3.3  Nonlinearity
          3.3.3.1  Differential Nonlinearity
          3.3.3.2  Integral Nonlinearity
      3.4  Dynamic Specifications
        3.4.1  Spurious-Free Dynamic Range
        3.4.2  Signal-to-Noise Ratio
        3.4.3  Signal-to-Noise and Distortion Ratio
        3.4.4  Total Harmonic Distortion
        3.4.5  Effective Number of Bits
        3.4.6  Effective Resolution Bandwidth
        3.4.7  Figure of Merit
    Chapter 4  An 8-bit 100-MHz SAR ADC-Based Accelerator for Neural Networks
      4.1  Introduction of Overall Architecture
      4.2  Multiply-Accumulate Unit
        4.2.1  Passive Digital-to-Analog Multiplier Circuit [15][30]
        4.2.2  Systematic Offsets from Parasitic Capacitances
        4.2.3  Proposed 5-Phase Switching Scheme
      4.3  Adopted Techniques of SAR ADC
        4.3.1  Merged Capacitor Switching Method [31]
        4.3.2  Direct Switching Technique [34] and Compact Combinational Timing Control [35]
      4.4  Circuit Realization
        4.4.1  Phase Generator
        4.4.2  MAC Control Logic
        4.4.3  Dynamic Comparator
        4.4.4  Capacitive DAC
    Chapter 5  Simulation and Measurement Results
      5.1  Layout and Chip Floor Plan
      5.2  Simulation Results
      5.3  Design Considerations for PCB
      5.4  Die Micrograph and Measurement Setup
      5.5  Measurement Results
    Chapter 6  Conclusions and Future Works
    Bibliography

    [1] D. Silver et al., "Mastering the game of Go with deep neural networks and tree search," Nature, vol. 529, no. 7587, pp. 484-489, 2016.
    [2] K. He et al., "Deep Residual Learning for Image Recognition," in CVPR, 2016 (arXiv preprint arXiv:1512.03385).
    [3] I. Hubara, M. Courbariaux, D. Soudry, R. El-Yaniv, and Y. Bengio, "Quantized neural networks: Training neural networks with low precision weights and activations," arXiv preprint arXiv:1609.07061, 2016.
    [4] S. Zhou, Y. Wu, Z. Ni, X. Zhou, H. Wen, and Y. Zou, "DoReFa-Net: Training low bitwidth convolutional neural networks with low bitwidth gradients," arXiv preprint arXiv:1606.06160, 2016.
    [5] M. Rastegari, V. Ordonez, J. Redmon, and A. Farhadi, "XNOR-Net: ImageNet classification using binary convolutional neural networks," arXiv preprint arXiv:1603.05279, 2016.
    [6] I. Hubara et al., "Binarized Neural Networks," in NIPS, 2016 (arXiv preprint arXiv:1602.02505).
    [7] Q. Dong et al., "15.3 A 351TOPS/W and 372.4GOPS Compute-in-Memory SRAM Macro in 7nm FinFET CMOS for Machine-Learning Applications," in 2020 IEEE International Solid- State Circuits Conference - (ISSCC), 2020, pp. 242-244.
    [8] E. A. Vittoz, "Future of analog in the VLSI environment," in IEEE International Symposium on Circuits and Systems, 1990, pp. 1372-1375 vol.2.
    [9] B. E. Boser, E. Sackinger, J. Bromley, Y. LeCun, R. E. Howard, and L. D. Jackel, "An analog neural network processor and its application to high-speed character recognition," in IJCNN-91-Seattle International Joint Conference on Neural Networks, 1991, vol. i, pp. 415-420 vol.1.
    [10] P. Masa et al., "10 mW CMOS retina and classifier for handheld, 1000 images/s optical character recognition system," in 1999 IEEE International Solid-State Circuits Conference. Digest of Technical Papers. ISSCC. First Edition (Cat. No.99CH36278), 1999, pp. 204-205.
    [11] J. Lu, S. Young, I. Arel, and J. Holleman, "30.10 A 1TOPS/W analog deep machine-learning engine with floating-gate storage in 0.13μm CMOS," in 2014 IEEE International Solid-State Circuits Conference Digest of Technical Papers (ISSCC), 2014, pp. 504-505.
    [12] C. Xue et al., "15.4 A 22nm 2Mb ReRAM Compute-in-Memory Macro with 121-28TOPS/W for Multibit MAC Computing for Tiny AI Edge Devices," in 2020 IEEE International Solid- State Circuits Conference - (ISSCC), 2020, pp. 244-246.
    [13] X. Si et al., "15.5 A 28nm 64Kb 6T SRAM Computing-in-Memory Macro with 8b MAC Operation for AI Edge Chips," in 2020 IEEE International Solid- State Circuits Conference - (ISSCC), 2020, pp. 246-248.
    [14] K. Watanabe and G. Temes, "A switched-capacitor multiplier/divider with digital and analog outputs," IEEE Transactions on Circuits and Systems, vol. 31, no. 9, pp. 796-800, 1984.
    [15] D. Bankman and B. Murmann, "An 8-bit, 16 input, 3.2 pJ/op switched-capacitor dot product circuit in 28-nm FDSOI CMOS," in 2016 IEEE Asian Solid-State Circuits Conference (A-SSCC), 2016, pp. 21-24.
    [16] X. Glorot, A. Bordes, and Y. Bengio, "Deep Sparse Rectifier Neural Networks," in Proceedings of the 14th International Conference on Artificial Intelligence and Statistics (AISTATS), Fort Lauderdale, FL, USA, 2011.
    [17] A. L. Maas, A. Y. Hannun, and A. Y. Ng, “Rectifier nonlinearities improve neural network acoustic models,” in ICML,2013.
    [18] D.-A. Clevert, T. Unterthiner, and S. Hochreiter, “Fast and Accurate Deep Network Learning by Exponential Linear Units (ELUs),” ICLR, 2016.
    [19] X. Zhang, J. Trmal, D. Povey, and S. Khudanpur, "Improving deep neural network acoustic models using generalized maxout networks," in 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2014, pp. 215-219.
    [20] S. Ioffe and C. Szegedy, “Batch normalization: Accelerating deep network training by reducing internal covariate shift,” in ICML, 2015.
    [21] Y. Ma, N. Suda, Y. Cao, J. Seo, and S. Vrudhula, "Scalable and modularized RTL compilation of Convolutional Neural Networks onto FPGA," in 2016 26th International Conference on Field Programmable Logic and Applications (FPL), 2016, pp. 1-8.
    [22] E. H. Lee, D. Miyashita, E. Chai, B. Murmann, and S. S. Wong, "LogNet: Energy-efficient neural networks using logarithmic computation," in 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2017, pp. 5900-5904.
    [23] S. Han, H. Mao, and W. J. Dally, "Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding," in ICLR, 2016.
    [24] R. Krishnamoorthi, "Quantizing deep convolutional networks for efficient inference: A whitepaper", CoRR, vol. abs/1806.08342, 2018.
    [25] A. Zhou, A. Yao, Y. Guo, L. Xu, and Y. Chen, "Incremental Network Quantization: Towards Lossless CNNs with Low-Precision Weights," in ICLR, 2017.
    [26] M. Horowitz, "1.1 Computing's energy problem (and what we can do about it)," in 2014 IEEE International Solid-State Circuits Conference Digest of Technical Papers (ISSCC), 2014, pp. 10-14.
    [27] V. Sze, Y. Chen, T. Yang, and J. S. Emer, "Efficient Processing of Deep Neural Networks: A Tutorial and Survey," Proceedings of the IEEE, vol. 105, no. 12, pp. 2295-2329, 2017.
    [28] B. Murmann, “ADC Performance Survey 1997-2018,” [Online]. Available: http://web.stanford.edu/~murmann/adcsurvey.html.
    [29] J. L. McCreary and P. R. Gray, "All-MOS charge redistribution analog-to-digital conversion techniques. I," IEEE Journal of Solid-State Circuits, vol. 10, no. 6, pp. 371-379, 1975.
    [30] D. Bankman and B. Murmann, "Passive charge redistribution digital-to-analogue multiplier," Electronics Letters, vol. 51, no. 5, pp. 386-388, 2015.
    [31] V. Hariprasath, J. Guerber, S. Lee, and U. Moon, "Merged capacitor switching based SAR ADC with highest switching energy-efficiency," Electronics Letters, vol. 46, no. 9, pp. 620-621, 2010.
    [32] Y. Zhu et al., "A 10-bit 100-MS/s Reference-Free SAR ADC in 90 nm CMOS," IEEE Journal of Solid-State Circuits, vol. 45, no. 6, pp. 1111-1121, 2010.
    [33] C. Liu, S. Chang, G. Huang, and Y. Lin, "A 10-bit 50-MS/s SAR ADC With a Monotonic Capacitor Switching Procedure," IEEE Journal of Solid-State Circuits, vol. 45, no. 4, pp. 731-740, 2010.
    [34] G. Huang, S. Chang, Y. Lin, C. Liu, and C. Huang, "A 10b 200MS/s 0.82mW SAR ADC in 40nm CMOS," in 2013 IEEE Asian Solid-State Circuits Conference (A-SSCC), 2013, pp. 289-292.
    [35] 郭哲勳, "A 10-bit 120-MS/s SAR ADC with Compact Architecture and Noise Suppression Technique," M.S. thesis, Aug. 2014.
    [36] C. Liu, C. Kuo, and Y. Lin, "A 10 bit 320 MS/s Low-Cost SAR ADC for IEEE 802.11ac Applications in 20 nm CMOS," IEEE Journal of Solid-State Circuits, vol. 50, no. 11, pp. 2645-2654, 2015.
    [37] X. Si et al., "24.5 A Twin-8T SRAM Computation-In-Memory Macro for Multiple-Bit CNN-Based Machine Learning," in 2019 IEEE International Solid- State Circuits Conference - (ISSCC), 2019, pp. 396-398.
    [38] E. H. Lee and S. S. Wong, "24.2 A 2.5GHz 7.7TOPS/W switched-capacitor matrix multiplier with co-designed local memory in 40nm," in 2016 IEEE International Solid-State Circuits Conference (ISSCC), 2016, pp. 418-419.
    [39] J. Su et al., "15.2 A 28nm 64Kb Inference-Training Two-Way Transpose Multibit 6T SRAM Compute-in-Memory Macro for AI Edge Chips," in 2020 IEEE International Solid- State Circuits Conference - (ISSCC), 2020, pp. 240-242.

    Full-text availability: on campus from 2021-08-31; off campus from 2021-08-31.