| Author: | Lin, Yu-Wei (林育緯) |
|---|---|
| Thesis Title: | An 8T SRAM-based Compute-In-Memory for Highly Energy-Efficient Multiple-Bit Convolutional Neural Network Edge Devices |
| Advisor: | Chiou, Lih-Yih (邱瀝毅) |
| Degree: | Master |
| Department: | College of Electrical Engineering and Computer Science - Department of Electrical Engineering |
| Year of Publication: | 2020 |
| Graduation Academic Year: | 108 |
| Language: | Chinese |
| Pages: | 82 |
| Keywords: | Artificial intelligence (AI), convolutional neural networks (CNNs), compute-in-memory (CIM), weight bit-partitioned accumulation (WBPA), time modulation |
In recent years, artificial intelligence has advanced rapidly and found wide application, most notably through convolutional neural networks in tasks such as image recognition, object recognition, and language translation. To bring AI to edge and Internet-of-Things devices while achieving low energy consumption, low latency, and high security, a variety of application-specific integrated circuits, i.e., AI accelerators, have been developed. Compared with conventional digital accelerators, compute-in-memory is an emerging computation unit for AI accelerators: it performs multiplication and accumulation on inputs and weights directly within the memory circuit, avoiding the excessive data movement of conventional digital accelerators, i.e., the memory-wall problem, and thereby greatly reducing power consumption and computation latency.
This thesis, designed in a 40-nm process, proposes a highly energy-efficient 8T SRAM compute-in-memory that supports multiple-bit multiply-accumulate operations. A weight bit-partitioned accumulation technique reduces the quantization error of multiple-bit computation and relaxes the output stage's need for a single high-resolution data converter, avoiding excessive power consumption. A time-modulation technique reduces the number of digital-to-analog converters from n×n to one, improving both area and power consumption. Compared with prior work, the proposed compute-in-memory achieves better energy efficiency, operating speed, and computation accuracy.
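The arithmetic identity behind weight bit-partitioned accumulation can be made concrete as follows. This is a minimal sketch in notation of our own choosing (inputs $X_i$, $B$-bit weights $W_i$ with bit-planes $w_{i,k}$), not the thesis's exact formulation:

```latex
% Minimal sketch (assumed notation): an n-term multi-bit MAC split into
% B binary weight bit-planes, each accumulated separately and then
% recombined digitally by shift-and-add, so each partial sum's dynamic
% range shrinks by a factor of up to 2^B - 1.
\[
\sum_{i=1}^{n} X_i W_i
  \;=\; \sum_{i=1}^{n} X_i \sum_{k=0}^{B-1} 2^{k} w_{i,k}
  \;=\; \sum_{k=0}^{B-1} 2^{k}
        \underbrace{\sum_{i=1}^{n} X_i\, w_{i,k}}_{\text{bit-plane partial sum}}
\]
```

Because each bit-plane partial sum has a much smaller dynamic range than the full multi-bit product sum, it can be digitized by a lower-resolution converter, which is consistent with the abstract's claim that the technique relaxes the need for a single high-resolution output data converter.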
In recent years, with the fast-growing number of artificial intelligence (AI) applications, convolutional neural networks (CNNs) in particular have been successfully applied to image recognition, object recognition, language translation, and more. To extend AI to the terminal side (referred to as the edge) of the Internet of Things (IoT), low energy consumption, low latency, and high security are required, and application-specific integrated circuits (ASICs), i.e., AI accelerators, have been developed for edge devices. Compute-in-memory (CIM) is one of the emerging computation paradigms. Compared with digital AI accelerators, CIM performs multiply-and-accumulate (MAC) operations inside the memory itself. CIM can therefore mitigate data congestion between the processor and memory, i.e., the memory-wall issue. This computation paradigm promises lower power consumption and latency than its counterpart, the von Neumann architecture.
In this thesis, we present a highly energy-efficient 8T SRAM-based CIM that supports multiple-bit MAC operations. It uses a weight bit-partitioned accumulation (WBPA) method to decrease the quantization error of multiple-bit MAC operations and to reduce the output data-converter overhead while maintaining precision. Moreover, a time-modulation mechanism reduces the number of digital-to-analog converters (DACs) from n×n to one, greatly improving area and power consumption. The whole circuit was fabricated as a test chip in a 40-nm technology. Compared with typical SRAM-based CIMs, the proposed CIM is more competitive in energy efficiency, inference time, and output precision.
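As a rough numerical illustration of why per-bit-plane digitization can reduce quantization error, the following Python sketch compares quantizing the full multi-bit MAC value in one step against quantizing each weight bit-plane separately and recombining digitally. All parameters (binary inputs, 4-bit weights, a roughly 5-bit converter, accumulation length 64) are assumptions for illustration, not values from the thesis, and the model ignores analog non-idealities:

```python
import numpy as np

rng = np.random.default_rng(0)

N = 64          # accumulation length (assumed)
W_BITS = 4      # weight precision (assumed)
ADC_LEVELS = 32 # output converter resolution, ~5 bits (assumed)

x = rng.integers(0, 2, N)            # binary inputs, for simplicity
w = rng.integers(0, 2**W_BITS, N)    # unsigned multi-bit weights

exact = int(x @ w)                   # ideal MAC result

def quantize(v, vmax, levels):
    """Uniform quantization of v over [0, vmax] with the given level count."""
    step = vmax / (levels - 1)
    return round(v / step) * step

# (a) Direct scheme: one converter digitizes the full-range MAC value.
direct = quantize(exact, N * (2**W_BITS - 1), ADC_LEVELS)

# (b) Bit-partitioned scheme: each weight bit-plane is accumulated and
#     digitized over its much smaller range, then shift-added digitally.
partitioned = sum(
    (2**k) * quantize(int(x @ ((w >> k) & 1)), N, ADC_LEVELS)
    for k in range(W_BITS)
)

print(f"exact={exact}  direct={direct:.1f}  bit-partitioned={partitioned:.1f}")
```

With these assumed parameters, the bit-partitioned estimate typically lands closer to the exact value: the per-plane quantization step is 2^B - 1 times finer than the full-range step, which outweighs the 2^k scaling applied during recombination in an RMS sense.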
On-campus access: available to the public from 2025-08-25