
Graduate Student: Lin, Yu-Chuan (林昱全)
Thesis Title: Investigation of Weight Distribution and Energy Consumption in Analog Non-Volatile Memory Neuromorphic Circuits: A Case Study of Resistive Memory
Advisor: Chiang, Meng-Hsueh (江孟學)
Degree: Master
Department: College of Electrical Engineering and Computer Science - Institute of Microelectronics
Year of Publication: 2024
Academic Year of Graduation: 112 (2023-2024)
Language: English
Number of Pages: 75
Keywords: Resistive Random Access Memory (RRAM), weight modulation, deep learning, PyTorch, energy consumption optimization

    The rapid advancement of AI technology has significantly increased the demand for computational resources and energy. To achieve high-performance computing, it has become essential to move beyond the von Neumann architecture, and in-memory computing, with its highly parallel processing capability, has emerged as a promising solution. Consequently, non-volatile memory (NVM) circuit architectures for in-memory computing, together with their corresponding simulation platforms, have garnered considerable attention.
    This study focuses on Resistive Random Access Memory (RRAM) to explore the relationship between weight modulation and computational energy consumption. Utilizing an existing crossbar array simulation platform, we estimate the energy consumption of the 2D1S structure. Techniques such as quantization-aware training (QAT) and pruning-aware training (PAT) are employed to reduce energy consumption.
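    As a minimal, self-contained sketch of the two weight-modulation techniques named above (an illustration, not the thesis's actual PyTorch implementation), the following NumPy snippet applies magnitude pruning to 80% sparsity and symmetric 8-bit fake quantization, the forward-pass core of quantization-aware training. All function names, shapes, and the random weights are assumptions made for illustration.

```python
import numpy as np

def magnitude_prune(w, sparsity=0.8):
    """Zero out the smallest-magnitude weights so that roughly
    `sparsity` fraction of the entries become zero."""
    threshold = np.quantile(np.abs(w), sparsity)
    mask = np.abs(w) >= threshold
    return w * mask, mask

def fake_quantize(w, bits=8):
    """Round weights to a symmetric signed grid of 2**bits levels and
    return the dequantized (float) values, as QAT does in the forward pass."""
    qmax = 2 ** (bits - 1) - 1
    scale = np.max(np.abs(w)) / qmax
    return np.round(w / scale) * scale

rng = np.random.default_rng(0)
w = rng.standard_normal((128, 128))          # toy layer weights
w_pruned, mask = magnitude_prune(w, sparsity=0.8)
w_deployed = fake_quantize(w_pruned, bits=8)  # sparse, 8-bit-grid weights
```

    In genuine pruning- and quantization-aware training these operations run inside every forward pass, with gradients passed straight through the masking and rounding steps, so the network learns weights that tolerate both constraints.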
    The findings demonstrate that quantization- and pruning-aware training can smooth out the negative effects of quantization and pruning on the model during the training process. To keep digit-recognition accuracy above 97%, quantization-aware training at 8-bit precision can be used; coupled with pruning-aware training at 80% sparsity, this approach maintains 97% accuracy while reducing overall energy consumption by approximately 72.2%.
    In the final part of the study, we discuss the characteristics of the memory devices themselves. The experimental results indicate that confining the conductance range of the RRAM to smaller values yields better energy efficiency without compromising model accuracy. Additionally, a higher on/off ratio of the memory device leads to both lower energy consumption and higher model accuracy.
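    The device-level trends above can be illustrated with a rough first-order read-energy model, an assumption for illustration rather than the thesis's simulation platform: each cell dissipates about V² · G · t per read pulse, and weights map linearly onto a conductance window whose floor is set by the on/off ratio (G_min = G_max / ratio). All voltages, pulse widths, and conductance values below are hypothetical.

```python
import numpy as np

def weight_to_conductance(w, g_max, on_off_ratio):
    """Linearly map normalized weights w in [0, 1] onto [G_min, G_max],
    where G_min is fixed by the device's on/off ratio."""
    g_min = g_max / on_off_ratio
    return g_min + w * (g_max - g_min)

def read_energy(w, g_max, on_off_ratio, v_read=0.2, t_pulse=10e-9):
    """Total Joule heating of one read pulse across all cells: sum of V^2*G*t."""
    g = weight_to_conductance(w, g_max, on_off_ratio)
    return np.sum(v_read ** 2 * g * t_pulse)

rng = np.random.default_rng(1)
w = rng.random((64, 64))                                   # normalized weights
e_small = read_energy(w, g_max=10e-6, on_off_ratio=100)    # smaller G window
e_large = read_energy(w, g_max=100e-6, on_off_ratio=100)   # larger G window
e_low_ratio = read_energy(w, g_max=10e-6, on_off_ratio=10) # worse on/off ratio
```

    Under this toy model, shrinking G_max scales every cell's dissipation down proportionally, and a larger on/off ratio lowers G_min so that small and pruned weights dissipate almost nothing, consistent with the trends reported above.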

    Abstract (Chinese) I
    Abstract II
    Acknowledgements IV
    Contents VI
    Table Captions VII
    Figure Captions VIII
    Chapter 1 Introduction 1
      1.1 Background 1
      1.2 Motivation 2
      1.3 Introduction of Simulation Tools 3
      1.4 Overview of the Thesis 3
    Chapter 2 Literature Review 4
      2.1 Non-volatile Memory 4
        2.1.1 Resistive random-access memory (RRAM) 5
        2.1.2 Phase change memory (PCM) 7
        2.1.3 Flash memory 11
      2.2 Crossbar Memory Array Architecture 13
        2.2.1 NVM-based crossbar array 13
        2.2.2 Crossbar circuit architecture 15
      2.3 Artificial Neural Network and Simulation Platform 18
        2.3.1 Neural network introduction 18
        2.3.2 Forward and backward propagation 21
        2.3.3 Simulation platform 24
    Chapter 3 Experiment Methodology 25
      3.1 Simulation Platform for Neuromorphic Accelerator 25
        3.1.1 CIM platform architecture 25
        3.1.2 NVM device pulse and weight equation 28
        3.1.3 Hardware training process 31
      3.2 Weight Modulation 32
        3.2.1 Hardware structure implementation and optimization 32
        3.2.2 Quantization algorithm 35
        3.2.3 Sparsity 36
      3.3 Energy Consumption 39
        3.3.1 Forward propagation 40
        3.3.2 Backward propagation 41
        3.3.3 Weight update energy consumption 41
    Chapter 4 Result and Discussion 43
      4.1 Real device neural network simulation 43
      4.2 Energy consumption analysis 45
        4.2.1 Initial energy consumption 45
        4.2.2 Quantized energy consumption 48
        4.2.3 Pruned energy consumption 52
      4.3 Comprehensive performance analysis 58
    Chapter 5 Conclusions and Future Work 60
    References 62


    Full text available on campus: 2027-06-25
    Full text available off campus: 2027-06-25
    The electronic thesis has not yet been authorized for public release; for the print copy, please consult the library catalog.