
Author: Wen, Cheng-Da (温承達)
Thesis Title: Fault Tolerant RRAM-Based Neuromorphic Computing with Static and Dynamic Variations (考量靜態及動態變異之容錯電阻式記憶體神經形態運算)
Advisor: Lin, Ing-Chao (林英超)
Degree: Master
Department: Department of Computer Science and Information Engineering, College of Electrical Engineering and Computer Science
Publication Year: 2020
Graduation Academic Year: 108
Language: English
Pages: 56
Chinese Keywords: RRAM (可變電阻式記憶體), Reliability (可靠性), Neural Network (神經網路), Resistance Variation (電阻變化), In-Memory Computing (記憶體內運算)
English Keywords: Resistive Random Access Memory (RRAM), Reliability, Neural Network, Resistance Variation, In-Memory Computing
Usage Statistics: 191 views, 0 downloads

Emerging Resistive Random Access Memory (RRAM) has shown great potential for in-memory processing and has therefore attracted considerable research interest in accelerating memory-intensive applications such as neural networks and neuromorphic computing. However, due to the inherent variation in the resistance of RRAM cells, the accuracy of RRAM-based neural network computation can be greatly reduced.

In this thesis, we propose SIGHT, a SynergIstic alGorithm-arcHitecture fault-Tolerant framework, to holistically address the resistance variation of RRAM cells. Specifically, we consider three main fault types in RRAM-based computing: non-linear resistance distribution, static variation, and dynamic variation. At the algorithm level, we propose a resistance-aware quantization that forces neural network parameters to follow the precise non-linear resistance distribution of RRAM, and we introduce an input regulation technique to compensate for the resistance variations of RRAM. We also propose a selective weight refreshing scheme to address dynamic variation that occurs at run time. At the architecture level, we accordingly propose a general and low-cost architecture to support our fault-tolerant scheme. Experimental results show that the three proposed fault-tolerant algorithms incur almost no accuracy loss, and the performance overhead of the proposed SIGHT architecture is only 7.14%.
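The core idea of resistance-aware quantization, as described above, is to make network weights land only on conductance values the device can actually represent. A minimal sketch of that idea follows; the log-spaced level set, the level count, and the function names are illustrative assumptions, not the thesis's measured RRAM distribution or its actual algorithm.

```python
import numpy as np

def rram_levels(n_levels=8, g_min=1e-6, g_max=1e-3):
    """Illustrative non-linear (log-spaced) conductance levels, standing in
    for a device's non-uniform resistance distribution."""
    return np.geomspace(g_min, g_max, n_levels)

def resistance_aware_quantize(weights, levels):
    """Scale weights into the conductance range, then snap each magnitude to
    its nearest representable level (sign kept separately, as positive and
    negative weights typically map to paired cells)."""
    w = np.asarray(weights, dtype=float)
    w_max = np.max(np.abs(w))
    scale = w_max / levels[-1] if w_max > 0 else 1.0
    target = np.abs(w) / scale                      # magnitudes in conductance units
    idx = np.abs(target[..., None] - levels).argmin(axis=-1)
    return np.sign(w) * levels[idx] * scale

w = np.array([0.9, -0.05, 0.0003, -0.7])
q = resistance_aware_quantize(w, rram_levels())
```

Because the levels are log-spaced, small weights are quantized more finely than large ones, which is the kind of non-uniform grid a real non-linear resistance distribution induces.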

Table of Contents:
Abstract (Chinese)
Abstract
Table of Contents
List of Tables
List of Figures
Chapter 1. Introduction
  1.1 Contributions
Chapter 2. Preliminaries
  2.1 Neural Network Basis
  2.2 RRAM Basis
  2.3 RRAM-based Computing System
  2.4 Related Work
Chapter 3. RRAM Faults Modeling
  3.1 Non-linear Resistance Distribution
  3.2 Static Variation
  3.3 Dynamic Variation
Chapter 4. Fault-Tolerant Scheme
  4.1 Overview
  4.2 Resistance-Aware Quantization
  4.3 Input Regulation
  4.4 Selective Weight Restoration
Chapter 5. Architecture Design
  5.1 Overview
  5.2 Hardware Design
    5.2.1 RRAM-based PE
    5.2.2 Input Regulation Unit
    5.2.3 Refreshing Scheduler
    5.2.4 Interconnection & Controller
  5.3 Execution Flow
    5.3.1 Mapping
    5.3.2 Execution
    5.3.3 Refreshing
Chapter 6. Evaluation
  6.1 Methodology
    6.1.1 Evaluation Tools
    6.1.2 Accelerator Configuration
    6.1.3 Benchmarks
  6.2 Accuracy Results
    6.2.1 Resistance-Aware Quantization
    6.2.2 Input Regulation
    6.2.3 Run-Time Weight Restoration
  6.3 Hardware Results
    6.3.1 Performance
    6.3.2 Area/Power Analysis
    6.3.3 Sensitivity Study
Chapter 7. Conclusion
References
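Chapters 3 and 4 above revolve around one operation: the analog matrix-vector multiplication an RRAM crossbar performs, and how cell-level variation corrupts it. A minimal fault model of that operation is sketched below; the multiplicative log-normal deviation, its magnitude, and the function names are illustrative assumptions (ideal peripherals, no quantization), not the thesis's measured variation statistics.

```python
import numpy as np

rng = np.random.default_rng(0)

def crossbar_mvm(W, x, sigma=0.0):
    """Model of an RRAM crossbar computing y = W @ x in the analog domain.
    With sigma > 0, each programmed conductance deviates from its target by
    a multiplicative log-normal factor (static variation); sigma = 0 is the
    ideal, variation-free crossbar."""
    if sigma == 0.0:
        return W @ x
    noise = rng.lognormal(mean=0.0, sigma=sigma, size=W.shape)
    return (W * noise) @ x

W = rng.standard_normal((4, 3))
x = rng.standard_normal(3)
y_ideal = crossbar_mvm(W, x, sigma=0.0)
y_noisy = crossbar_mvm(W, x, sigma=0.2)
```

Growing the deviation of `y_noisy` from `y_ideal` as `sigma` increases is exactly the accuracy-loss mechanism the fault-tolerant scheme in Chapter 4 targets.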


Access: not available off-campus; neither the electronic nor the printed thesis has been authorized for public release.