
Author: He, Wei-Li (何偉立)
Title: A Computing-in-Memory Architecture for Ultra-Low Power Voice Activity Detection Neural Network (使用記憶體內運算架構實現一個極低功耗之語音活動檢測神經網路)
Advisor: Chang, Soon-Jyh (張順志)
Degree: Master
Department: Department of Electrical Engineering, College of Electrical Engineering and Computer Science
Year of publication: 2022
Graduation academic year: 110 (2021-2022)
Language: English
Pages: 126
Keywords: machine learning, computing-in-memory, voice activity detection, hybrid precision neural network, analog computation, software and hardware co-design
Views: 119; Downloads: 1

This thesis presents the first computing-in-memory (CIM) based classifier dedicated to a voice activity detection neural network (VAD NN). The proposed 8T static random-access memory (SRAM) merges seamlessly into a successive-approximation-register analog-to-digital converter (SAR ADC) and performs the analog computation, reducing power consumption while avoiding read disturbance. The digital-to-analog converter (DAC) of the conventional CIM circuit is removed so that the classifier can be integrated with the front-end analog feature extractor, greatly decreasing power and area consumption. Through software and hardware co-design, a hybrid-precision VAD neural network is proposed that predicts the input range to prevent unnecessary SAR ADC switching and performs classification on extremely low-SNR signals with ultra-low area and power consumption. Two chips were designed and taped out: a CIM-based classifier circuit and a complete voice activity detection system.
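To make the range-prediction idea concrete, the following Python sketch (an illustration under assumed bit widths, layer sizes, and a stand-in range predictor, not the thesis circuits) models a hybrid-precision classifier layer whose analog multiply-accumulate result is digitized by a SAR ADC; bit trials whose outcome is already implied by a predicted input range are resolved without a comparator decision, which is the behavior described above as preventing unnecessary SAR ADC switching.

# Behavioral sketch only (assumed names and parameters, not the thesis circuits):
# a hybrid-precision classifier layer whose analog dot product is digitized by a
# SAR ADC, where a predicted output range lets the converter skip bit trials that
# the range already determines.

import numpy as np

def quantize(x, bits):
    # Uniform symmetric quantization to signed integers of the given bit width.
    q_max = 2 ** (bits - 1) - 1
    scale = np.max(np.abs(x)) / q_max + 1e-12
    return np.round(x / scale).astype(int), scale

def sar_adc(value, full_scale, bits, known_range=None):
    # Successive-approximation conversion of a value in [-full_scale, +full_scale].
    # If known_range = (lo, hi) is given and a bit trial's midpoint falls outside
    # that range, the bit is resolved without a comparison, modeling the switching
    # energy saved by predicting the input range before conversion.
    lo, hi = -full_scale, full_scale
    code, comparisons = 0, 0
    for _ in range(bits):
        mid = (lo + hi) / 2
        if known_range is not None and (known_range[1] < mid or known_range[0] >= mid):
            bit = 1 if known_range[0] >= mid else 0   # decided by the predicted range
        else:
            bit = 1 if value >= mid else 0            # real comparator decision
            comparisons += 1
        lo, hi = (mid, hi) if bit else (lo, mid)
        code = (code << 1) | bit
    return code, comparisons

rng = np.random.default_rng(0)
features = rng.uniform(-1.0, 1.0, size=16)       # one frame of analog features (assumed size)
weights = rng.standard_normal((2, 16))           # 2-output classifier layer (assumed size)

# Hybrid precision (assumed split): 1-bit sign weights, 4-bit activations.
w_sign = np.sign(weights)
a_q, a_scale = quantize(features, bits=4)
analog_mac = w_sign @ (a_q * a_scale)            # analog-domain multiply-accumulate

full_scale = np.sum(np.abs(a_q * a_scale))       # worst-case |MAC| for +/-1 weights
for k, mac in enumerate(analog_mac):
    predicted = (mac - 0.1 * full_scale, mac + 0.1 * full_scale)  # assumed range predictor
    code_full, cmp_full = sar_adc(mac, full_scale, bits=8)
    code_pred, cmp_pred = sar_adc(mac, full_scale, bits=8, known_range=predicted)
    print(f"output {k}: code {code_full} vs {code_pred}, comparisons {cmp_full} -> {cmp_pred}")

Running the sketch yields the same output code with and without the predicted range, while the range-aware conversion uses no more, and typically fewer, comparator decisions; this is the intended mechanism for saving switching energy.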
The first chip, a CIM-based classifier dedicated to a VAD neural network, was fabricated in a TSMC 180-nm CMOS standard 1P6M process, and the CIM-based VAD NN occupies 0.42 mm^2. Measurements show a voice detection rate (VDV) of 97% and a noise false-alarm rate (NDV) of 5.2%, with 30 ms latency and 526 nW power consumption on the Dolphin Design dataset with 3 dB SNR input signals.
The second chip, the complete VAD neural network, integrates the classifier with the front-end analog feature extractor. It was also fabricated in the TSMC 180-nm CMOS standard 1P6M process and occupies 1.1 mm^2. Measurements show a VDV of 97% and an NDV of 3.4%, with 32 ms latency and 2660 nW power consumption on the Dolphin Design dataset with 3 dB SNR input signals.

Table of Contents
Abstract (Chinese) I
Abstract III
List of Tables IX
List of Figures X
Chapter 1 Introduction 1
  1.1 Background and Motivation 1
  1.2 Thesis Organization 3
Chapter 2 Basics of Voice Activity Detection (VAD) Neural Network 4
  2.1 Subfield in Artificial Intelligence 5
  2.2 Neural Network Fundamentals 7
    2.2.1 From Neural Networks to DNNs 7
    2.2.2 Training Versus Inference 9
    2.2.3 MLP and CNN 13
  2.3 Compression of Neural Network 18
    2.3.1 Pruning 18
    2.3.2 Quantization 20
  2.4 Voice Activity Detection Neural Network 23
  2.5 Relative Works of Voice Activity Detector (VAD) 28
    2.5.1 Analog Signal Processing VAD with NN-based Mixed-Signal Classifier 28
    2.5.2 All-Analog Voice Activity Detector 32
    2.5.3 Time-Domain CNN based VAD 34
    2.5.4 Summary of VAD Relative Works 37
Chapter 3 Basics of Computing-in-Memory 39
  3.1 Memory Hierarchy 40
  3.2 Von-Neumann Bottleneck 41
  3.3 In/Near Memory Computation 43
  3.4 Sub-circuits in Computing-in-Memory Architecture 45
    3.4.1 Digital to Analog Converter 46
    3.4.2 SRAM Cell 51
    3.4.3 Analog to Digital Converter 54
  3.5 Relative Works of Computing-in-Memory (CIM) 60
    3.5.1 The First Prototype of Computation Within SRAM 60
    3.5.2 A Convolutional Computation Within SRAM 64
    3.5.3 A LCC-based 6T SRAM CIM Macro 66
    3.5.4 Summary of CIM Relative Works 68
Chapter 4 A Computing-in-Memory Architecture for Ultra-Low Power VAD Neural Network 70
  4.1 Introduction 70
  4.2 Proposed VAD Neural Network Architecture 72
    4.2.1 Software Implementation of VAD Neural Network 72
    4.2.2 Hardware Implementation of VAD Neural Network 76
  4.3 Adopted Techniques of SAR ADC 85
    4.3.1 The Monotonic Switching Method 85
    4.3.2 Direct Switching Method and Compact Combinational Timing Control 87
  4.4 Circuit Implementation 88
    4.4.1 SRAM Array and Peripheral Circuits 88
    4.4.2 Dynamic Comparator 90
    4.4.3 Capacitive DAC 93
Chapter 5 Simulation and Measurement Results 96
  5.1 The First Chip – A Classifier in VAD System 97
    5.1.1 Chip Layout and Floor Plan - CIM-based Classifier 97
    5.1.2 Post-Layout Simulation Results - CIM-based Classifier 99
    5.1.3 Die Micrograph and Measurement Setup - CIM-based Classifier 103
    5.1.4 Measurement Results - CIM-based Classifier 106
  5.2 The Second Chip – Complete VAD Neural Network 110
    5.2.1 Chip Layout and Floor Plan - Complete VAD NN 110
    5.2.2 Post-Layout Simulation Results - Complete VAD NN 111
    5.2.3 Die Micrograph and Measurement Setup - Complete VAD NN 112
    5.2.4 Measurement Results - Complete VAD NN 114
Chapter 6 Conclusion and Future Work 119
Reference 122


Full text publicly available on campus and off campus from 2025-08-12.