| Graduate Student: | 何偉立 He, Wei-Li |
|---|---|
| Thesis Title: | A Computing-in-Memory Architecture for Ultra-Low Power Voice Activity Detection Neural Network |
| Advisor: | 張順志 Chang, Soon-Jyh |
| Degree: | Master |
| Department: | College of Electrical Engineering and Computer Science - Department of Electrical Engineering |
| Year of Publication: | 2022 |
| Graduation Academic Year: | 110 |
| Language: | English |
| Number of Pages: | 126 |
| Keywords: | machine learning, computing-in-memory, voice activity detection, hybrid precision neural network, analog computation, software and hardware co-design |
| Access Count: | Views: 119, Downloads: 1 |
This thesis presents the first computing-in-memory (CIM) based classifier dedicated to a voice activity detection neural network (VAD NN). The proposed 8T static random-access memory (SRAM) merges seamlessly into a successive-approximation-register analog-to-digital converter (SAR ADC) and performs the analog computation, which reduces power consumption and avoids read disturbance. The digital-to-analog converter (DAC) is removed from the conventional CIM circuit so that the classifier can later be integrated with the front-end analog feature extractor, greatly reducing power and area. Through software and hardware co-design, a hybrid-precision VAD NN is proposed that predicts the input range to prevent unnecessary SAR ADC switching during quantization, and that performs voice activity detection on signals with extremely low signal-to-noise ratio (SNR) at ultra-low area and power cost. Two chips were designed and taped out: a CIM-based classifier circuit and a complete, integrated voice activity detection system.
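To make the range-prediction idea concrete, here is a minimal behavioral sketch in Python, not the thesis circuit: it assumes an n-bit SAR ADC performing a standard binary search, where a predicted input range presets the top bits so their comparison and switching cycles are skipped. The function name, the `predicted_msbs` interface, and the single-ended reference are illustrative assumptions.

```python
def sar_adc_quantize(vin, n_bits=8, vref=1.0, predicted_msbs=None):
    """Behavioral model of an n-bit SAR ADC binary search.

    predicted_msbs: optional list of top-bit decisions supplied by the
    network's input-range prediction; presetting them skips those
    comparison/switching cycles (hypothetical interface, for illustration).
    """
    code = 0
    start_bit = 0
    if predicted_msbs is not None:
        # Preset the MSBs from the predicted range: no SAR cycles spent here.
        for i, bit in enumerate(predicted_msbs):
            code |= bit << (n_bits - 1 - i)
        start_bit = len(predicted_msbs)
    for i in range(start_bit, n_bits):
        trial = code | (1 << (n_bits - 1 - i))
        # One comparator decision per remaining bit (binary-search step).
        if vin >= trial * vref / (1 << n_bits):
            code = trial
    return code

# Example: with the top two bits pinned to 0b10 by the predicted range,
# only n_bits - 2 comparisons are performed for this sample.
print(sar_adc_quantize(0.6, n_bits=8, predicted_msbs=[1, 0]))  # -> 153
```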
The first chip, the CIM-based classifier dedicated to the VAD NN, was fabricated in a TSMC 180-nm standard 1P6M CMOS technology and occupies an area of 0.42 mm^2. Measured on the Dolphin Design dataset with 3-dB SNR input signals, it achieves a voice-detected-as-voice rate (VDV) of 97% and a noise-detected-as-voice rate (NDV) of 5.2%, with a 30-ms detection latency and 526-nW power consumption.
The second chip, the complete VAD NN, integrates the first chip (the classifier) with the front-end analog feature extractor. It was also fabricated in the TSMC 180-nm standard 1P6M CMOS technology and occupies 1.1 mm^2. Under the same conditions (Dolphin Design dataset, 3-dB SNR input signals), it achieves a VDV of 97% and an NDV of 3.4%, with a 32-ms detection latency and 2660-nW power consumption.
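For reference, VDV and NDV are commonly defined per frame in the always-on VAD literature as the fraction of speech frames detected as voice and the fraction of noise frames detected as voice; the small sketch below assumes those definitions, which the abstract itself does not spell out.

```python
import numpy as np

def vad_metrics(decisions, labels):
    """Per-frame VAD metrics; decisions and labels are 1 for voice, 0 for noise.

    VDV: fraction of true voice frames flagged as voice (hit rate).
    NDV: fraction of true noise frames flagged as voice (false-alarm rate).
    """
    decisions = np.asarray(decisions, dtype=bool)
    labels = np.asarray(labels, dtype=bool)
    vdv = decisions[labels].mean()
    ndv = decisions[~labels].mean()
    return vdv, ndv

# E.g., VDV = 0.97 and NDV = 0.034 means 97 of 100 speech frames are caught
# while about 3 of 100 noise frames are mistaken for speech.
```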