| Author: | 蔣孟儒 Chiang, Meng-Ju |
|---|---|
| Thesis Title: | A Switched-Capacitor Based Acoustic Feature Extractor for Voice Activity Detection |
| Advisor: | 張順志 Chang, Soon-Jyh |
| Degree: | Master |
| Department: | College of Electrical Engineering and Computer Science - Department of Electrical Engineering |
| Publication Year: | 2022 |
| Academic Year: | 110 |
| Language: | English |
| Pages: | 101 |
| Keywords: | voice activity detection, acoustic feature extractor, switched-capacitor filter, op-amp sharing technique |
This thesis presents a low-power acoustic feature extractor (AFE) for voice activity detection (VAD). To overcome the effects of process, voltage, and temperature (PVT) variations, a switched-capacitor (SC) band-pass filter (BPF) with the op-amp sharing technique is adopted to reduce power consumption. In addition, a further reduction in power consumption is achieved by reducing the number of frequency bands from 16 to 6 without sacrificing the voice activity detection accuracy of the whole system.
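The feature-extraction chain described above (a bank of band-pass filters whose rectified, averaged outputs form the per-band features) can be sketched as a discrete-time software model. This is only an illustrative sketch: the band centers, Q factor, and sample rate below are assumptions for demonstration, not the actual design values of the SC filterbank in this thesis.

```python
import numpy as np

def biquad_bandpass(fs, f0, Q=2.0):
    # Standard constant-peak-gain biquad band-pass coefficients
    # (RBJ audio-EQ-cookbook form), normalized so a[0] = 1.
    w0 = 2 * np.pi * f0 / fs
    alpha = np.sin(w0) / (2 * Q)
    b = np.array([alpha, 0.0, -alpha])
    a = np.array([1 + alpha, -2 * np.cos(w0), 1 - alpha])
    return b / a[0], a / a[0]

def iir_filter(b, a, x):
    # Direct-form I difference equation for a second-order section.
    y = np.zeros_like(x)
    for n in range(len(x)):
        acc = b[0] * x[n]
        if n >= 1:
            acc += b[1] * x[n - 1] - a[1] * y[n - 1]
        if n >= 2:
            acc += b[2] * x[n - 2] - a[2] * y[n - 2]
        y[n] = acc
    return y

def band_energies(x, fs, centers):
    # Per band: band-pass filter, full-wave rectify, then average
    # over the frame to approximate the band's envelope energy.
    feats = []
    for f0 in centers:
        b, a = biquad_bandpass(fs, f0)
        y = iir_filter(b, a, x)
        feats.append(np.mean(np.abs(y)))
    return np.array(feats)

fs = 16000                               # assumed sample rate
centers = np.geomspace(100, 4000, 6)     # 6 log-spaced band centers (assumed)
t = np.arange(int(0.03 * fs)) / fs       # one 30-ms analysis frame
tone = np.sin(2 * np.pi * 1000 * t)      # 1-kHz test tone
feats = band_energies(tone, fs, centers)
```

Running this on the 1-kHz tone concentrates energy in the band whose center (about 915 Hz here) lies closest to the tone, which is the property the downstream classifier relies on to separate voice from noise.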
Two proof-of-concept chips are realized in this work: a standalone AFE circuit and an integrated voice activity detector (VAD). The AFE chip is fabricated in a 180-nm CMOS process, has a core area of 0.65 mm2, and consumes 2 μW under a 1.8-V supply voltage. Using a software neural-network-based classifier to distinguish voice from background noise, it achieves an average of 97% voice detected as voice (VDV) and 3.8% noise detected as voice (NDV) at a 3-dB signal-to-noise ratio (SNR) with 30-ms latency.
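The VDV and NDV figures quoted above are per-frame classification rates: VDV is the true-positive rate over voiced frames and NDV is the false-positive rate over noise-only frames. A minimal illustration of the computation (the helper and the toy frame labels are for demonstration only, not data from the thesis):

```python
def vad_metrics(labels, preds):
    """Compute VDV (voice detected as voice) and NDV (noise detected
    as voice) from per-frame ground-truth labels and predictions,
    where 1 = voice and 0 = noise."""
    tp = sum(1 for l, p in zip(labels, preds) if l == 1 and p == 1)
    fp = sum(1 for l, p in zip(labels, preds) if l == 0 and p == 1)
    n_voice = sum(labels)
    n_noise = len(labels) - n_voice
    vdv = tp / n_voice if n_voice else 0.0
    ndv = fp / n_noise if n_noise else 0.0
    return vdv, ndv

labels = [1, 1, 1, 1, 0, 0, 0, 0]      # ground truth per frame
preds  = [1, 1, 1, 0, 0, 1, 0, 0]      # classifier decisions
vdv, ndv = vad_metrics(labels, preds)  # → (0.75, 0.25)
```

An ideal detector would reach 100% VDV at 0% NDV; the trade-off between the two is what the decision threshold of the classifier tunes.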
The VAD chip merges the AFE with a hardware classifier implemented as an in-memory-computing neural network. It is also fabricated in a 180-nm CMOS process, has a core area of 1.104 mm2, and consumes 2.66 μW under a 1.8-V supply voltage. Experimental results show that this proof-of-concept prototype achieves an average of 97% VDV and 3.4% NDV at a 3-dB SNR with 32-ms latency.