
Graduate Student: Chiang, Meng-Ju (蔣孟儒)
Thesis Title: A Switched-Capacitor Based Acoustic Feature Extractor for Voice Activity Detection
Advisor: Chang, Soon-Jyh (張順志)
Degree: Master
Department: Department of Electrical Engineering, College of Electrical Engineering and Computer Science
Year of Publication: 2022
Graduation Academic Year: 110
Language: English
Number of Pages: 101
Keywords: voice activity detection, acoustic feature extractor, switched-capacitor filter, op-amp sharing technique

Abstract:
This thesis presents a low-power acoustic feature extractor (AFE) for voice activity detection (VAD). To overcome the effects of process, voltage, and temperature (PVT) variations, the feature extractor is implemented with switched-capacitor (SC) band-pass filters (BPFs), and an op-amp sharing technique is applied to reduce power consumption. In addition, power consumption is reduced further by cutting the number of frequency bands from 16 to 6 without sacrificing the voice activity detection accuracy of the whole system.
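As a software illustration of the filterbank feature extraction described above (band-pass filtering followed by rectification and averaging, mirroring the SC band-pass filter and averaging circuit of Chapter 3), the following Python sketch computes one feature per band per frame. The sample rate, band edges, and frame length are illustrative assumptions, not the design values of this work.

    import numpy as np
    from scipy.signal import butter, sosfilt

    FS = 16000  # sample rate in Hz (assumed)
    # Six illustrative bands covering the speech range (Hz); the actual
    # SC-BPF center frequencies are not given in this abstract.
    BANDS = [(100, 300), (300, 600), (600, 1000),
             (1000, 1800), (1800, 3000), (3000, 5000)]

    def extract_features(x, frame_ms=10):
        """Band-pass, rectify, and average the input per band and frame."""
        frame = int(FS * frame_ms / 1000)
        n_frames = len(x) // frame
        feats = np.empty((n_frames, len(BANDS)))
        for b, (lo, hi) in enumerate(BANDS):
            sos = butter(2, [lo, hi], btype="bandpass", fs=FS, output="sos")
            y = np.abs(sosfilt(sos, x))                # full-wave rectification
            y = y[:n_frames * frame].reshape(n_frames, frame)
            feats[:, b] = y.mean(axis=1)               # averaging stage
        return feats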
Two proof-of-concept chips are realized in this work: a stand-alone acoustic feature extractor (AFE) and an integrated voice activity detector. The AFE chip is fabricated in a TSMC 180-nm CMOS process, occupies a core area of 0.65 mm², and consumes 2 μW from a 1.8-V supply. Using a software neural-network-based classifier to distinguish voice from background noise, it achieves an average of 97% voice detected as voice (VDV) and 3.8% noise detected as voice (NDV) at a 3-dB signal-to-noise ratio (SNR), with 30 ms of latency.
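The reported VDV and NDV figures are the frame-level confusion-matrix rates defined in Chapter 2: VDV is the fraction of voice frames classified as voice, and NDV is the fraction of noise frames classified as voice. A minimal Python sketch, assuming binary per-frame ground-truth labels and classifier outputs:

    import numpy as np

    def vdv_ndv(labels, preds):
        """labels, preds: per-frame arrays with 0 = noise, 1 = voice."""
        labels = np.asarray(labels, dtype=bool)
        preds = np.asarray(preds, dtype=bool)
        vdv = preds[labels].mean()    # voice detected as voice (hit rate)
        ndv = preds[~labels].mean()   # noise detected as voice (false alarm)
        return vdv, ndv

Under these definitions, 97% VDV with 3.8% NDV means nearly all voice frames are flagged while few noise frames trigger a false alarm.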
The VAD chip merges the AFE with a hardware classifier implemented as an in-memory-computing neural network. It is likewise fabricated in a TSMC 180-nm CMOS process, occupies a core area of 1.104 mm², and consumes 2.66 μW from a 1.8-V supply. Measurement results show that this proof-of-concept prototype achieves an average of 97% VDV and 3.4% NDV at a 3-dB SNR, with 32 ms of latency.
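For intuition about the decision stage, the following sketch maps each 6-band feature vector to a voice/noise decision with a small feed-forward network. The layer width and the random weights are hypothetical placeholders; the actual chip realizes a trained network with in-memory computation, whose topology is not specified in this abstract.

    import numpy as np

    rng = np.random.default_rng(0)
    W1, b1 = rng.normal(size=(6, 16)), np.zeros(16)   # hidden layer (width assumed)
    W2, b2 = rng.normal(size=(16,)), 0.0              # output neuron

    def classify_frame(feat):
        """Return 1 (voice) or 0 (noise) for one 6-element feature vector."""
        h = np.maximum(feat @ W1 + b1, 0.0)   # ReLU hidden layer
        return int(h @ W2 + b2 > 0.0)         # threshold the output logit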

Table of Contents:
Abstract (Chinese) I
Abstract II
List of Tables VIII
List of Figures IX
Chapter 1 Introduction 1
  1.1 Motivation 1
  1.2 Thesis Organization 3
Chapter 2 Fundamentals of Voice Activity Detection 4
  2.1 Overview of Speech Recognition 4
  2.2 Specification of Voice Activity Detection 7
    2.2.1 Confusion Matrix 7
    2.2.2 Voice Detected as Voice 9
    2.2.3 Noise Detected as Voice 10
    2.2.4 Latency 10
  2.3 Fundamentals of Feature Extraction 12
    2.3.1 Application of Fast Fourier Transform 12
    2.3.2 Mel-Frequency Cepstral Coefficients 17
  2.4 Machine Learning Based Classifier 25
    2.4.1 Support Vector Machine 25
    2.4.2 Neural Network 31
  2.5 Architecture of Voice Activity Detection 38
    2.5.1 Development of VAD Architecture 38
    2.5.2 State-of-the-Art VAD 40
Chapter 3 A Switched-Capacitor Based Feature Extractor for Voice Activity Detection 49
  3.1 Introduction 50
  3.2 The Architecture of the Proposed Feature Extractor 51
  3.3 Circuit Implementation 57
    3.3.1 Low Noise Amplifier 57
    3.3.2 Switched-Capacitor Band-Pass Filter 60
    3.3.3 Averaging Circuit 64
Chapter 4 Simulation and Measurement Results 67
  4.1 Acoustic Feature Extractor 67
    4.1.1 Percent Error of Output Feature 67
    4.1.2 Layout and Chip Floor Plan 71
    4.1.3 Simulation Results 74
    4.1.4 Die Micrograph and Measurement Setup 78
    4.1.5 Measurement Results 80
  4.2 Voice Activity Detection 85
    4.2.1 Layout and Chip Floor Plan 85
    4.2.2 Simulation Results 88
    4.2.3 Die Micrograph and Measurement Setup 91
    4.2.4 Measurement Results 94
Chapter 5 Conclusion and Future Works 97
References 99


Full-Text Availability: On campus: 2025-08-12; Off campus: 2025-08-12