| Author: | 蔣孟儒 Chiang, Meng-Ju |
|---|---|
| Thesis Title: | A Switched-Capacitor Based Acoustic Feature Extractor for Voice Activity Detection |
| Advisor: | 張順志 Chang, Soon-Jyh |
| Degree: | Master |
| Department: | College of Electrical Engineering and Computer Science - Department of Electrical Engineering |
| Publication Year: | 2022 |
| Academic Year: | 110 |
| Language: | English |
| Pages: | 101 |
| Keywords: | voice activity detection, acoustic feature extractor, switched-capacitor filter, op-amp sharing technique |
This thesis presents a low-power acoustic feature extractor (AFE) for voice activity detection (VAD). To overcome the effects of process, voltage, and temperature (PVT) variations, a switched-capacitor (SC) band-pass filter (BPF) with the op-amp sharing technique is adopted to reduce power consumption. In addition, a further reduction in power consumption is achieved by reducing the number of frequency bands from 16 to 6 without sacrificing the voice activity detection accuracy of the whole system.
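The feature-extraction chain described above (a bank of band-pass filters whose rectified, averaged outputs form the per-band features) can be sketched as a discrete-time software model. This is only an illustrative sketch: the band centers, Q factor, and sample rate below are assumptions for demonstration, not the actual design values of the SC filterbank in this thesis.

```python
import numpy as np

def biquad_bandpass(fs, f0, Q=2.0):
    # Standard constant-peak-gain biquad band-pass coefficients
    # (RBJ audio-EQ-cookbook form), normalized so a[0] = 1.
    w0 = 2 * np.pi * f0 / fs
    alpha = np.sin(w0) / (2 * Q)
    b = np.array([alpha, 0.0, -alpha])
    a = np.array([1 + alpha, -2 * np.cos(w0), 1 - alpha])
    return b / a[0], a / a[0]

def iir_filter(b, a, x):
    # Direct-form I difference equation for a second-order section.
    y = np.zeros_like(x)
    for n in range(len(x)):
        acc = b[0] * x[n]
        if n >= 1:
            acc += b[1] * x[n - 1] - a[1] * y[n - 1]
        if n >= 2:
            acc += b[2] * x[n - 2] - a[2] * y[n - 2]
        y[n] = acc
    return y

def band_energies(x, fs, centers):
    # Per band: band-pass filter, full-wave rectify, then average
    # over the frame to approximate the band's envelope energy.
    feats = []
    for f0 in centers:
        b, a = biquad_bandpass(fs, f0)
        y = iir_filter(b, a, x)
        feats.append(np.mean(np.abs(y)))
    return np.array(feats)

fs = 16000                               # assumed sample rate
centers = np.geomspace(100, 4000, 6)     # 6 log-spaced band centers (assumed)
t = np.arange(int(0.03 * fs)) / fs       # one 30-ms analysis frame
tone = np.sin(2 * np.pi * 1000 * t)      # 1-kHz test tone
feats = band_energies(tone, fs, centers)
```

Running this on the 1-kHz tone concentrates energy in the band whose center (about 915 Hz here) lies closest to the tone, which is the property the downstream classifier relies on to separate voice from noise.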
Two proof-of-concept chips are realized in this work: a standalone AFE circuit and an integrated voice activity detector (VAD). The AFE chip is fabricated in a 180-nm CMOS process, has a core area of 0.65 mm2, and consumes 2 μW under a 1.8-V supply voltage. Using a software neural-network-based classifier to distinguish voice from background noise, it achieves an average of 97% voice detected as voice (VDV) and 3.8% noise detected as voice (NDV) at a 3-dB signal-to-noise ratio (SNR) with 30-ms latency.
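The VDV and NDV figures quoted above are per-frame classification rates: VDV is the true-positive rate over voiced frames and NDV is the false-positive rate over noise-only frames. A minimal illustration of the computation (the helper and the toy frame labels are for demonstration only, not data from the thesis):

```python
def vad_metrics(labels, preds):
    """Compute VDV (voice detected as voice) and NDV (noise detected
    as voice) from per-frame ground-truth labels and predictions,
    where 1 = voice and 0 = noise."""
    tp = sum(1 for l, p in zip(labels, preds) if l == 1 and p == 1)
    fp = sum(1 for l, p in zip(labels, preds) if l == 0 and p == 1)
    n_voice = sum(labels)
    n_noise = len(labels) - n_voice
    vdv = tp / n_voice if n_voice else 0.0
    ndv = fp / n_noise if n_noise else 0.0
    return vdv, ndv

labels = [1, 1, 1, 1, 0, 0, 0, 0]      # ground truth per frame
preds  = [1, 1, 1, 0, 0, 1, 0, 0]      # classifier decisions
vdv, ndv = vad_metrics(labels, preds)  # → (0.75, 0.25)
```

An ideal detector would reach 100% VDV at 0% NDV; the trade-off between the two is what the decision threshold of the classifier tunes.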
The VAD chip merges the AFE with a hardware classifier implemented as an in-memory-computing neural network. It is also fabricated in a 180-nm CMOS process, has a core area of 1.104 mm2, and consumes 2.66 μW under a 1.8-V supply voltage. Experimental results show that this proof-of-concept prototype achieves an average of 97% VDV and 3.4% NDV at a 3-dB SNR with 32-ms latency.