| Student: | 古光輝 Gu, Gaung-Hui |
|---|---|
| Thesis title: | 泛在環境與具強健性之非文字相關語者辨識及SVM/SMO演算法之FPGA設計與實現 Ubiquitous and Robust Text-Independent Speaker Recognition and FPGA Implementation for SMO algorithm of SVM |
| Advisor: | 王駿發 Wang, Jhing-Fa |
| Degree: | Master |
| Department: | College of Electrical Engineering and Computer Science - Department of Electrical Engineering |
| Year of publication: | 2008 |
| Academic year of graduation: | 96 |
| Language: | English |
| Pages: | 75 |
| Keywords (Chinese): | 語者辨識 (speaker recognition), 支援向量機 (support vector machine) |
| Keywords (English): | SVM, Speaker Recognition |
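Building on SVM-based speaker recognition, we propose a robust text-independent speaker recognition system for ubiquitous environments. Six microphones are distributed around the environment to capture sound sources at different positions, so the speaker is not constrained by distance to a microphone and can move about freely. A mixer combines the signals received by the microphones into a single source, SNR-aware subspace speech enhancement suppresses the noise of the ubiquitous environment, and a multi-class SVM replaces the conventional GMM classifier for training and testing. Experimental results show that the proposed architecture reaches a recognition rate of up to 97.2% in the ubiquitous environment.
Furthermore, we partition the SMO algorithm of SVM into six steps that suit the parallel computation of hardware circuits, implement these steps in hardware, and verify the design on an FPGA. We propose a method that uses the cache efficiently and saves more than half of the cache space, which reduces the time spent computing the kernel function, and we implement in hardware the same heuristic selection as the algorithm to reduce SVM training time. In experiments on a PC, heuristic selection shortens training time by a factor of more than 2.17. FPGA verification shows that using the cache reduces training time by 53% compared with not using it, and that the recognition rate in the ubiquitous environment reaches 92.5%.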
A novel architecture for ubiquitous and robust text-independent speaker recognition based on an SVM approach is proposed. In this architecture, a configuration of multiple far-field microphones is adopted to receive the pervasive speech signals, so that the effect of the distance between speaker and microphone can be ignored. The multi-channel speech signals are then added together through a mixer. In a ubiquitous computing environment, the received speech signal is usually heavily corrupted by background noise. An SNR-aware subspace speech enhancement approach is therefore used as a pre-processing step to enhance the mixed signal and suppress the noise. For text-independent speaker recognition, the proposed work applies a multi-class support vector machine (SVM) instead of conventional Gaussian mixture models (GMMs). In our experiments, the speaker recognition rate reaches up to 97.2% with the proposed ubiquitous speaker recognition architecture.
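As a rough illustration of the mixing step, the sketch below sums several equal-length channels sample by sample into one signal. The abstract states only that the channels are added together; the function name `mix_channels`, the equal-length requirement, and the absence of any scaling or time alignment are assumptions made for this example, not details taken from the thesis.

```python
def mix_channels(channels):
    """Sum equal-length channels sample by sample into a single signal.

    Hypothetical helper: plain summation with no gain control or alignment,
    which is an assumption of this sketch.
    """
    if not channels:
        raise ValueError("at least one channel is required")
    length = len(channels[0])
    if any(len(channel) != length for channel in channels):
        raise ValueError("all channels must have the same length")
    # zip(*channels) groups the samples that share a time index.
    return [sum(samples) for samples in zip(*channels)]


if __name__ == "__main__":
    near_mic = [1, 2, 3]
    far_mic = [10, 20, 30]
    print(mix_channels([near_mic, far_mic]))
```

A real front end would probably also need to handle clipping and inter-microphone delay, which this sketch leaves out.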
Additionally, we propose a hardware realization of a speaker identification system based on the sequential minimal optimization (SMO) algorithm of SVM. We also propose a more efficient method of cache table utilization that saves more than one half of the cache table space and reduces the processing time of the kernel function. Moreover, the heuristic selection method of the SMO algorithm is implemented in the hardware design to reduce training time. In our experiments on a PC, training with heuristic selection is 2.17 times faster than training without it. The hardware implementation achieves an identification rate of up to 92.5% and, with the cache table, reduces training time by 53%.
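The abstract does not say how the cache table saving is obtained. One common way to roughly halve a kernel cache, shown here purely as an assumption, is to exploit the symmetry K(x_i, x_j) = K(x_j, x_i) and store only the upper triangle, so that n points need n(n+1)/2 entries instead of n². The class `SymmetricKernelCache`, the helper `rbf_kernel`, and the choice of an RBF kernel are hypothetical and are not taken from the thesis.

```python
import math


def rbf_kernel(x, y, gamma=0.5):
    """RBF kernel; the kernel choice and gamma are assumptions of this sketch."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


class SymmetricKernelCache:
    """Lazily caches K(i, j) for i <= j only, relying on kernel symmetry."""

    def __init__(self, points, kernel):
        self.points = points
        self.kernel = kernel
        self.n = len(points)
        # Packed upper triangle: n(n+1)/2 slots instead of n*n.
        self.table = [None] * (self.n * (self.n + 1) // 2)
        self.evaluations = 0

    def _index(self, i, j):
        if i > j:
            i, j = j, i  # map (j, i) onto the stored (i, j) entry
        # Rows 0..i-1 hold n, n-1, ..., n-i+1 entries respectively.
        return i * self.n - i * (i - 1) // 2 + (j - i)

    def get(self, i, j):
        slot = self._index(i, j)
        if self.table[slot] is None:
            self.table[slot] = self.kernel(self.points[i], self.points[j])
            self.evaluations += 1
        return self.table[slot]


if __name__ == "__main__":
    cache = SymmetricKernelCache([(0.0, 0.0), (1.0, 0.0), (0.0, 2.0)], rbf_kernel)
    cache.get(0, 2)
    cache.get(2, 0)  # served from the cache, no second kernel evaluation
    print(len(cache.table), cache.evaluations)
```

Each repeated or mirrored lookup is answered from the table, which is the effect the abstract attributes to the cache: fewer kernel function evaluations during training.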