| Graduate student: | 官大文 Kuan, Ta-Wen |
|---|---|
| Thesis title: | The Research and Realization of Speaker Recognition on VLSI Architecture Design and Intelligent Porch Application |
| Advisor: | 王駿發 Wang, Jhing-Fa |
| Degree: | Doctor |
| Department: | College of Electrical Engineering and Computer Science, Department of Electrical Engineering |
| Year of publication: | 2012 |
| Academic year of graduation: | 100 (ROC calendar; academic year 2011–2012) |
| Language: | English |
| Pages: | 98 |
| Chinese keywords (translated): | Speaker recognition, speaker verification, support vector machine, sequential minimal optimization, VLSI architecture design, hardware/software co-design, intelligent porch, biometric fusion, multimodal biometric recognition |
| English keywords: | Speaker Recognition, Support Vector Machine, Sequential Minimal Optimization, VLSI Design, SoC Design, Hardware/Software Co-design, Smart Home, Intelligent Porch System, Multimodal Biometric System, Audio and Video Experts, Embedded System |
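As the concept of smart living spreads, mobile smart devices (such as smartphones, tablets, and notebook computers) keep getting smaller and more capable, making users' lives more convenient and becoming an indispensable part of daily life. Likewise, in the smart home, intelligent appliances and access-control security systems make home life safer, more comfortable, and more convenient. Speaker recognition plays an important role both in confirming the identity of a mobile device's owner and in verifying and identifying family members at a home intelligent porch system. This dissertation investigates the research and realization of speaker recognition in VLSI architecture design and in an intelligent porch application.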
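Speaker recognition comprises speaker identification and speaker verification. Speaker identification determines identity within a closed set of a finite number of speakers: the speaker with the highest score is identified as the target speaker. Speaker verification determines identity in an open set: given a claimed identity, it checks whether the score of the speaker's acoustic characteristics exceeds a specific threshold, which expresses the confidence in that identity. To make speaker recognition more practical, this dissertation also investigates combining speaker recognition with speech recognition of both specific (predefined) and non-specific utterances to improve speaker recognition accuracy.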
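The difference between the two tasks can be sketched in a few lines of Python. This is a minimal illustration, not the dissertation's implementation: the speaker names, scores, and threshold are hypothetical, and each score stands for whatever match score a trained speaker model assigns to a test utterance.

```python
from typing import Dict


def identify(scores: Dict[str, float]) -> str:
    """Closed-set identification: return the enrolled speaker with the highest score."""
    return max(scores, key=lambda name: scores[name])


def verify(score: float, threshold: float) -> bool:
    """Open-set verification: accept the claimed identity only if its score exceeds the threshold."""
    return score > threshold


# Hypothetical scores of one test utterance against three enrolled speaker models.
scores = {"alice": 0.42, "bob": 1.37, "carol": -0.15}
print(identify(scores))              # bob
print(verify(scores["carol"], 0.5))  # False: the claim "carol" is rejected
print(verify(scores["bob"], 0.5))    # True: the claim "bob" is accepted
```

Identification always returns some enrolled speaker, even for an unknown voice; verification can reject a claim, which is why it suits an open set.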
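In the course of this research, we first mounted six microphones at distributed positions on the laboratory ceiling to create a home-like, ubiquitous, far-field recording environment, and built an SVM-based speaker identification software simulation system on a personal computer to evaluate speaker identification performance in this far-field acoustic environment. The speaker recognition research then proceeded in two directions: the research and realization of hardware, and the research and realization of an intelligent porch in the Chimei experience house. The hardware part includes an embedded SoC architecture design implemented on an ARM board, a VLSI architecture design on an FPGA board, and a hardware/software co-design architecture on ARM+FPGA. In the intelligent porch, speaker recognition is combined with speech recognition of specific phrases to improve the usability of speaker recognition. Furthermore, using biometric fusion, far-field face recognition and height recognition from images are combined with the original speaker recognition and speech recognition to improve the robustness of the multimodal biometric system for identity recognition at a home intelligent porch.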
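One common way to combine several experts is score-level fusion: normalize each expert's score to a common range and take a weighted sum. The sketch below assumes min-max normalization and a weighted sum with hypothetical bounds, weights, and decision threshold; the dissertation's own fusion rule may differ.

```python
from typing import Dict, Tuple


def min_max(score: float, lo: float, hi: float) -> float:
    """Map a raw expert score into [0, 1] using bounds estimated on development data."""
    if hi <= lo:
        raise ValueError("upper bound must exceed lower bound")
    return min(1.0, max(0.0, (score - lo) / (hi - lo)))


def fuse(raw: Dict[str, float],
         bounds: Dict[str, Tuple[float, float]],
         weights: Dict[str, float]) -> float:
    """Weighted sum of min-max normalized expert scores (weights assumed to sum to 1)."""
    return sum(w * min_max(raw[name], *bounds[name]) for name, w in weights.items())


# Hypothetical per-expert score ranges, weights, and one user's raw scores.
bounds = {"speaker": (-2.0, 2.0), "speech": (0.0, 1.0), "face": (0.0, 100.0), "height": (0.0, 1.0)}
weights = {"speaker": 0.4, "speech": 0.2, "face": 0.3, "height": 0.1}
raw = {"speaker": 1.0, "speech": 0.9, "face": 80.0, "height": 0.7}

fused = fuse(raw, bounds, weights)
accept = fused > 0.5  # hypothetical decision threshold
print(round(fused, 3), accept)  # 0.79 True
```

Clipping to [0, 1] keeps a single out-of-range expert score from dominating the sum, and lowering an expert's weight is the simplest knob when that expert is known to be unreliable under current conditions.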
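Speaker recognition can be divided into two phases: speaker model training and speaker identification (classification). To accelerate speaker model training, we use the sequential minimal optimization (SMO) method of the support vector machine (SVM). For classification, a one-versus-one multiclass method is adopted to improve the identification rate. To further improve training performance, we implement the SVM SMO method as a VLSI architecture. Experimental results show that VLSI-based speaker model training significantly improves training performance, while the speaker identification rate remains close to that of the PC software simulation.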
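In the one-versus-one scheme, a K-speaker problem is split into K(K-1)/2 binary classifiers, one per pair of speakers, and the speaker collecting the most pairwise votes wins. The sketch below assumes each pairwise classifier is available as a function returning a positive value for the first speaker of the pair; the distance-based classifiers in the usage part are hypothetical stand-ins for trained binary SVMs.

```python
from collections import Counter
from itertools import combinations
from typing import Callable, Dict, List, Sequence, Tuple

# A pairwise decision function: a positive output votes for the first speaker of the pair.
PairwiseModel = Callable[[Sequence[float]], float]


def ovo_predict(x: Sequence[float],
                speakers: List[str],
                models: Dict[Tuple[str, str], PairwiseModel]) -> str:
    """One-versus-one voting over all K(K-1)/2 speaker pairs."""
    votes = Counter()
    for a, b in combinations(speakers, 2):
        votes[a if models[(a, b)](x) > 0 else b] += 1
    # Ties go to the speaker listed first (an assumption; other tie-break rules exist).
    return max(speakers, key=lambda s: votes[s])


# Hypothetical stand-ins for trained binary SVMs: each compares squared distances
# from the feature vector to two speaker centroids in a 2-D feature space.
centroids = {"A": (0.0, 0.0), "B": (4.0, 0.0), "C": (0.0, 4.0)}


def sq_dist(x: Sequence[float], c: Sequence[float]) -> float:
    return sum((xi - ci) ** 2 for xi, ci in zip(x, c))


def make_pair_model(a: str, b: str) -> PairwiseModel:
    def decide(x: Sequence[float]) -> float:
        return sq_dist(x, centroids[b]) - sq_dist(x, centroids[a])  # > 0 when closer to a
    return decide


speakers = list(centroids)
models = {(a, b): make_pair_model(a, b) for a, b in combinations(speakers, 2)}
print(ovo_predict((3.5, 0.5), speakers, models))  # B (2 votes; A gets 1, C gets 0)
```

The pairwise models are built through a factory function so that each closure captures its own pair of speakers rather than the last values of a loop variable.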
In smart living, smart portable devices and smart home appliances have drawn researchers to make them smaller, faster, more interactive, and more capable. Speaker recognition plays an important role in recognizing the owner of a mobile device and in authenticating enrolled users in a smart home.
In this dissertation, we explore speaker recognition in two fields: hardware implementation and smart home application. For hardware realization, three platforms are adopted (an ARM platform, an FPGA platform, and an ARM+FPGA platform), on which speaker recognition is realized as an embedded SoC system, a VLSI architecture design, and a hardware/software co-design, respectively. For the smart home, speaker recognition is investigated in an intelligent porch system to provide a natural way to authenticate home users and to interact smartly with home appliances. Because adverse and mismatched conditions degrade the speaker expert, it is fused with other human cues, namely a speech expert, a face expert, and a height detector, to form a multimodal biometric recognition system for the smart home.
In general, speaker recognition can be divided into two modalities: speaker identification and speaker verification. Speaker identification scores an unknown speaker against a closed set of trained models and selects the best-matching identity, whereas speaker verification compares the voice with the model of the claimed identity and accepts or rejects the claim against a confidence threshold; the latter task can be regarded as an open-set problem.
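Choosing the confidence threshold is a trade-off: raising it rejects more impostors but also more true claimants. The following minimal sketch, using hypothetical score lists rather than results from this dissertation, counts both error rates for the rule "accept if the score exceeds the threshold".

```python
from typing import Sequence, Tuple


def far_frr(genuine: Sequence[float], impostor: Sequence[float],
            threshold: float) -> Tuple[float, float]:
    """Error rates of the rule 'accept the claim if score > threshold'."""
    far = sum(s > threshold for s in impostor) / len(impostor)  # impostors accepted
    frr = sum(s <= threshold for s in genuine) / len(genuine)   # true claimants rejected
    return far, frr


# Hypothetical verification scores for true claimants and impostors.
genuine = [1.2, 0.9, 0.4, 1.5, 0.7]
impostor = [-0.8, 0.1, 0.6, -0.3, -1.1]
for t in (0.0, 0.5, 1.0):
    print(t, far_frr(genuine, impostor, t))
# 0.0 (0.4, 0.0)
# 0.5 (0.2, 0.2)
# 1.0 (0.0, 0.6)
```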
Two critical phases are commonly addressed in speaker recognition: model training and recognition. Model training is generally time-consuming, particularly on mobile devices, which motivates us to implement the training phase in hardware to accelerate it. In this dissertation, the support vector machine (SVM) is used for speaker model training and classification, and the sequential minimal optimization (SMO) algorithm is used to accelerate SVM training. To realize the complex SMO algorithm on multiple hardware platforms, the algorithm is first analyzed and restructured into feasible steps and blocks, which are then realized on several hardware platforms. The experimental results show that the VLSI design of the SMO algorithm indeed accelerates training, while speaker identification accuracy differs little from that of the software simulation.
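SMO solves the SVM dual problem by repeatedly selecting two Lagrange multipliers and updating them analytically while the others stay fixed. The sketch below illustrates that idea in software only. It assumes a linear kernel, the simplified variant that picks the second multiplier at random (not Platt's full selection heuristics), and hypothetical defaults for `C`, `tol`, and the pass limits; it is not the dissertation's hardware design. The numbered comments mark steps (KKT check, bound computation, analytic update, threshold update) that could plausibly be mapped to separate hardware blocks.

```python
import random
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def linear_kernel(u: Vector, v: Vector) -> float:
    """Kernel assumed for this sketch; another positive semi-definite kernel could be used."""
    return sum(p * q for p, q in zip(u, v))


def simplified_smo(X: List[Vector], y: List[int], C: float = 1.0, tol: float = 1e-3,
                   max_passes: int = 10, max_iter: int = 1000,
                   seed: int = 0) -> Tuple[List[float], float]:
    """Solve the soft-margin SVM dual by repeated two-multiplier analytic updates."""
    n = len(X)
    K = [[linear_kernel(X[i], X[j]) for j in range(n)] for i in range(n)]  # cached Gram matrix
    alpha = [0.0] * n
    b = 0.0
    rng = random.Random(seed)

    def error(k: int) -> float:
        # E_k = f(x_k) - y_k, where f(x) = sum_m alpha_m y_m K(x_m, x) + b
        return sum(alpha[m] * y[m] * K[m][k] for m in range(n)) + b - y[k]

    passes = it = 0
    while passes < max_passes and it < max_iter:
        it += 1
        changed = 0
        for i in range(n):
            E_i = error(i)
            # Step 1: skip alpha_i unless it violates the KKT conditions beyond tol.
            if not ((y[i] * E_i < -tol and alpha[i] < C) or (y[i] * E_i > tol and alpha[i] > 0)):
                continue
            j = rng.randrange(n - 1)  # second multiplier chosen at random
            if j >= i:
                j += 1                # ensures j != i
            E_j = error(j)
            ai_old, aj_old = alpha[i], alpha[j]
            # Step 2: bounds [L, H] keep 0 <= alpha <= C and sum(alpha * y) unchanged.
            if y[i] != y[j]:
                L, H = max(0.0, aj_old - ai_old), min(C, C + aj_old - ai_old)
            else:
                L, H = max(0.0, ai_old + aj_old - C), min(C, ai_old + aj_old)
            eta = 2 * K[i][j] - K[i][i] - K[j][j]  # curvature along the constraint line
            if L == H or eta >= 0:
                continue
            # Step 3: analytic update of alpha_j, clipped to [L, H].
            aj_new = min(H, max(L, aj_old - y[j] * (E_i - E_j) / eta))
            if abs(aj_new - aj_old) < 1e-5:
                continue
            # Step 4: alpha_i moves the opposite way to preserve sum(alpha * y).
            alpha[j] = aj_new
            alpha[i] = ai_old + y[i] * y[j] * (aj_old - aj_new)
            # Step 5: threshold update from whichever multiplier lies strictly inside (0, C).
            di, dj = y[i] * (alpha[i] - ai_old), y[j] * (aj_new - aj_old)
            b1 = b - E_i - di * K[i][i] - dj * K[i][j]
            b2 = b - E_j - di * K[i][j] - dj * K[j][j]
            if 0 < alpha[i] < C:
                b = b1
            elif 0 < alpha[j] < C:
                b = b2
            else:
                b = (b1 + b2) / 2
            changed += 1
        passes = passes + 1 if changed == 0 else 0
    return alpha, b


def decision(X: List[Vector], y: List[int], alpha: List[float], b: float, x: Vector) -> float:
    return sum(a * t * linear_kernel(xi, x) for a, t, xi in zip(alpha, y, X)) + b


# Toy, linearly separable data standing in for speaker feature vectors.
X = [(2.0, 2.0), (3.0, 3.0), (2.0, 3.0), (-2.0, -2.0), (-3.0, -3.0), (-2.0, -3.0)]
y = [1, 1, 1, -1, -1, -1]
alpha, b = simplified_smo(X, y)
print([1 if decision(X, y, alpha, b, x) > 0 else -1 for x in X])
```

Each accepted update touches only two multipliers and needs only a few kernel values and two error terms, which is one reason SMO is attractive for a dedicated datapath; a hardware version would still have to settle number formats and a working-set selection rule, which this sketch does not address.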
Full text (on-campus access): available from 2017-06-11