| Field | Value |
|---|---|
| Graduate student | 許劭君 Hsu, Shao-Chun |
| Thesis title | 小波轉換處理語者之語音辨識 (Voice Recognition of Speakers Using Wavelet Transform) |
| Advisor | 王榮泰 Wang, Rung-Tai |
| Degree | Master |
| Department | College of Engineering, Department of Engineering Science |
| Year of publication | 2012 |
| Academic year of graduation | 100 (ROC calendar, i.e. 2011-2012) |
| Language | Chinese |
| Pages | 61 |
| Keywords | Speaker Identification (語者辨識), Wavelet Transform (小波轉換), Hidden Markov Model (隱藏馬可夫模型) |
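As technology advances, people interact with machines more and more often. This study therefore builds a speaker identification system that determines a speaker's identity without constraints on the content or duration of the spoken utterance, and uses a recursive learning mechanism to reduce the misidentification rate, so that speakers no longer need to memorize complicated passwords.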
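Following its processing flow, the system consists of three main parts: speech signal preprocessing, clustering of feature data, and identification with recursive learning. Identification relies mainly on the wavelet transform and the hidden Markov model (HMM). The nearest neighbor selection, centroid splitting, and K-means clustering algorithms are combined with the training procedure of the identification stage to build a speaker feature database. At each identification, the best speaker feature parameters are recorded and sent back to update the database, giving the system a machine-learning capability.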
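The abstract does not specify the wavelet family, the decomposition depth, or which coefficients form the feature vector. As an illustrative sketch only, assuming the Haar wavelet, a three-level decomposition, and the log energy of each sub-band as the feature for one speech frame, the wavelet step could look like this (the function names are hypothetical):

```python
import math

def haar_dwt(signal):
    """One level of the orthonormal Haar DWT: returns (approximation, detail)."""
    signal = list(signal)
    if len(signal) % 2:
        signal.append(0.0)  # zero-pad odd-length input
    s = 1 / math.sqrt(2)
    approx = [(signal[i] + signal[i + 1]) * s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) * s for i in range(0, len(signal), 2)]
    return approx, detail

def wavelet_energy_features(frame, levels=3, eps=1e-12):
    """Assumed feature: log energy of each detail band, then of the final
    approximation band. eps avoids log(0) for silent bands."""
    features = []
    approx = list(frame)
    for _ in range(levels):
        approx, detail = haar_dwt(approx)
        features.append(math.log(sum(d * d for d in detail) + eps))
    features.append(math.log(sum(a * a for a in approx) + eps))
    return features
```

Because the Haar transform is orthonormal, the band energies of a frame sum to the frame's energy, so the vector describes how that energy is spread across frequency bands.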
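The identification results show that the algorithm built on the wavelet transform achieves an identification rate of 80% or higher, while identification based on the probabilistic state transitions of the HMM achieves 70% or higher.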
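The abstract reports HMM-based identification through state-transition probabilities but gives neither the model topology nor the observation type. As a minimal sketch, assuming one discrete HMM per enrolled speaker whose observations are codebook indices (one per frame), an utterance can be scored with the scaled forward algorithm and assigned to the speaker with the highest log-likelihood. The function names and the `(pi, A, B)` tuple layout are hypothetical, and training (for example with Baum-Welch) is not shown.

```python
import math

def forward_log_likelihood(obs, pi, A, B):
    """Scaled forward algorithm: log P(obs | model) for a discrete HMM.

    pi[i]   initial probability of state i
    A[i][j] transition probability from state i to state j
    B[i][k] probability of emitting symbol k in state i
    obs     sequence of symbol indices (assumed to be VQ codebook indices)
    """
    if not obs:
        raise ValueError("observation sequence is empty")
    n = len(pi)
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    log_lik = 0.0
    for t in range(len(obs)):
        if t > 0:
            prev = alpha
            alpha = [sum(prev[i] * A[i][j] for i in range(n)) * B[j][obs[t]]
                     for j in range(n)]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")  # sequence impossible under this model
        alpha = [a / scale for a in alpha]  # rescale to avoid underflow
        log_lik += math.log(scale)  # log P(obs) is the sum of log scales
    return log_lik

def identify_speaker(obs, models):
    """Return the speaker whose HMM (pi, A, B) gives the highest log-likelihood."""
    return max(models, key=lambda name: forward_log_likelihood(obs, *models[name]))
```

Rescaling at every frame keeps the computation stable for long utterances, which matters here because the system does not limit utterance length.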
As technology progresses, interactions between humans and machines are becoming increasingly common. This research builds a speaker identification system that places no constraints on the content or duration of the spoken sentences, so users no longer need to memorize lengthy or complicated passwords for identification. In addition, a recursive learning process is used to reduce the misidentification rate.
Following its processing flow, the speaker identification system is divided into three main parts: speech signal preprocessing, feature extraction by clustering, and an identification algorithm with recursive learning. The core algorithms of the system are the wavelet transform and the Hidden Markov Model (HMM). The Nearest Neighbor Selection Rule, the Centroid Splitting Algorithm, and the K-means Algorithm are used to cluster voice features and construct a speaker feature database, which is updated after every identification with the best-matching feature parameters.
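The codebook construction named in this paragraph combines centroid splitting (the LBG procedure) with K-means refinement and nearest-neighbor assignment. A minimal sketch follows, assuming squared Euclidean distance, an additive splitting perturbation `eps`, and a target codebook size that is a power of two; the abstract gives none of these details, and the function names are hypothetical.

```python
def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(codebook, v):
    """Nearest neighbor selection: index of the codeword closest to v."""
    return min(range(len(codebook)), key=lambda i: sq_dist(codebook[i], v))

def kmeans_refine(codebook, data, iters=20):
    """K-means iterations starting from the given codebook."""
    for _ in range(iters):
        clusters = [[] for _ in codebook]
        for v in data:
            clusters[nearest(codebook, v)].append(v)
        new_codebook = []
        for c, members in zip(codebook, clusters):
            if members:
                new_codebook.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_codebook.append(list(c))  # keep an empty cell's codeword
        if new_codebook == codebook:
            break  # assignments are stable
        codebook = new_codebook
    return codebook

def lbg_codebook(data, size, eps=0.01):
    """Centroid splitting: start from the global mean and double the codebook
    until it has at least `size` codewords (size assumed a power of two)."""
    dim = len(data[0])
    codebook = [[sum(v[d] for v in data) / len(data) for d in range(dim)]]
    while len(codebook) < size:
        # split every codeword into two copies shifted by +eps and -eps
        codebook = [[x + s for x in c] for c in codebook for s in (eps, -eps)]
        codebook = kmeans_refine(codebook, data)
    return codebook
```

Splitting from the global mean gives K-means a deterministic starting point, which avoids the sensitivity to random initialization that plain K-means has.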
The results show that the wavelet-transform-based method achieves an identification rate of 80% or higher, while identification using HMM state transitions achieves 70% or higher.
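The abstract states only that the best speaker feature parameters are fed back to the database after each identification. One way this could work is sketched below, assuming each speaker is stored as a VQ codebook, identification picks the speaker with the lowest average distortion, and the matched codewords are nudged toward the new frames with a hypothetical step size `rate`. This is an illustrative assumption, not the thesis's documented update rule.

```python
def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def avg_distortion(codebook, frames):
    """Mean squared distance from each frame to its nearest codeword."""
    return sum(min(sq_dist(c, f) for c in codebook) for f in frames) / len(frames)

def identify_and_update(database, frames, rate=0.1):
    """Pick the speaker with the lowest VQ distortion, then move that speaker's
    nearest codeword toward each new frame (assumed recursive update rule)."""
    speaker = min(database, key=lambda s: avg_distortion(database[s], frames))
    codebook = database[speaker]
    for f in frames:
        i = min(range(len(codebook)), key=lambda k: sq_dist(codebook[k], f))
        codebook[i] = [c + rate * (x - c) for c, x in zip(codebook[i], f)]
    return speaker
```

Because the update is applied to whichever speaker wins, a misidentification would pull the wrong codebook toward the new voice; requiring the distortion to fall below a threshold before updating is one possible safeguard.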