
Author: Hsu, Shao-Chun (許劭君)
Thesis Title: Voice Recognition of Speakers Using Wavelet Transform (小波轉換處理語者之語音辨識)
Advisor: Wang, Rung-Tai (王榮泰)
Degree: Master
Department: College of Engineering - Department of Engineering Science
Year of Publication: 2012
Graduation Academic Year: 100 (ROC calendar)
Language: Chinese
Number of Pages: 61
Keywords (Chinese): 語者辨識, 小波轉換, 隱藏馬可夫模型
Keywords (English): Speaker Identification, Wavelet Transform, Hidden Markov Model
  • As technology advances, people interact with machines more and more often. This research therefore builds a speaker identification system that determines a speaker's identity without constraining the content or the duration of the spoken utterance, so that users need not memorize long or complicated passphrases. A recursive learning mechanism is also used to lower the misidentification rate.
    Following its processing flow, the system has three main parts: speech signal preprocessing, clustering of feature data, and the identification algorithm with recursive learning. Identification is based mainly on the wavelet transform and the Hidden Markov Model. The Nearest Neighbor Selection Rule, the Centroid Splitting Algorithm, and the K-means clustering algorithm are combined with the training procedure of the identification mechanism to build the speaker feature database. After each identification, the best speaker feature parameters are recorded and returned to the database as an update, which gives the system a machine learning capability.
    The results show that the identification algorithm built on the wavelet transform reaches an identification rate of 80% or higher, while identification based on the probabilistic state transitions of the Hidden Markov Model reaches an identification rate of 70% or higher.
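The clustering step named in the abstract (nearest-neighbor assignment, centroid splitting, and K-means) can be illustrated with a minimal sketch. This is not the thesis implementation: the function names, the 2-D synthetic feature vectors, the codebook size, and the additive split offset `eps` are all assumptions made for illustration, and the lowest-distortion decision rule stands in for the wavelet-based and HMM-based identification described in the record.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean_vector(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def nearest(codebook, v):
    """Index of the codeword closest to v (nearest-neighbor rule)."""
    return min(range(len(codebook)), key=lambda i: sq_dist(codebook[i], v))

def kmeans_refine(vectors, codebook, iterations=20):
    """Refine a codebook with a fixed number of K-means iterations."""
    for _ in range(iterations):
        clusters = [[] for _ in codebook]
        for v in vectors:
            clusters[nearest(codebook, v)].append(v)
        # Keep the old codeword if a cluster ends up empty.
        codebook = [mean_vector(c) if c else old
                    for c, old in zip(clusters, codebook)]
    return codebook

def train_codebook(vectors, size=4, eps=0.01):
    """Grow a codebook by centroid splitting; size is assumed a power of two."""
    codebook = [mean_vector(vectors)]
    while len(codebook) < size:
        # Split every centroid into two copies offset by +eps and -eps.
        codebook = [[x + s for x in c] for c in codebook for s in (eps, -eps)]
        codebook = kmeans_refine(vectors, codebook)
    return codebook

def distortion(vectors, codebook):
    """Average squared distance from each vector to its nearest codeword."""
    total = sum(sq_dist(v, codebook[nearest(codebook, v)]) for v in vectors)
    return total / len(vectors)

def identify(vectors, codebooks):
    """Return the speaker whose codebook gives the lowest distortion."""
    return min(codebooks, key=lambda name: distortion(vectors, codebooks[name]))

def synth(center, n, seed):
    """Hypothetical stand-in for real speech features: noisy points near center."""
    rng = random.Random(seed)
    return [[c + rng.gauss(0, 0.3) for c in center] for _ in range(n)]

# Build one codebook per (hypothetical) speaker, then identify a new utterance.
codebooks = {
    "speaker_a": train_codebook(synth([1.0, 2.0], 200, seed=1)),
    "speaker_b": train_codebook(synth([4.0, -1.0], 200, seed=2)),
}
print(identify(synth([1.0, 2.0], 50, seed=3), codebooks))
```

In a real system the synthetic vectors would be replaced by per-frame features extracted from the preprocessed speech signal, and a database update after each identification could be approximated by rerunning `kmeans_refine` on the stored codebook with the newly accepted vectors. That update rule is likewise an assumption, not something the record specifies.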

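    Abstract (Chinese)
    Abstract (English)
    Acknowledgments
    Table of Contents
    List of Tables
    List of Figures
    Chapter 1 Introduction
      1.1 Research Background and Motivation
      1.2 Research Objectives
      1.3 Literature Review
      1.4 Thesis Organization
    Chapter 2 Building the Speaker Feature Model
      2.1 Speaker Feature Model Building System
      2.2 Signal Preprocessing
        2.2.1 Digital Sampling
        2.2.2 Endpoint Detection
        2.2.3 Frame Segmentation
        2.2.4 Pre-emphasis
        2.2.5 Windowing
      2.3 Conclusion
    Chapter 3 Speaker Feature Parameter Extraction
      3.1 Speaker Feature Parameter Extraction System
      3.2 Cepstrum
      3.3 Mel Filter
      3.4 Fundamental Frequency
      3.5 Vector Quantization
      3.6 Conclusion
    Chapter 4 Recognition Mechanism
      4.1 Recognition Mechanism System
      4.2 Hidden Markov Model
        4.2.1 Parameter Estimation
        4.2.2 Viterbi Algorithm
      4.3 Gaussian Speaker Model
      4.4 Wavelet Transform
      4.5 Hilbert-Huang Transform
      4.6 Speaker Identification Sequence Diagram
      4.7 Probability Distribution
    Chapter 5 Experiments and Results
      5.1 Database and Experiment Overview
      5.2 Comparison of Recognition Parameters
      5.3 Comparison of Parameter Clustering
      5.4 Comparison of Mother Wavelet Parameters
      5.5 Comparison of Probability Distribution Parameters
      5.6 Speaker Identification Experiments
      5.7 Overall Discussion and Analysis
      5.8 Emotion Recognition of Infant Cries
    Chapter 6 Conclusions and Future Work
      6.1 Conclusions
      6.2 Future Work
    References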


    Full text availability: on campus, open access from 2017-06-30
    Off campus, open access from 2017-06-30