| Graduate student: | 蘇尹廷 Su, Yin-Ting |
|---|---|
| Thesis title: | 基於隱藏式馬可夫模型之語音辨認研究並應用於音樂點播系統 Research of Hidden Markov Model and Its Application to Audio-On-Demand System |
| Advisor: | 廖德祿 Liao, Teh-Lu |
| Degree: | Master |
| Department: | College of Engineering - Department of Engineering Science |
| Year of publication: | 2010 |
| Academic year of graduation: | 98 (ROC calendar, 2009-2010) |
| Language: | English |
| Number of pages: | 67 |
| Chinese keywords: | 梅爾倒頻譜 (Mel cepstrum) |
| English keywords: | HMM |
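As technology advances, electronic multimedia and home appliance systems of all kinds are flourishing, and speech recognition lets consumers use these increasingly powerful 3C (computer, communication, and consumer electronics) products more conveniently and intuitively. The most popular approach to speech recognition today is the hidden Markov model, because it can recognize both continuous speech and keywords and therefore has a wider range of uses than other recognition algorithms.
This thesis develops a speech recognition system that can be applied to MP3 players. The system uses Visual C++ as the platform for signal processing and computation, builds a speech recognition system based on the hidden Markov model algorithm, and combines it with an MP3 playback system to produce a digital voice-controlled product that plays music under spoken commands.
In this experiment, we propose a re-sampling endpoint detection method that extracts the required speech more accurately; Mel-frequency cepstral coefficients are then computed as the feature parameters representing the speech; finally, recognition is performed with the hidden Markov model algorithm against a pre-trained database. The experiment first selects five singers' names as the database and then has 11 people record them, each person recording all five names. These recordings are used to train the speech models, and subsequent test speech is compared against the trained models.
The average recognition rate in this thesis reaches 90%, and the system was successfully integrated with an MP3 player, yielding a product in which the tracks played are controlled by the human voice.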
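
The abstract does not spell out how the proposed re-sampling endpoint detection works, so the following is only a minimal sketch of a generic short-time-energy endpoint detector, the kind of baseline such methods usually refine. It is not the thesis's method. The function names, the frame length, the hop size, and the `ratio` threshold are all illustrative assumptions.

```python
import math

def frame_energies(samples, frame_len, hop):
    """Short-time energy (sum of squared samples) of each frame."""
    energies = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = samples[start:start + frame_len]
        energies.append(sum(x * x for x in frame))
    return energies

def detect_endpoints(samples, frame_len=200, hop=100, ratio=0.1):
    """Return (start_sample, end_sample) of the active region, or None.

    A frame counts as speech when its energy exceeds `ratio` times the
    largest frame energy. The threshold rule is an assumption made for
    illustration, not a value taken from the thesis.
    """
    energies = frame_energies(samples, frame_len, hop)
    if not energies or max(energies) == 0:
        return None
    threshold = ratio * max(energies)
    active = [i for i, e in enumerate(energies) if e > threshold]
    first, last = active[0], active[-1]
    return first * hop, last * hop + frame_len

# Synthetic signal at 8 kHz: silence, a 200 Hz tone burst, silence.
rate = 8000
silence = [0.0] * 2000
tone = [math.sin(2 * math.pi * 200 * n / rate) for n in range(4000)]
signal = silence + tone + silence

start, end = detect_endpoints(signal)
print(start, end)  # prints: 1900 6100
```

The detected region is slightly wider than the true burst (samples 2000 to 5999) because endpoints are resolved only to frame boundaries; a real recording would also need a noise-floor estimate rather than a fixed ratio.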
With advances in science and technology, electronic multimedia systems and home appliances of all kinds are developing rapidly, and speech recognition is an important way to make these products more convenient to use. The hidden Markov model (HMM) is currently the most popular method in speech recognition because it can recognize continuous speech and keywords, which gives it a wider range of applications than other speech recognition algorithms.
A Visual C++-based speech recognition system using the HMM algorithm is implemented in this thesis. The system is combined with an audio-on-demand system to form a digital music player controlled by the human voice.
In this study, we propose a re-sampling endpoint detection method that captures the useful segment of the test speech more accurately, and then extract feature parameters that represent the speech using Mel-frequency cepstral coefficients (MFCC). Finally, the test speech is compared, using the HMM algorithm, against a database trained in advance. In the experiment, we first selected five singers' names as the database and then recruited eleven people to record them. Each person recorded all five names, and we used these audio files to train the speech models. Test speech is then matched against these trained models.
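
The final matching step can be illustrated with a small sketch: one HMM per word, each scored with the forward algorithm, and the word whose model gives the highest likelihood is selected. This is a simplified illustration under stated assumptions, not the thesis's implementation. It uses discrete observation symbols (as if MFCC vectors had already been quantized) and two hand-written toy models named `word_a` and `word_b`; the thesis's actual model topology, observation densities, and training procedure are not given in the abstract.

```python
import math

NEG_INF = float("-inf")

def safe_log(p):
    """log(p), with log(0) mapped to -inf instead of raising."""
    return math.log(p) if p > 0 else NEG_INF

def log_sum_exp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    if m == NEG_INF:
        return NEG_INF
    return m + math.log(sum(math.exp(v - m) for v in values))

def log_likelihood(obs, start, trans, emit):
    """Forward algorithm in the log domain: log P(obs | model)."""
    n = len(start)
    alpha = [safe_log(start[s]) + safe_log(emit[s][obs[0]]) for s in range(n)]
    for o in obs[1:]:
        alpha = [
            log_sum_exp([alpha[r] + safe_log(trans[r][s]) for r in range(n)])
            + safe_log(emit[s][o])
            for s in range(n)
        ]
    return log_sum_exp(alpha)

def recognize(obs, models):
    """Return the name of the model that best explains the observations."""
    return max(models, key=lambda name: log_likelihood(obs, *models[name]))

# Hypothetical two-state left-to-right models over symbols {0, 1}.
# Each entry is (start probabilities, transition matrix, emission matrix).
left_to_right = [[0.7, 0.3], [0.0, 1.0]]
models = {
    "word_a": ([1.0, 0.0], left_to_right, [[0.9, 0.1], [0.2, 0.8]]),
    "word_b": ([1.0, 0.0], left_to_right, [[0.1, 0.9], [0.8, 0.2]]),
}

print(recognize([0, 0, 1, 1], models))  # prints: word_a
print(recognize([1, 1, 0, 0], models))  # prints: word_b
```

Working in the log domain avoids numerical underflow on long observation sequences, which matters once utterances span hundreds of frames.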
On campus: publicly available from 2015-07-20