| Graduate student: | 趙俊超 Zhao, Jun-Chao |
|---|---|
| Thesis title: | 改良式DTW語音辨識系統之FPGA實現與分析 Improved DTW-based Speech Recognition System with Its FPGA Implementation and Analysis |
| Advisor: | 廖德祿 Liao, Teh-Lu |
| Degree: | Master |
| Department: | College of Engineering - Department of Engineering Science |
| Year of publication: | 2007 |
| Academic year of graduation: | 95 |
| Language: | Chinese |
| Number of pages: | 74 |
| Keywords (Chinese): | frame, autocorrelation function, linear prediction coefficients, speech recognition |
| Keywords (English): | DTW, FPGA, Itakura distance |
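This thesis develops an FPGA-based speech recognition platform and, driven by the requirements of hardware design, proposes an improved recognition algorithm that raises computational efficiency. As computers have become widespread, human-machine interfaces have gradually matured; beyond ordinary manual operation, the goal is now for humans and computers to communicate directly through spoken language. Advances in speech processing mean that methods for raising the recognition rate continue to be proposed, so this remains a developing technology. In recent years, with the rise of SoC (System on Chip) design, integrating multiple functions into digital products such as mobile phones, PDAs, and digital cameras has become the prevailing trend. If speech processing is combined with these consumer electronics, users can operate them by voice input instead of by hand. This offers considerable convenience, especially for users with limited mobility.
In addition, to reduce wasted resources inside the FPGA, each part of the design is refined so that FPGA capacity is used more efficiently. The speech recognition algorithm is implemented in a hardware description language (Verilog), and recognition is carried out in hardware by an improved dynamic time warping (DTW) algorithm. Software simulation and hardware testing verify that the system performs recognition reliably and achieves the same results as the software simulation: the recognition rate reaches 90%, while computation is about 57 times faster.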
With the popularization of computers, human-machine interfaces have developed gradually in response to growing demand for interaction between humans and machines. One solution is speech processing technology, and methods built on it are continually being proposed to raise recognition accuracy. In addition, System on Chip (SoC) design has brought multi-functional integration to various digital products such as cell phones, PDAs, and digital cameras.
Inspired by developments in speech processing technology, this thesis proposes an FPGA-based speech recognition system with an improved DTW algorithm. The system is verified by software simulation and hardware implementation. In practice, the speech recognition system designed in this thesis performs speech recognition accurately, and its results match those of the software simulation.
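For readers unfamiliar with DTW, the following is a minimal Python sketch of the classic textbook algorithm, not the improved variant proposed in the thesis, whose details are not given in this record. The function names `dtw_distance` and `recognize`, the scalar features, and the absolute-difference local distance are all assumptions made for illustration; the keywords suggest the thesis instead compares frame-level linear prediction features using the Itakura distance.

```python
import math


def dtw_distance(a, b, dist=lambda x, y: abs(x - y)):
    """Classic DTW cost between two sequences (illustrative, not the thesis variant).

    Allowed steps are (i-1, j), (i, j-1) and (i-1, j-1), with no slope
    constraint or warping window. `dist` is a placeholder local distance.
    """
    n, m = len(a), len(b)
    # cost[i][j] holds the minimal accumulated distance aligning a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = dist(a[i - 1], b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def recognize(utterance, templates):
    """Return the label of the stored template with the smallest DTW cost."""
    return min(templates, key=lambda label: dtw_distance(utterance, templates[label]))


# Hypothetical one-dimensional "feature" sequences, for illustration only
templates = {"up": [1, 2, 3], "down": [3, 2, 1]}
result = recognize([1, 2, 2, 3], templates)
```

Here the input `[1, 2, 2, 3]` is a time-stretched version of the `"up"` template, so DTW aligns the repeated frame at zero cost and `"up"` is selected. A hardware implementation would typically replace the full cost table with a small number of row buffers, but that design choice is not described in this record.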