| Author: | 杜俊逸 Du, Jyan-Yi |
|---|---|
| Thesis title: | 基於AMDF之改良式音高特徵語音辨識演算法與嵌入式系統設計 An Embedded System Design for Speech Recognition using Improved AMDF-based Pitch Features |
| Advisor: | 王駿發 Wang, Jhing-Fa |
| Degree: | 碩士 Master |
| Department: | 電機資訊學院 - 電機工程學系 Department of Electrical Engineering |
| Year of publication: | 2007 |
| Academic year of graduation: | 95 (ROC calendar) |
| Language: | English |
| Pages: | 59 |
| Keywords (Chinese): | 音高 (pitch), 動態時間校準 (dynamic time warping), 語音辨識 (speech recognition) |
| Keywords (English): | Pitch contour, dynamic time warping, speech recognition |
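Previous studies have shown that speech-interactive human-machine interfaces provide not only a friendly user interface but also a mechanism through which the system can respond to the user. This study therefore uses pitch-period features to design an embedded speech recognition system, with the aim of improving the functionality and applicability of speech-interactive interfaces on embedded platforms. The proposed system can be implemented under limited hardware resources, giving it small size, low cost, low power consumption, real-time operation, suitability for mass production, and embeddability, so that it can readily be applied to any information appliance (IA).
For pitch-period estimation, the thesis adopts the average magnitude difference function (AMDF) to extract the pitch-period features used for recognition. Compared with the autocorrelation function (ACF), which is also used for pitch-period estimation, AMDF-based estimation achieves similar accuracy at a lower computational cost. Three improved forms of AMDF-based pitch-period estimation are studied: the circular AMDF (CAMDF), the modified AMDF (MAMDF), and a hybrid that combines the two, the modified and circular AMDF (MCAMDF). For a mixed signal containing both voiced and unvoiced speech, the CAMDF-based method can estimate the pitch period of the voiced part and improves pitch-estimation performance on such signals. Beyond studying pitch-period features for speech recognition, the thesis also implements dynamic time warping (DTW) matching and an improved weighting strategy for it on the embedded speech recognition system.
Finally, the thesis presents a low-cost embedded speech-interaction system based on AMDF-derived pitch-period features and improved weighted DTW recognition. With four speech commands of 1 second each, the system reaches a recognition accuracy of roughly 70% to 80%. In addition to the recognition algorithm, endpoint detection of the speech signal and pitch feature extraction are also integrated and implemented on the resource-constrained embedded system used in this work. The results show that the design is applicable to different commands in different languages and could in future be applied more widely to human-machine speech interaction systems such as intelligent toys, voice-controlled remote controls, and self-learning machines.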
Previous research shows that speech-interactive systems give users not only a friendly interface but also a spoken feedback mechanism. This thesis therefore proposes an embedded system design for pitch-based speech recognition, aimed at ubiquitous speech-interactive applications. The embedded module needs only a few peripheral components for complete operation and is characterized by small size, low cost, real-time operation, ease of mass production, and embeddability in any information appliance.
For pitch detection, this work adopts the average magnitude difference function (AMDF). Compared with the autocorrelation method, AMDF requires less computation while achieving comparable accuracy. Three improvements to AMDF-based pitch detection are presented: the circular AMDF (CAMDF), the modified AMDF (MAMDF), and the modified and circular AMDF (MCAMDF), which combines the two. For a mixed segment containing both voiced and unvoiced speech, the CAMDF-based method can recover the pitch period of the voiced part and improves pitch-extraction performance. In addition, dynamic time warping (DTW), a dynamic-programming matching method, and a weighted modification of it are adopted as the recognizer, which is the kernel process of pitch-based speech recognition.
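As a rough illustration of AMDF-based pitch detection, the sketch below computes a standard AMDF and a circular variant in which sample indices wrap modulo the frame length, then picks the lag with the deepest valley. The exact formulations used in the thesis are not given in the abstract, so the normalization, the function names (`amdf`, `camdf`, `estimate_pitch_period`), the lag search range, and the synthetic 200 Hz test frame are all assumptions made for this example.

```python
import math

def amdf(frame, lag):
    """Standard AMDF: mean |x[n] - x[n+lag]| over the overlapping samples."""
    n = len(frame) - lag
    return sum(abs(frame[i] - frame[i + lag]) for i in range(n)) / n

def camdf(frame, lag):
    """Circular AMDF: indices wrap modulo the frame length,
    so every lag is evaluated over all N samples."""
    n = len(frame)
    return sum(abs(frame[(i + lag) % n] - frame[i]) for i in range(n)) / n

def estimate_pitch_period(frame, min_lag, max_lag, diff=amdf):
    """Return the lag in [min_lag, max_lag] where the difference function is smallest."""
    return min(range(min_lag, max_lag + 1), key=lambda lag: diff(frame, lag))

# Hypothetical test signal: a 200 Hz sine sampled at 8 kHz (period = 40 samples).
fs = 8000
f0 = 200
frame = [math.sin(2 * math.pi * f0 * n / fs) for n in range(240)]

period_amdf = estimate_pitch_period(frame, 20, 60)
period_camdf = estimate_pitch_period(frame, 20, 60, diff=camdf)
print(period_amdf, period_camdf, fs / period_amdf)
```

Both functions use only subtractions and absolute values, with no multiplications in the inner loop, which is the usual reason AMDF is considered cheaper than autocorrelation on small microcontrollers.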
In conclusion, this work proposes a cost-effective embedded system design, based on AMDF-derived pitch features and weighted DTW recognition, for speech interaction on mobile handheld devices. Recognition accuracy was about 70% to 80% for a set of four commands with 1-second speech input. Start/end-point detection of the speech stream and the scheduling of feature extraction and storage are specially designed to suit the resource-limited nature of an embedded system. The results show that the design could be used in various command-driven speech-interactive applications such as intelligent toys, hands-free remote control, machine learning, and speech retrieval.
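To make the DTW-based matching of pitch contours concrete, here is a minimal sketch of template recognition with classic, unweighted DTW. The thesis's weighting scheme is not described in the abstract and is not reproduced here; the local cost (absolute difference), the function names, and the four template contours with their pitch values in Hz are hypothetical and chosen only for illustration.

```python
def dtw_distance(a, b):
    """Classic DTW between two 1-D sequences with |a_i - b_j| as local cost."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = best accumulated cost aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # a advances, b repeats
                                 cost[i][j - 1],      # b advances, a repeats
                                 cost[i - 1][j - 1])  # both advance
    return cost[n][m]

def recognize(contour, templates):
    """Return the name of the template with the smallest DTW distance."""
    return min(templates, key=lambda name: dtw_distance(contour, templates[name]))

# Hypothetical pitch-contour templates (Hz) for a four-command vocabulary.
templates = {
    "rising":  [100, 110, 120, 130, 140],
    "falling": [140, 130, 120, 110, 100],
    "flat":    [120, 120, 120, 120, 120],
    "dip":     [130, 115, 100, 115, 130],
}

# A slower, slightly noisy rising contour.
query = [100, 100, 111, 119, 131, 140, 140]
print(recognize(query, templates))
```

The full cost matrix is kept here for readability; on a memory-limited target, only the previous and current rows would need to be stored.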