
Author: Chiang, Kai-Yuan (蔣開淵)
Title: Ubiquitous Voice Control System: Using HMM's Keyword Spotting Recognition With Subspace Speech Enhancement
Advisor: Wang, Jhing-Fa (王駿發)
Degree: Master
Department: Department of Electrical Engineering, College of Electrical Engineering and Computer Science
Year of Publication: 2006
Graduating Academic Year: 94 (ROC calendar, 2005-2006)
Language: English
Pages: 43
Keywords: Ubiquitous, Multi-microphone, Voice control system

      Speech is the simplest and most common way for people to communicate, so "voice control" has long been a dream: if all 3C products could be controlled by voice, daily life would become far more convenient.
      Although some voice-controlled products are already on the market, most of them require the speaker to be close to the microphone when recording commands, because the recognition rate drops as the distance grows.
      In this thesis we propose a long-distance voice control system that uses a subspace speech enhancement method to reduce the effect of noise. When the command signal is recorded from a distance, environmental noise is the main obstacle to high recognition performance. We therefore enhance the recorded signal with subspace speech enhancement and then feed it to an HMM-based keyword spotting recognizer. The recognizer is speaker-independent, which makes it suitable for general users and spares them a prior training (enrollment) procedure.
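The subspace enhancement step can be illustrated with a minimal sketch of one classic signal-subspace (KLT-domain) formulation: eigendecompose the noisy-frame covariance, subtract the noise floor from the eigenvalues, and apply a Wiener-type gain in the eigenbasis. The white-noise assumption, the sub-vector order, and the `mu` trade-off parameter are illustrative choices, not specifics from the thesis.

```python
import numpy as np

def subspace_enhance(frame, noise_var, order=16, mu=1.0):
    """Enhance one speech frame with a signal-subspace (KLT) filter.

    Assumes additive white noise of known variance `noise_var`;
    `mu` trades residual noise against signal distortion.
    """
    n = len(frame)
    # Stack overlapping sub-vectors and estimate the noisy covariance.
    X = np.lib.stride_tricks.sliding_window_view(frame, order)
    R_y = X.T @ X / len(X)
    # Eigendecomposition gives the KLT basis.
    w, V = np.linalg.eigh(R_y)
    # Clean-signal eigenvalues: remove the noise floor, clamp at zero.
    lam = np.maximum(w - noise_var, 0.0)
    # Wiener-type gain per eigen-direction.
    gain = lam / (lam + mu * noise_var)
    G = V @ np.diag(gain) @ V.T
    # Filter every sub-vector, then overlap-average back to a signal.
    Y = X @ G.T
    out = np.zeros(n)
    cnt = np.zeros(n)
    for i in range(len(X)):
        out[i:i + order] += Y[i]
        cnt[i:i + order] += 1
    return out / cnt
```

When the assumed noise variance dominates every eigenvalue, all gains collapse to zero and the frame is suppressed entirely; with low noise the filter approaches identity, which is the intended behavior for strong far-field commands.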
      For microphone placement, we distribute microphones uniformly through the environment instead of using a microphone array. An array requires several microphones at a single point, so the total microphone count would grow considerably. We therefore record with uniformly placed microphones and obtain the final result by voting among the per-microphone recognition outputs.
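The per-channel voting step can be sketched as a simple majority vote over the keyword decisions of the individual microphones; the function name, the `None` convention for channels where spotting fails, and the example keywords are illustrative assumptions.

```python
from collections import Counter

def vote(decisions):
    """Majority vote over per-microphone keyword decisions.

    `decisions` holds one recognized keyword per microphone channel,
    with None marking channels where spotting produced no result.
    Ties are broken by first appearance among the valid channels.
    """
    valid = [d for d in decisions if d is not None]
    if not valid:
        return None  # no channel yielded a usable decision
    return Counter(valid).most_common(1)[0][0]
```

With six uniformly placed microphones, channels close to the speaker tend to agree, so the vote discards the outlier decisions of distant, noise-dominated channels.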
      We implement the "Ubiquitous Voice Control System" in two configurations. The first is the main long-distance scheme: six wired microphones are set up in the environment, forming the "Ubiquitous Computing Environment." Here the recorded signals are very weak and so heavily degraded by environmental noise that direct recognition fails completely; after amplification and subspace speech enhancement, the recognition rate rises to about 66 percent. The second is a short-distance reference scheme in which the user wears a Bluetooth headset, forming the "Wearable Computing Environment." Here the baseline recognition rate is about 58 percent, and after amplification and subspace speech enhancement it rises to about 91 percent.
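The amplify-then-enhance preprocessing described above can be sketched as a normalization to a fixed working level before enhancement; the target RMS value is an assumed parameter, not one specified in the thesis.

```python
import numpy as np

def normalize_rms(x, target_rms=0.1):
    """Scale a weak far-field recording to a target RMS level.

    Applied before enhancement so that the six wired channels reach a
    comparable working level; silent input is returned unchanged.
    """
    rms = np.sqrt(np.mean(np.square(x)))
    if rms < 1e-12:
        return x
    return x * (target_rms / rms)
```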

    Table of Contents:
      Abstract (Chinese)
      Abstract
      Acknowledgements
      Contents
      Table List
      Figure List
      Chapter 1  Introduction
        1.1  Background and Motivation
        1.2  Thesis Objectives
          1.2.1  Ubiquitous Computing Environment
          1.2.2  Wearable Computing Environment
        1.3  System Overview
          1.3.1  System Overview of "Ubiquitous Computing Environment"
          1.3.2  System Overview of "Wearable Computing Environment"
        1.4  Thesis Organization
      Chapter 2  Related Techniques
        2.1  Subspace Speech Enhancement
        2.2  HMM's Keyword Spotting Recognition
          2.2.1  Introduction of HMM's Keyword Spotting Recognition
          2.2.2  System Architecture
          2.2.3  Two-level CBSM
      Chapter 3  System Architecture
        3.1  System Architecture of Our "Ubiquitous Computing Environment"
        3.2  System Architecture of Our "Wearable Computing Environment"
      Chapter 4  Experimental Designs
        4.1  Experimental Equipment
          4.1.1  Equipment of Our "Ubiquitous Computing Environment"
          4.1.2  Equipment of Our "Wearable Computing Environment"
          4.1.3  Microphone (Audio-Technica AT9842)
          4.1.4  Six-in Preamplifier Card
          4.1.5  QUATECH DAQP-16 (16-bit PCMCIA Analog Input Card)
          4.1.6  Bluetooth Headset
          4.1.7  Bluetooth USB Adapter
        4.2  Experimental Program Integration
          4.2.1  C Call Matlab
          4.2.2  User Interface
        4.3  Experimental Environment
        4.4  Experiment Results
          4.4.1  Experiment Results of the "Ubiquitous Computing Environment"
          4.4.2  Experiment Results of the "Wearable Computing Environment"
        4.5  Results Analysis
      Chapter 5  Conclusion and Future Work
      References


    Full text released: on campus 2008-08-21; off campus 2008-08-21