
Author: Kao, Hung-Jen (高宏仁)
Thesis title: Integrating Blind Source Separation and Subspace Speech Enhancement for Ubiquitous Voice Control System (整合盲訊號分離和子空間語音增強之泛在聲控系統)
Advisor: Wang, Jhing-Fa (王駿發)
Degree: Master
Department: Department of Electrical Engineering, College of Electrical Engineering and Computer Science
Year of publication: 2008
Academic year of graduation: 96 (ROC calendar)
Language: English
Number of pages: 68
Keywords (Chinese): 語音辨識 (speech recognition), 語音增強 (speech enhancement), 聲控 (voice control)
Keywords (English): Voice control, Speech enhancement, Speech recognition
Speech is the simplest and most common way for people to communicate with one another, so "voice control" has long been a dream. If all 3C products could one day be controlled by voice, daily life would become much more convenient.
Although some voice-controlled products are already on the market, most of them assume a short distance between the speaker and the microphone, because as that distance grows, noise and the long range progressively degrade recognition performance. This thesis proposes two far-field voice-control systems. The first system uses a mixer to combine the microphone signals from every location into a single signal, which enlarges the pickup range and speeds up processing; a single-channel subspace speech-enhancement method is then applied to mitigate the far-field background white noise; finally, endpoint detection extracts the valid speech segments, which are passed to a Mandarin speech recognizer trained with the HMM Tool Kit (HTK). For the microphone placement, we distribute multiple microphones evenly instead of using a microphone array, and feed each microphone's signal into the mixer rather than recording with an expensive multi-channel sound card, which lowers the deployment cost. The proposed ubiquitous voice-control system is then applied to home-appliance control.
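As an illustration of the endpoint-detection step described above, the following minimal energy-based sketch in Python marks a segment as speech when its short-time log energy rises above a threshold relative to the loudest frame. The frame length and threshold are assumptions made for this example; the thesis's actual detector may differ.

```python
# Minimal energy-based endpoint detection sketch (illustrative assumption only;
# the thesis's actual endpoint detector may differ).
import numpy as np

def detect_endpoints(signal, sample_rate, frame_ms=20, threshold_db=-35.0):
    """Return (start, end) sample indices of the detected speech region, or None."""
    frame_len = int(sample_rate * frame_ms / 1000)
    n_frames = len(signal) // frame_len
    frames = signal[:n_frames * frame_len].reshape(n_frames, frame_len)

    # Short-time log energy of each frame, relative to the loudest frame.
    energy = np.sum(frames.astype(np.float64) ** 2, axis=1) + 1e-12
    energy_db = 10.0 * np.log10(energy / energy.max())

    voiced = np.flatnonzero(energy_db > threshold_db)
    if voiced.size == 0:
        return None
    return voiced[0] * frame_len, (voiced[-1] + 1) * frame_len

if __name__ == "__main__":
    fs = 16000
    # One second of silence, one second of "speech" (noise burst), one of silence.
    sig = np.concatenate([np.zeros(fs), 0.5 * np.random.randn(fs), np.zeros(fs)])
    print(detect_endpoints(sig, fs))   # roughly (16000, 32000)
```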
The second system addresses real environments in which the signal-to-noise ratio is sometimes low, for example when the interference is a television or competing talkers; in such conditions the single-channel subspace speech-enhancement method alone cannot achieve satisfactory enhancement. For these noise conditions we therefore additionally propose a method that integrates blind source separation with single-channel subspace speech enhancement, deploying a microphone array in the low-SNR locations to enhance the speech and thereby raise the system's recognition rate.
We set up two different environments and used two different kinds of noise; the experimental results show that both systems achieve better recognition rates.

Speech is the simplest and most common way for people to communicate, so "voice control" has long been pursued. If 3C products can be controlled by voice in the future, they will make our lives far more convenient and friendly.
Most voice-controlled products record the command voice at short range, since long distance and interference significantly degrade recognition performance. We therefore propose two far-field ubiquitous voice-control systems to improve both noise reduction and recognition rate. For the stationary-noise environment, a mixer is first used to combine the multi-channel microphone signals into a single-channel signal, which enlarges the recording coverage and speeds up the computation. A single-channel subspace speech-enhancement method is then applied to reduce the background noise. Finally, the speech segments are extracted by end-point detection and recognized by an HMM-based Mandarin Chinese keyword recognizer.
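As a rough illustration of the single-channel subspace enhancement step, the sketch below builds a KLT-domain filter from the eigen-decomposition of the noisy-signal covariance, in the spirit of signal-subspace enhancement. The vector length, the additive-white-noise assumption with known variance, and the Wiener-type eigen-domain gain are illustrative choices, not the estimator actually used in the thesis.

```python
# A sketch of KLT-based (signal-subspace) speech enhancement under an additive
# white-noise assumption with known variance. Parameter choices are illustrative.
import numpy as np

def subspace_enhance(noisy, noise_var, dim=32):
    """Enhance a 1-D noisy signal; returns a signal of length (len(noisy)//dim)*dim."""
    n_vec = len(noisy) // dim
    y = noisy[:n_vec * dim].reshape(n_vec, dim)       # stack into dim-length vectors

    r_y = (y.T @ y) / n_vec                           # sample covariance of noisy vectors
    eigval, eigvec = np.linalg.eigh(r_y)              # KLT basis

    lam_x = np.maximum(eigval - noise_var, 0.0)       # estimated clean-speech eigenvalues
    gain = lam_x / (lam_x + noise_var)                # Wiener-type gain per eigen-direction

    h = eigvec @ np.diag(gain) @ eigvec.T             # filter H = U diag(g) U^T
    return (y @ h.T).reshape(-1)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    t = np.arange(16000) / 16000.0
    clean = np.sin(2 * np.pi * 440 * t)               # a tone stands in for speech
    noise = 0.3 * rng.standard_normal(t.size)
    enhanced = subspace_enhance(clean + noise, noise_var=0.09)
    err_in = noise[:enhanced.size]
    err_out = enhanced - clean[:enhanced.size]
    print("input SNR  %.1f dB" % (10 * np.log10(clean.var() / err_in.var())))
    print("output SNR %.1f dB" % (10 * np.log10(clean.var() / err_out.var())))
```

In a deployed system the noise variance would of course have to be estimated, for example from frames that the endpoint detector rejects as non-speech.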
Regarding the microphone setup of this system, we place the microphones at uniformly distributed locations instead of using a microphone array, which reduces the number of microphones and the cost of the related recording devices.
The second system targets low signal-to-noise-ratio (SNR), nonstationary environments. We further propose a novel architecture that combines convolutive blind source separation with subspace speech enhancement to suppress babble noise and background noise using a microphone-array setup.
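The convolutive BSS component can be sketched in the frequency domain: take the STFT of each microphone channel, run a complex ICA update independently in every frequency bin, and rescale the outputs by projection back. The natural-gradient update, the step size, and the omission of per-bin permutation alignment below are simplifying assumptions made for illustration; this skeleton does not reproduce the thesis's actual separation algorithm.

```python
# Skeleton of frequency-domain convolutive blind source separation:
# per-bin complex ICA with projection-back scaling. Permutation alignment
# across bins, which a usable system needs, is intentionally left out.
import numpy as np
from scipy.signal import stft, istft

def fd_bss(mixtures, fs, nperseg=1024, n_iter=100, step=0.1):
    """mixtures: (n_channels, n_samples) array. Returns separated signals, same shape."""
    _, _, X = stft(mixtures, fs=fs, nperseg=nperseg)     # X: (n_ch, n_bins, n_frames)
    n_ch, n_bins, n_frames = X.shape
    Y = np.zeros_like(X)

    for b in range(n_bins):
        Xb = X[:, b, :]                                  # complex mixtures in this bin
        scale = np.sqrt(np.mean(np.abs(Xb) ** 2)) + 1e-12
        Xn = Xb / scale                                  # normalize for stable updates
        W = np.eye(n_ch, dtype=complex)

        for _ in range(n_iter):
            Yb = W @ Xn
            g = Yb / (np.abs(Yb) + 1e-9)                 # sign non-linearity (super-Gaussian prior)
            dW = (np.eye(n_ch) - (g @ Yb.conj().T) / n_frames) @ W
            W = W + step * dW                            # natural-gradient ICA update

        Yb = W @ Xn
        A = np.linalg.pinv(W)                            # estimated (normalized) mixing matrix
        Y[:, b, :] = scale * (np.diag(np.diag(A)) @ Yb)  # projection-back scaling

    _, y = istft(Y, fs=fs, nperseg=nperseg)
    return y[:, :mixtures.shape[1]]
```

With a two-microphone recording of a speaker plus a television, for example, fd_bss(x, 16000) would return two estimated source signals, either of which could then be passed to the subspace enhancer and the keyword recognizer.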
We set up two noisy environments with different noise types and two microphone configurations. The experimental results show that both systems achieve superior recognition rates.

Abstract (Chinese)
Abstract
Acknowledgements
Contents
Table List
Figure List
Chapter 1 Introduction
  1.1 Background and Motivation
  1.2 Related Works
  1.3 Thesis Objectives
    1.3.1 Ubiquitous Voice Control for Stationary Noise Environment
    1.3.2 Ubiquitous Voice Control for Nonstationary Noise Environment
  1.4 Thesis Organization
Chapter 2 Microphone Layout and Design for Ubiquitous Environment
  2.1 Design Criteria for Microphone Layout
  2.2 Microphone Layout in NCKU Aspire Home
Chapter 3 Ubiquitous Voice Control for Stationary Noise Environment
  3.1 Introduction to Subspace Speech Enhancement
  3.2 Introduction to HMM's Keyword Recognizer
  3.3 Microphone Layout in Our Lab
  3.4 System Flow of "Ubiquitous Voice Control for Stationary Noise Environment"
Chapter 4 Ubiquitous Voice Control for Nonstationary Noise Environment
  4.1 Introduction to Convolutive Blind Source Separation
  4.2 Microphone Layout in Our Lab
  4.3 System Flow of "Ubiquitous Voice Control for Nonstationary Noise Environment"
Chapter 5 Experimental Designs
  5.1 Experimental Equipment
    5.1.1 Equipment of the "Ubiquitous Voice Control for Stationary Noise Environment"
    5.1.2 Microphone Phantom Power Supply Board
    5.1.3 Ten-Channel Microphone Mixer (MX-802)
    5.1.4 Digital I/O Controller
    5.1.5 Equipment of the "Ubiquitous Voice Control for Nonstationary Noise Environment"
    5.1.6 Microphone (Sony ZS90)
    5.1.7 Multi-channel Microphone Pre-Amplifier Board
  5.2 Experimental Program Integration
    5.2.1 C Call Matlab and C# Call C
    5.2.2 User Interface
    5.2.3 Implementation of Voice Activity Detection
    5.2.4 Integrating the Motion Detection for Home Security
    5.2.5 Integration with Other Systems
  5.3 Experimental Environment
  5.4 Experiment Results
    5.4.1 Experiment Results of the "Ubiquitous Voice Control for Stationary Noise Environment"
    5.4.2 Experiment Results of the "Ubiquitous Voice Control for Nonstationary Noise Environment"
  5.5 Results Analysis and Comparison
Chapter 6 Conclusions and Future Works
References


Full text available on campus from 2009-08-20; available off campus from 2011-08-20.