
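Graduate student: Ting, Chuan-Wei (丁川偉)
Thesis title: Factor Analysis and Modeling for Speech Recognition (因素分析模型於語音辨識之研究)
Advisor: Chien, Jen-Tzung (簡仁宗)
Degree: Doctor
Department: College of Electrical Engineering and Computer Science - Department of Computer Science and Information Engineering
Year of publication: 2009
Academic year of graduation: 97
Language: English
Number of pages: 117
Keywords (Chinese): 語音因素分析 (speech factor analysis)
Keywords (English): speech recognition, factor analysis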
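  • Statistical analysis and machine learning play an important role in building speech recognition systems. In speech recognition, a mismatch often arises because the training and test environments differ, so system robustness has long been a demanding problem in speech processing. Many methods addressing robustness have been proposed in the literature. In general, following the order of speech processing, they can be grouped into three levels: processing in the signal space, processing in the speech feature space, and processing in the speech model. Processing in the signal space suppresses the noise, or enhances the speech component of the noisy speech, before recognition, so that training and test data both lie in a matched, clean signal space and the mismatch is resolved. Processing in the feature space extracts, as far as possible, speech features that are less affected by speaker variation or the acoustic environment. Processing in the model space takes the trained model and adapts it to the test environment by using adaptation data whose conditions are close to those of the test data.
    This dissertation takes a machine learning perspective and proposes factor analysis (FA) models that address robustness in automatic speech recognition at several different levels. Specifically, the work includes: (1) a novel subspace modeling and subspace selection method that improves noisy speech recognition by using the fine-grained speech information extracted from the principal and minor subspaces of the speech signal; (2) a factor analyzed streamed hidden Markov model (FASHMM) framework that not only transforms the original features but also describes the features of each factor with an individual Markov model; (3) an adaptive, FA-activated HMM topology mechanism with a self-learning capability, which automatically finds in incoming data the speech variations that are absent from the existing model and, according to the nature of each variation, adds them to the existing model with different topological structures to complete the model update; and (4) the implementation and evaluation of the proposed methods on the TIMIT, AURORA2, and WSJ corpora.
    In the speech recognition experiments, we use different setups to evaluate the proposed methods in speech enhancement, feature transformation and model streaming, model selection, and HMM topology. The results show that the proposed methods perform satisfactorily in each evaluation. For speech enhancement, in addition to exploiting the properties of factor analysis, we minimize the energy of the speech distortion separately in the two subspaces to obtain better speech quality, and, again based on the properties of factor analysis, we design a rule that selects the optimal subspace through hypothesis testing. Both techniques effectively raise the signal-to-noise ratio of noisy speech signals and the recognition rate of noisy speech. For feature streaming, we apply the FA feature transformation and describe each common factor and specific factor with an individual Markov chain to obtain a streamed HMM; the implementations, which consider different numbers of features and states, all improve the recognition rate significantly. For HMM topology, we sequentially learn different pronunciation variations from each training subset and adjust and learn the model topology. We also propose an FA-based rule for measuring the similarity between HMM states, which improves adaptive learning of the HMM topology. The methods, analyses, and discussion presented in this dissertation offer an important reference for researchers in machine learning and speech recognition.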

    Statistical analysis and machine learning play a crucial role in building flexible speech recognition systems. Robustness is a critical issue in speech recognition because a mismatch between training and test environments always exists in real-world applications. Many methods have been proposed in the literature to deal with it. In general, these methods apply statistical learning in one of three spaces: signal space, feature space, and model space. In signal space, the noise interference is reduced, or the noisy speech is enhanced, to resolve the mismatch before the recognition stage. In feature space, the objective is to find a robust feature representation that is insensitive to variations due to noise, channels, and speakers. In model space, the aim is to adapt the current model to the test conditions, or equivalently to capture the characteristics of adaptation data that are closer to the new environment.
    In this dissertation, we take a machine learning perspective and present factor analysis (FA) approaches in signal space, feature space, and model space for dealing with the robustness issue in automatic speech recognition. More specifically, we present several studies: (1) developing a novel subspace modeling and selection approach that extracts delicate speech information from the principal subspace and the minor subspace of speech signals for noisy speech recognition; (2) developing an FA streamed hidden Markov model (FASHMM) framework in which the acoustic features are analyzed and transformed by the FA principle, and an individual Markov chain is applied to the transformed features corresponding to the same common factor; (3) building a flexible FA-activated HMM topology with a self-learning capability that learns new pronunciation variations from continuously arriving input data; and (4) implementing and evaluating the proposed methods on the TIMIT, AURORA2, and WSJ speech corpora.
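    For orientation, the FA model underlying these studies can be written in its standard textbook form. The notation below is assumed for illustration and is not taken from the dissertation itself.

```latex
\begin{align}
\mathbf{x} &= \boldsymbol{\mu} + \boldsymbol{\Lambda}\,\mathbf{f} + \boldsymbol{\varepsilon},
  \qquad \mathbf{f} \sim \mathcal{N}(\mathbf{0}, \mathbf{I}_k),
  \quad \boldsymbol{\varepsilon} \sim \mathcal{N}(\mathbf{0}, \boldsymbol{\Psi}), \\
\operatorname{Cov}(\mathbf{x}) &= \boldsymbol{\Lambda}\boldsymbol{\Lambda}^{\top} + \boldsymbol{\Psi},
  \qquad \boldsymbol{\Psi} = \operatorname{diag}(\psi_1, \dots, \psi_d).
\end{align}
```

    Here x is a d-dimensional observation with mean mu, f collects k < d common factors, Lambda is the d-by-k loading matrix, and epsilon collects the specific factors, which are assumed independent of f and of each other. The second line follows from the first because f has identity covariance and is uncorrelated with epsilon.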
    We conducted different sets of experiments to evaluate the speech recognition performance of the proposed methods, covering speech enhancement, acoustic feature streaming, model selection, and HMM topology. Experimental results showed that the proposed methods achieved desirable performance in the different evaluations. In speech enhancement, we minimized the energies of speech distortion in the principal subspace as well as in the minor subspace so as to estimate the clean speech with residual information. Following the FA principle, we explored optimal subspace selection by solving hypothesis test problems. We increased the signal-to-noise ratios (SNRs) and improved the recognition accuracies in noisy speech recognition. In acoustic feature streaming, we performed the FA feature transformation and adopted an individual Markov chain for streamed HMM modeling. Speech recognition performance was significantly improved by different realizations of streaming in the numbers of features and states. In HMM topology, we sequentially learned the pronunciation variations and adapted the topology at different learning epochs. An FA similarity measure between two HMM states was proposed and shown to be effective in adaptive learning of the HMM topology. The methods proposed in this dissertation should be helpful to researchers working on related topics.
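    The hypothesis-test view of subspace selection described above can be illustrated with a classical likelihood-ratio test for the equality of the smallest covariance eigenvalues. This is a minimal sketch under stated assumptions: the function names, the example eigenvalues, and the rule of keeping the largest tail that is not rejected are hypothetical, and the abstract does not specify the exact statistic or decision rule used in the dissertation.

```python
import math

# Upper 5% points of the chi-square distribution, indexed by degrees of freedom.
# Only the entries needed for up to five eigenvalues are listed (assumption:
# a caller with more eigenvalues supplies a larger table).
CHI2_95 = {2: 5.991, 5: 11.070, 9: 16.919, 14: 23.685}


def equal_tail_statistic(eigenvalues, q, n):
    """Likelihood-ratio statistic for H0: the q smallest eigenvalues are equal.

    Returns (statistic, degrees_of_freedom). Under H0 and Gaussian data the
    statistic is approximately chi-square distributed for large n.
    """
    if not 2 <= q <= len(eigenvalues):
        raise ValueError("q must be between 2 and the number of eigenvalues")
    tail = sorted(eigenvalues)[:q]  # the q smallest eigenvalues
    if tail[0] <= 0:
        raise ValueError("eigenvalues must be positive")
    mean_tail = sum(tail) / q
    # n * (q * log(arithmetic mean) - sum of logs); zero when all are equal.
    statistic = n * (q * math.log(mean_tail) - sum(math.log(v) for v in tail))
    dof = (q + 2) * (q - 1) // 2
    return statistic, dof


def select_minor_dimension(eigenvalues, n, critical=CHI2_95):
    """Return the largest q whose equal-tail hypothesis is not rejected.

    Falls back to 1 when every tail of two or more eigenvalues is rejected.
    """
    for q in range(len(eigenvalues), 1, -1):
        statistic, dof = equal_tail_statistic(eigenvalues, q, n)
        if statistic <= critical[dof]:
            return q
    return 1


# Hypothetical covariance eigenvalues: two dominant directions and a flat tail.
example = [9.0, 4.0, 1.05, 1.0, 0.95]
minor_dim = select_minor_dimension(example, n=200)
print("minor subspace dimension:", minor_dim)
print("principal subspace dimension:", len(example) - minor_dim)
```

    The statistic compares the arithmetic and geometric means of the tail eigenvalues, so it is zero for a perfectly flat tail and grows as the tail spreads out. Searching from the largest tail downward is one simple design choice; other orderings or significance levels would be equally defensible.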

    Chinese Abstract
    Abstract
    Acknowledgments
    Table of Contents
    List of Tables
    List of Figures
    Chapter 1 Introduction
        1.1 Motivations
        1.2 Outline of This Dissertation
        1.3 Contributions of This Dissertation
    Chapter 2 Background Survey
        2.1 Statistical Speech Recognition
        2.2 Factor Analysis
            2.2.1 Maximum Likelihood Estimation
            2.2.2 Principal Component Method
            2.2.3 Principal Factor Analysis
    Chapter 3 Factor Analyzed Subspace Modeling and Selection
        3.1 Subspace Modeling
            3.1.1 Modeling of Noisy Signal
            3.1.2 Estimation of Clean Signal
        3.2 Subspace Selection
            3.2.1 Selection via Testing Equivalence of Eigenvalues
            3.2.2 Selection via Testing Diagonal Covariance Matrix
            3.2.3 Illustration and Implementation
        3.3 Experiments
            3.3.1 Experimental Setup
            3.3.2 Evaluation of Waveforms and SNRs of Enhanced Speech
            3.3.3 Effects of Subspace Modeling on Noisy Speech Recognition
            3.3.4 Effects of Subspace Selection on Noisy Speech Recognition
        3.4 Summary
        Appendix 3-A
        Appendix 3-B
    Chapter 4 Factor Analysis for Streamed Hidden Markov Modeling
        4.1 Factor Analysis
            4.1.1 Factor Analysis of Acoustic Features
            4.1.2 FA Parameter Estimation
        4.2 Streamed Hidden Markov Models
            4.2.1 HMM Topologies
            4.2.2 Factor Analysis Streamed HMM
            4.2.3 FA Parameter Sharing
            4.2.4 FASHMM Viterbi Algorithm
        4.3 Experiments
            4.3.1 Experimental Setup
            4.3.2 Recognition Results of HMM, SFHMM and FASHMM
            4.3.3 Evaluation of FA Parameter Sharing in FASHMM
            4.3.4 Evaluation of FASHMM for Phone Recognition
            4.3.5 Evaluation of FASHMM for Noisy Speech Recognition
        4.4 Summary
        Appendix 4-A
        Appendix 4-B
    Chapter 5 Adaptive Factor Analyzed HMM Topology
        5.1 Related Works
            5.1.1 Similarities between Two GMMs
            5.1.2 HMM Topology Learning
        5.2 Adaptive Factor Analyzed HMM Topology and Parameters
            5.2.1 Adaptive HMM Topology in State Level
            5.2.2 Adaptive HMM Topology in Gaussian Level
            5.2.3 HMM Topologies and Adaptive Learning Algorithm
        5.3 Experiments
            5.3.1 Experimental Setup
            5.3.2 Adaptive HMM Topology
            5.3.3 Evaluation of AHMMT for Phone Recognition
            5.3.4 Evaluation of AHMMT for Word Recognition
        5.4 Summary
    Chapter 6 Conclusions and Future Works
    Bibliography
    Author's Biographical Notes


    Full-text availability: on campus, open immediately
    Off campus: open from 2009-07-27