Author: 江彥伯 (Chiang, Yen-Po)
Thesis title: 關聯向量機於語音辨識之研究 (Relevance Vector Machine for Speech Recognition)
Advisor: 簡仁宗 (Chien, Jen-Tzung)
Degree: Master
Department: College of Electrical Engineering and Computer Science, Department of Computer Science and Information Engineering
Year of publication: 2010
Academic year of graduation: 98 (ROC calendar, i.e. 2009-2010)
Language: Chinese
Number of pages: 72
Chinese keywords: 關聯向量 (relevance vector), 語音辨識 (speech recognition)
English keywords: RVM, relevance vector machine, speech recognition
The hidden Markov model (HMM) has been successfully applied in speech recognition systems. However, parameter estimation by maximum likelihood does not take the data distributions of other, similar classes into account, so similar classes are easily confused during recognition and the recognition rate drops. In this thesis, we apply the relevance vector machine (RVM) and partition the data of each class into several small clusters, then use a maximum a posteriori classification criterion to identify relevance tokens. These tokens effectively reflect the statistical characteristics of each class while retaining good discrimination against competing classes, and using them as the training data for re-estimating the acoustic models yields better classification than before. In the experiments, we use TIMIT as the corpus, train the acoustic models with the above method, and perform speech recognition. Preliminary experiments show that the proposed method effectively improves speech recognition accuracy.
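As a rough illustration of the token-selection step described above, the sketch below scores each class's training tokens by a class posterior computed from simple per-class Gaussian models and keeps only the highest-scoring ones; the retained tokens would then be used to re-estimate the class models. The Gaussian scorer, the function names, the toy data, and the keep ratio are all illustrative assumptions standing in for the thesis's clustered HMM acoustic models and its MAP selection criterion, not the actual implementation.

```python
# Hypothetical sketch of relevance-token selection: per-class Gaussians stand in
# for the thesis's clustered HMMs, and a MAP-style posterior score decides which
# tokens are kept for re-estimating each class model.
import numpy as np

def fit_gaussian(X):
    """Diagonal-covariance Gaussian estimated from the tokens X (n_tokens x dim)."""
    return X.mean(axis=0), X.var(axis=0) + 1e-6

def log_likelihood(X, mean, var):
    """Per-token log-likelihood under a diagonal Gaussian."""
    return -0.5 * (np.log(2 * np.pi * var) + (X - mean) ** 2 / var).sum(axis=1)

def select_relevance_tokens(class_data, keep_ratio=0.5):
    """For each class, keep the tokens with the highest class posterior."""
    models = {c: fit_gaussian(X) for c, X in class_data.items()}
    selected = {}
    for c, X in class_data.items():
        own = log_likelihood(X, *models[c])
        # Log-evidence over all competing classes (equal priors assumed).
        all_ll = np.stack([log_likelihood(X, *m) for m in models.values()])
        posterior = own - np.logaddexp.reduce(all_ll, axis=0)
        n_keep = max(1, int(keep_ratio * len(X)))
        selected[c] = X[np.argsort(posterior)[-n_keep:]]
    return selected

# Toy usage: two overlapping classes of 2-D "tokens".
rng = np.random.default_rng(0)
data = {"a": rng.normal(0.0, 1.0, (200, 2)),
        "b": rng.normal(1.5, 1.0, (200, 2))}
relevant = select_relevance_tokens(data, keep_ratio=0.5)
print({c: X.shape for c, X in relevant.items()})
```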

The hidden Markov model (HMM) has been successfully applied to speech recognition and many other applications. However, parameters estimated by the maximum likelihood (ML) method do not take the similarities between classes into account, and the resulting speech recognition performance is significantly degraded. In this thesis, we present the relevance vector machine (RVM) and perform Bayesian sparse learning for acoustic hidden Markov modeling. The relevance tokens of the different classes are identified according to the posterior distribution. These tokens are representative of their classes and provide good discrimination between classes. The HMMs are then estimated from the relevance vectors, which are more robust than the support vectors obtained with support vector machines. In the experiments, we investigate the proposed RVM-HMMs for speech recognition on the TIMIT corpus. Preliminary results show the superiority of the proposed method over other methods.
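For readers unfamiliar with the Bayesian sparse learning mentioned above, the following sketch shows the standard evidence re-estimation loop of relevance vector regression with an RBF kernel, in the style of Tipping's sparse Bayesian learning. It is a simplified, self-contained illustration under assumed settings (kernel width, iteration count, pruning threshold) rather than the acoustic-model training used in this thesis; the surviving basis functions play the role of relevance vectors, and most training points are pruned away, which is the sparsity property exploited here.

```python
# Illustrative sparse Bayesian learning loop for RVM regression (assumed settings,
# not the thesis's implementation): alternate posterior estimation, hyperparameter
# re-estimation, and pruning of basis functions whose weights are driven to zero.
import numpy as np

def rbf_kernel(X, Y, gamma=1.0):
    """Gaussian (RBF) kernel matrix between the row vectors of X and Y."""
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def rvm_regression(X, t, gamma=1.0, n_iter=200, alpha_max=1e9):
    """Return the indices of the relevance vectors and their posterior mean weights."""
    Phi = rbf_kernel(X, X, gamma)               # N x N design matrix
    alpha = np.ones(Phi.shape[1])               # precisions of the weight priors
    beta = 1.0 / np.var(t)                      # noise precision
    keep = np.arange(Phi.shape[1])              # surviving basis functions
    for _ in range(n_iter):
        A = np.diag(alpha) + beta * Phi[:, keep].T @ Phi[:, keep]
        Sigma = np.linalg.inv(A)                # posterior covariance of the weights
        mu = beta * Sigma @ Phi[:, keep].T @ t  # posterior mean of the weights
        gamma_i = 1.0 - alpha * np.diag(Sigma)  # how well each weight is determined
        alpha = gamma_i / (mu ** 2 + 1e-12)     # evidence re-estimation of alpha
        err = t - Phi[:, keep] @ mu
        beta = (len(t) - gamma_i.sum()) / (err @ err + 1e-12)
        mask = alpha < alpha_max                # prune weights that collapse to zero
        alpha, keep = alpha[mask], keep[mask]
    return keep, mu[mask]

# Toy usage: fit a noisy sinc function; only a few relevance vectors survive.
rng = np.random.default_rng(1)
X = rng.uniform(-5.0, 5.0, (80, 1))
t = np.sinc(X[:, 0]) + 0.05 * rng.standard_normal(80)
rv_idx, weights = rvm_regression(X, t, gamma=0.5)
print(f"{len(rv_idx)} relevance vectors out of {len(X)} training points")
```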

Chinese Abstract
Abstract
Acknowledgments
Table of Contents
List of Figures
List of Tables
Chapter 1  Introduction
  1.1  Overview
  1.2  Research Motivation and Objectives
  1.3  Chapter Outline
Chapter 2  Speech Recognition
  2.1  Speech Recognition Systems
  2.2  Dynamic Time Warping
  2.3  Hidden Markov Models
    2.3.1  The evaluation problem
    2.3.2  The decoding problem
    2.3.3  The estimation problem
      2.3.3.1  Maximum likelihood (ML) estimation
      2.3.3.2  Maximum a posteriori (MAP) estimation
  2.4  Support Vector Machines for Speech Recognition
    2.4.1  Large margin estimation
    2.4.2  Soft margin estimation
Chapter 3  The Relevance Vector Machine
  3.1  Overview of the Relevance Vector Machine
  3.2  Relevance Vector Machine Regression
  3.3  Relevance Vector Machine Classification
Chapter 4  Relevance Vector Machines for Speech Recognition
  4.1  Dynamic Time Warping Kernels Combined with the Relevance Vector Machine
  4.2  Data Clustering
  4.3  Relevance Vector Hidden Markov Models
Chapter 5  Experiments
  5.1  Experimental Setup
  5.2  Experimental Results and Discussion
Chapter 6  Conclusion and Future Work
  6.1  Conclusion
  6.2  Directions for Future Research
References

Full-text availability: on campus from 2020-12-31; off campus from 2020-12-31.