
Author: Ma, Hsiu-Wen (馬秀雯)
Thesis title: Audio and visual data fusion for biometrics (使用音訊與視訊資料融合方法於生物驗證)
Advisor: Chien, Jen-Tzung (簡仁宗)
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Computer Science and Information Engineering
Year of publication: 2004
Academic year of graduation: 92 (ROC calendar; 2003-04)
Language: Chinese
Pages: 104
Chinese keywords: maximum likelihood, log-likelihood, hierarchical Gaussian mixture model, data fusion
English keywords: hierarchical Gaussian mixture model, data fusion, maximum likelihood, log-likelihood
Chinese abstract:
    With information technology as advanced as it is today, correctly verifying a user's identity is a key issue in protecting personal privacy and information security. In this thesis, we target a visual biometric, the face image, and an auditory biometric, speech, and propose a data fusion strategy based on a hierarchical classifier constructed from Gaussian mixture models (GMMs), achieving more reliable and more robust verification and thereby improving system performance.
    A GMM is first built for each data type. The log-likelihoods of each data type's observations under its first-layer GMM are then used to train a second-layer GMM, so that the various possible sources of randomness are themselves modeled by Gaussian mixtures. When one subsystem's recognition is degraded by noise, or the classifiers' outputs differ widely, the second-layer GMM classifier can account for these differences and effectively compensate with the other subsystems' results to reach a correct decision. In this thesis, we build the hierarchical classifier by maximum likelihood and derive the optimal parameter set with the EM algorithm; we also develop a minimum classification error (MCE) classifier to achieve discriminative training for data fusion.
    Experimental results show that applying the proposed method to a face and speech database of 32 subjects yields good verification performance. Combining the fusion of face verification and speaker verification with a verbal information verification system further allows the design of a multi-stream, multi-modal verification system that is both user-friendly and robust.

English abstract:
    In real applications, it is important to correctly verify a user's identity so that personal privacy and information can be kept safe and secure. This thesis proposes a hierarchical classifier based on Gaussian mixture models (GMMs) that takes audio and visual data as inputs to build a multi-modal user authentication system.
    There are two layers of GMMs in the proposed hierarchical classifier. Following a divide-and-conquer strategy, the first-layer GMMs perform pattern classification separately for each data stream. The output likelihood streams form a new pattern that integrates the features from the different modalities and reflects the complex characteristics of a test user. Accordingly, we are motivated to present a second-layer GMM to classify this integrated pattern. When the observation data are degraded by noise or varying illumination, the proposed hierarchical classifier is able to effectively combine the stream data and make a correct decision. In this thesis, we present maximum likelihood and minimum classification error approaches to building the hierarchical GMM classifier. Importantly, we use the expectation-maximization (EM) algorithm to jointly estimate the optimal parameters in the different layers.
    Experimental results show that the hierarchical GMM classifier works well on the MHMC multi-modal database, which contains speech and face data from thirty-two persons (twenty-five males and seven females). We also find that the proposed classifier achieves robust data fusion performance in the presence of different noise and illumination conditions. Both user verification and identification performance are investigated.
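For the verification decision itself, a standard formulation (consistent with the background and cohort models listed in Chapter 4) is a log-likelihood ratio test between the claimed user's model and a background model. The sketch below uses synthetic 2-D likelihood patterns and an assumed threshold; it illustrates the decision rule, not the thesis's exact models:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(1)
# Hypothetical likelihood patterns (face, speech log-likelihood pairs)
# for the claimed user and for a pooled background of other users.
claimant_patterns = rng.normal([0.0, 0.0], 0.5, size=(300, 2))
background_patterns = rng.normal([2.0, 2.0], 1.0, size=(300, 2))

user_gmm = GaussianMixture(n_components=2, random_state=0).fit(claimant_patterns)
background_gmm = GaussianMixture(n_components=2, random_state=0).fit(background_patterns)

def verify(test_patterns, threshold=0.0):
    # Average log-likelihood ratio between the claimed-user model and the
    # background model; accept the identity claim when it exceeds the threshold.
    llr = user_gmm.score(test_patterns) - background_gmm.score(test_patterns)
    return llr > threshold, llr

genuine_trial = rng.normal([0.0, 0.0], 0.5, size=(20, 2))
impostor_trial = rng.normal([2.0, 2.0], 1.0, size=(20, 2))
print(verify(genuine_trial))
print(verify(impostor_trial))
```

In practice the threshold is tuned on held-out data to trade off false acceptance against false rejection, which is how the verification results above would typically be reported.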

Table of contents:
Chapter 1 Introduction
  1.1 Motivation
  1.2 Main contributions
  1.3 Fusion verification system flow
  1.4 Chapter overview
Chapter 2 Related Work
  2.1 Preface
  2.2 Face recognition
    2.2.1 Dynamic face detection
    2.2.2 Pixel color transformation
    2.2.3 Likelihood-based matching
    2.2.4 Contour extraction, binarization, and eye-contour feature decision
  2.3 Two-dimensional wavelet transform
  2.4 Speaker recognition
    2.4.1 Text dependence
  2.5 Verbal information verification
    2.5.1 Utterance segmentation
    2.5.2 Subsyllable hypothesis testing
    2.5.3 Confidence measures
Chapter 3 Data Fusion Techniques
  3.1 Probabilistic data fusion methods
  3.2 Hierarchical mixture of experts
  3.3 Decision-template data fusion
Chapter 4 Hierarchical Gaussian Mixture Model
  4.1 Architecture
  4.2 Gaussian mixture model
    4.2.1 Introduction
    4.2.2 Background model
    4.2.3 Cohort model
  4.3 Data fusion with the hierarchical GMM
    4.3.1 Building the hierarchical GMM
    4.3.2 Maximum likelihood parameter estimation
  4.4 Discriminative training with minimum classification error
Chapter 5 Experiments
  5.1 System architecture
  5.2 Face feature extraction
  5.3 Speech feature extraction
  5.4 Database
  5.5 Experimental environment
  5.6 Experimental settings
  5.7 Experimental results
  5.8 System demonstration
Chapter 6 Conclusions and Future Work
  6.1 Conclusions
  6.2 Future directions
References
Appendix 1: Matrix algebra
Appendix 2: Speech content of the MHMC speech database
Appendix 3: Samples of the MHMC face database


Full text availability: on campus, immediate; off campus, public from 2004-07-20.