
Graduate student: 宋豪靜
Sung, Hao-Ching
Thesis title: 以支援向量機為基礎之新穎語者切換偵測方法
A Novel Approach for Speaker Change Detection Based on Support Vector Machine
Advisor: 王駿發
Wang, Jhing-Fa
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Electrical Engineering
Year of publication: 2005
Academic year of graduation: 93 (ROC calendar)
Language: English
Number of pages: 49
Chinese keywords: 語者切換偵測, 支援向量機
English keywords: Speaker Change Detection, Support Vector Machine, SVM
    Every sound in our environment has its own distinctive properties. We can often tell speakers apart by the characteristics of their speech signals and segment the audio accordingly. If the resulting segments can then be classified and identified, collecting data becomes much easier. Audio is also an indispensable part of today's computer and multimedia applications, and a large amount of information is recorded in audio file formats, so audio segmentation also helps in searching for the audio clips we want.
    In this thesis, we propose a novel approach for speaker change detection based on the Support Vector Machine (SVM). The approach differs from traditional methods. First, we design a series of experiments and evaluations that compare the ability of the novel and traditional algorithms to detect speaker changes. Next, we propose a speaker change detection system. Finally, in experiments on Chinese and English speech, our system achieves better performance.

    CHAPTER 1 INTRODUCTION
      1.1 Background
      1.2 Thesis Objectives
      1.3 Thesis Organization
    CHAPTER 2 PREVIOUS WORK
      2.1 Algorithms for Audio Segmentation
      2.2 Introduction to Bayesian Information
      2.3 Bayesian Information Criterion Segmentation
    CHAPTER 3 PROPOSE SPEAKER CHANGE DETECTION USING SUPPORT VECTOR MACHINE
      3.1 Introduction to Support Vector Machine
        3.1.1 Optimal Hyperplane for Linear Separable Patterns
        3.1.2 Optimal Hyperplane for Nonseparable Patterns
      3.2 Change Detection using Support Vector Machine
    CHAPTER 4 DETECTABILITY COMPARISON BETWEEN SVM AND BIC
      4.1 Experiment 1: Use different window size to detect the speaker change point
      4.2 Experiment 2: Use same window size to detect the speaker change point
      4.3 Experiment 3: Use the same window size to scan a speaker change point of the audio stream
      4.4 Experiment 4: Use the same window size to define the misclassification ratio and the threshold for the same speaker of the audio stream
    CHAPTER 5 SPEAKER CHANGE DETECTION SYSTEM
      5.1 Potential Speaker Change (SC) Detection process
        5.1.1 Type I Potential SC
        5.1.2 Type II Potential SC
      5.2 Speaker Change (SC) Confirmation Process
      5.3 Speaker Change (SC) Merging Process
    CHAPTER 6 SYSTEM PERFORMANCE EVALUATION & DISCUSSION
      6.1 Training The Penalty And The Threshold of Misclassification Ratio
      6.2 Performance Evaluation Under Various Confirmation And Merging Procedures
    CHAPTER 7 CONCLUSION AND FUTURE WORK
    REFERENCES


    Full-text availability: on campus, available immediately; off campus, available from 2005-07-21