| Graduate student: | 徐正書 Hsu, Cheng-Shu |
|---|---|
| Thesis title: | 基於支向機與MPEG-7低階聲音描述子之家庭環境聲音分類器 Home Environmental Audio Classifier Based on SVM and MPEG-7 Audio Low-level Descriptors |
| Advisor: | 王駿發 Wang, Jhing-Fa |
| Degree: | Master |
| Department: | College of Electrical Engineering and Computer Science - Department of Electrical Engineering |
| Year of publication: | 2004 |
| Graduation academic year: | 92 |
| Language: | English |
| Pages: | 40 |
| Chinese keywords: | 聲音分類 (audio classification), 支援向量機 (support vector machine) |
| Foreign-language keywords: | MPEG-7, SVM, audio classification |
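  Every sound in our living environment has its own distinctive character. We can often recognize a sound from the characteristics of environmental audio and, from that, judge what is happening around us; a fire alarm sounding during a fire is one example. Classifying and recognizing this audio information would therefore greatly help in understanding the surrounding environment, especially for hearing-impaired people and automated security systems. In addition, audio is an indispensable part of today's computer and multimedia applications, and a very large amount of information is recorded in audio file formats. Audio classification also helps in searching for a desired audio segment.
  This thesis proposes a home environmental audio classifier based on the support vector machine (SVM) and MPEG-7 audio low-level descriptors. We use three MPEG-7 audio low-level descriptors, spectrum centroid, spectrum spread, and spectrum flatness, as the system's audio features, and propose a classifier that combines SVM with the nearest-neighbor method. We collected an audio database of 572 recordings in 12 classes. On this database, our classifier achieves a recognition accuracy of up to 85.1%.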
  Different kinds of sound in our living environment have different properties, and we can make sense of our surroundings by recognizing and understanding these properties. For example, when we hear a fire alarm, we can infer that a fire has broken out. Classifying and identifying such sound information would greatly help in monitoring the surrounding environment, especially for deaf people and security systems. Besides, audio is an indispensable part of today's computer and multimedia applications, and a large amount of information is recorded in audio file formats. Audio classification also helps in searching for a desired audio segment.
  In this thesis, we present a home environmental audio classifier based on the support vector machine (SVM) and MPEG-7 audio low-level descriptors. We take three MPEG-7 audio low-level descriptors, spectrum centroid, spectrum spread, and spectrum flatness, as the features for sound classification and propose a classification method that combines SVM and KNN (K-nearest neighbor). We collected an audio database containing 527 WAV files in 12 classes. Experiments demonstrate that the proposed sound classifier can achieve an 85.1% classification rate.
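  To make the three features named in the abstract concrete, the following is a minimal sketch that computes simplified, linear-frequency versions of a spectral centroid, spread, and flatness for a single audio frame. It is only an illustration: the MPEG-7 standard gives its own normative definitions of these descriptors, which differ in detail, and the abstract does not describe the thesis's feature extractor. The function names `power_spectrum` and `spectral_features`, the frame length, and the sample rate are all assumptions made for this example.

```python
import cmath
import math


def power_spectrum(frame):
    """Naive DFT power spectrum of a real-valued frame (bins 0..N/2)."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                  for t, x in enumerate(frame))
        spectrum.append(abs(acc) ** 2)
    return spectrum


def spectral_features(frame, sample_rate):
    """Return (centroid_hz, spread_hz, flatness) for one frame.

    Simplified textbook definitions on a linear frequency axis,
    NOT the normative MPEG-7 descriptor definitions.
    """
    power = power_spectrum(frame)
    n = len(frame)
    freqs = [k * sample_rate / n for k in range(len(power))]
    total = sum(power)
    if total == 0.0:
        # Silent frame: no meaningful spectral shape.
        return 0.0, 0.0, 0.0

    # Centroid: power-weighted mean frequency.
    centroid = sum(f * p for f, p in zip(freqs, power)) / total

    # Spread: power-weighted standard deviation around the centroid.
    variance = sum((f - centroid) ** 2 * p for f, p in zip(freqs, power)) / total
    spread = math.sqrt(variance)

    # Flatness: geometric mean over arithmetic mean of the power spectrum.
    eps = 1e-12  # avoids log(0) for empty bins
    log_mean = sum(math.log(p + eps) for p in power) / len(power)
    flatness = math.exp(log_mean) / (total / len(power) + eps)

    return centroid, spread, flatness


if __name__ == "__main__":
    sample_rate = 8000  # assumed sample rate in Hz
    tone = [math.sin(2 * math.pi * 1000 * t / sample_rate) for t in range(64)]
    print(spectral_features(tone, sample_rate))
```

  In this sketch a pure tone yields a centroid near the tone frequency with a small spread and a flatness close to 0, while a noise-like or impulsive frame yields a flatness close to 1. Per-frame values like these could then be assembled into feature vectors for a classifier such as an SVM or KNN.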