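| Graduate student: | 林家宏 Lin, Jia-Hung | 
|---|---|
| Thesis title: | 基於影音句法特徵分析之多媒體資料語義分割 Semantic Segmentation of Multimedia Data Based on Audiovisual Syntactic Features | 
| Advisors: | 葉家宏 Yeh, Chia-Hung; 郭致宏 Kuo, Chih-Hung | 
| Degree: | Master | 
| Department: | College of Electrical Engineering and Computer Science - Department of Electrical Engineering | 
| Year of publication: | 2006 | 
| Academic year of graduation: | 94 (ROC calendar) | 
| Language: | English | 
| Number of pages: | 72 | 
| Chinese keywords: | 攝影機移動 (camera motion), 聲音分割 (audio segmentation), 貝氏訊息準則 (Bayesian Information Criterion), 梅爾倒頻譜係數 (Mel-frequency cepstral coefficients) | 
| Foreign-language keywords: | MFCC, BIC, audio segmentation, camera motion | 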
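This thesis proposes segmenting audiovisual clips using multimedia semantics as features. For audio, when the speaker or scene changes, the system separates out the segment belonging to each speaker or scene. For video, we propose a method that detects the motions the camera makes during shooting. Directors often vary the camera work to suit the plot, so this detection method can later be used to identify how a story is being expressed.
For video segmentation, the system takes the motion vectors produced by motion estimation, builds a motion vector field from them, and analyzes how camera-motion characteristics change that field; our camera-motion detection system is built on this analysis. The thesis proposes two methods for detecting camera-motion features. The first uses the divergence theorem, based on the motion vector field produced by zooming in and out. The other two motion features can be extracted in the same way, and the method needs only additions and subtractions to estimate the camera-motion features. The second method estimates the camera-motion features with an infinite impulse response (IIR) filter: by analyzing how the three kinds of motion vector field vary and choosing the filter input accordingly, all three camera-motion features can be estimated.
For audio segmentation, we use Mel-frequency cepstral coefficients (MFCC) as the input feature vector and the Bayesian Information Criterion (BIC) with a fixed window size to detect speaker changes; the fixed window size speeds up detection. Because the computed values are often affected by noise, we apply a maximum filter to remove the noise, and finally a post-processing step lowers the detection error rate.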
In this thesis, we propose techniques for segmenting the audio and visual streams of multimedia data based on syntactic features. For audio segmentation, we detect the boundaries at which the speaker or scene changes and split the audio into segments at those boundaries. For visual segmentation, we propose a method for detecting camera motion. Directors use camera motion and audio intonation to dramatize movies and television dramas, so analyzing these characteristics helps reveal how a video is dramatized.
For visual segmentation, the algorithm uses motion vectors to estimate the parameters of camera motion. We build motion vector fields and analyze them against the characteristics of camera motion. We propose two methods for estimating camera motion: a divergence method and an Infinite Impulse Response (IIR) method. The divergence method applies the divergence theorem to estimate the zoom parameter, and modified equations extract the pan and tilt parameters.
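The divergence idea can be illustrated with a minimal sketch. It is not the thesis's implementation: the function names `make_field` and `camera_motion_from_field`, the central-difference divergence, and the use of the field mean for pan and tilt are all assumptions made for illustration, since the abstract does not give the "modified equations". The sign convention (positive divergence for an expanding field) is also an assumption.

```python
def make_field(rows, cols, scale=0.0, pan=0.0, tilt=0.0):
    """Hypothetical synthetic motion vector field.

    field[r][c] = (u, v) for the block at row r, column c.
    A pure zoom pushes blocks away from the centre in proportion to
    their distance; pan and tilt add a constant offset to every vector.
    """
    cx, cy = (cols - 1) / 2.0, (rows - 1) / 2.0
    return [[(pan + scale * (c - cx), tilt + scale * (r - cy))
             for c in range(cols)]
            for r in range(rows)]


def camera_motion_from_field(field):
    """Return (divergence, pan, tilt) estimated from a motion vector field.

    Divergence is the average of du/dx + dv/dy over interior blocks,
    using central differences (assumption). Pan and tilt are taken as
    the mean horizontal and vertical components (assumption).
    """
    rows, cols = len(field), len(field[0])
    div_sum, count = 0.0, 0
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            du_dx = (field[r][c + 1][0] - field[r][c - 1][0]) / 2.0
            dv_dy = (field[r + 1][c][1] - field[r - 1][c][1]) / 2.0
            div_sum += du_dx + dv_dy
            count += 1
    divergence = div_sum / count
    n = rows * cols
    pan = sum(u for row in field for u, _ in row) / n
    tilt = sum(v for row in field for _, v in row) / n
    return divergence, pan, tilt


# A field that combines a zoom (scale 0.5 per block) with a pan and a tilt.
field = make_field(6, 8, scale=0.5, pan=3.0, tilt=-1.0)
divergence, pan, tilt = camera_motion_from_field(field)
```

For a pure zoom with per-block scale `s`, the divergence of this synthetic field is `2 * s`, while a pure pan or tilt has zero divergence, which is what lets the three motions be separated. Apart from the final averaging, the estimate uses only additions and subtractions of neighbouring vectors.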
For audio segmentation, we use Mel-Frequency Cepstral Coefficients (MFCC) as the feature vector and the Bayesian Information Criterion (BIC) with fixed-size windows to detect speaker changes. The fixed-size BIC window speeds up the calculation. A maximum filter removes false segments caused by audio noise, and a post-processing step with a variable-size window reduces false alarms.
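A fixed-window BIC change detector can be sketched as follows. This is a sketch under stated assumptions, not the thesis's code: it models each window with a diagonal-covariance Gaussian (a simplification), treats the maximum filter as "keep a point only if it is the largest score in its neighbourhood", and omits the variable-window post-processing step. The names `log_det_diag`, `delta_bic`, `detect_changes`, and the parameters `half_window`, `radius`, and `lam` are hypothetical, and the random frames stand in for real MFCC vectors.

```python
import math
import random


def log_det_diag(frames):
    """Sum of per-dimension log variances (log-determinant of a diagonal covariance)."""
    n, d = len(frames), len(frames[0])
    total = 0.0
    for k in range(d):
        mean = sum(f[k] for f in frames) / n
        var = sum((f[k] - mean) ** 2 for f in frames) / n
        total += math.log(max(var, 1e-8))  # floor avoids log(0)
    return total


def delta_bic(left, right, lam=1.0):
    """Delta-BIC for modelling left+right with two Gaussians instead of one.

    A positive value favours a change point between the two halves.
    """
    both = left + right
    n, d = len(both), len(both[0])
    gain = 0.5 * (n * log_det_diag(both)
                  - len(left) * log_det_diag(left)
                  - len(right) * log_det_diag(right))
    # Splitting adds d means and d variances (diagonal-covariance assumption).
    penalty = 0.5 * lam * (2 * d) * math.log(n)
    return gain - penalty


def detect_changes(frames, half_window=50, radius=25, lam=1.0):
    """Slide a fixed-size window, score its midpoint, keep positive local maxima."""
    scores = {}
    for t in range(half_window, len(frames) - half_window + 1):
        scores[t] = delta_bic(frames[t - half_window:t],
                              frames[t:t + half_window], lam)
    changes = []
    for t, s in scores.items():
        neighbourhood = [scores[u]
                         for u in range(t - radius, t + radius + 1)
                         if u in scores]
        if s > 0 and s == max(neighbourhood):
            changes.append(t)
    return changes


# Synthetic stand-in for MFCC frames: two "speakers" with different means.
rng = random.Random(0)
speaker_a = [[rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(200)]
speaker_b = [[rng.gauss(5.0, 1.0) for _ in range(3)] for _ in range(200)]
changes = detect_changes(speaker_a + speaker_b)
```

On this synthetic input the detector should report a change close to frame 200, where the second speaker starts. Because the window size is fixed, every position costs the same amount of work, which is the speed-up the abstract refers to.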