
Student: 劉榮文 (Liou, Rung-Wen)
Thesis Title: 應用在多媒體影像精采片段擷取之全面性方法
A Comprehensive Approach for Extracting Highlights from Video Media
Advisors: 葉家宏 (Yeh, Chia-Hung); 郭致宏 (Kuo, Chih-Hung)
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Electrical Engineering
Year of Publication: 2006
Graduation Academic Year: 94
Language: English
Number of Pages: 68
Chinese Keywords: 精采片段索引擷取 (highlights, indexing, extraction)
English Keywords: extraction, index, video content analysis, highlights
Access Count: Views: 87; Downloads: 1
  Chinese Abstract: In this thesis, we propose a comprehensive approach for extracting highlights from video media. Unlike existing highlight-extraction methods, which must first observe the characteristics of a specific type of segment in detail, use those observations to train a model for that segment type, and then apply the model to extract the corresponding highlights, our method directly combines visual and audio features to analyze the input video sequence and, through the decision mechanism in our system, automatically generates visual and audio tempo curves. These tempo curves are produced by the time-domain unit in our system, and they bridge various low-level semantics to the high-level semantics that describe the story intensity of the input video sequence. According to the experimental results, the extracted highlights match human perception. The proposed algorithm is very useful for indexing, searching, and browsing multimedia data.

    Abstract: In this thesis, we propose a comprehensive method for extracting highlights from video media. Unlike most current approaches, which require specific domain knowledge or models to extract particular events, our method directly combines visual and audio features to analyze input video programs and automatically generates a video tempo curve and an audio tempo curve through decision rules. The tempo curves are produced by a time-domain mechanism and bridge the semantic gap to capture the high-level semantics referred to as story intensity. The proposed algorithm provides a natural way, via tempo, to segment a video into manageable parts. Furthermore, it covers different kinds of sports and movies. According to the experimental results, the detected highlight clips match human perception, and the proposed algorithm is very useful for indexing, searching, and browsing multimedia data.
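    The sketch below is only a rough illustration of the fusion step described in the abstract, not the thesis's actual implementation. It assumes that a per-shot video tempo value (e.g., from motion magnitude and shot length) and a per-shot audio tempo value (e.g., from short-time energy and zero-crossing rate) have already been computed; it normalizes the two curves, fuses them into a story-intensity curve, and thresholds that curve to pick highlight segments. All function names, weights, and the threshold are hypothetical.

```python
import numpy as np

def normalize(curve):
    """Scale a tempo curve into [0, 1] (assumed preprocessing; not from the thesis)."""
    curve = np.asarray(curve, dtype=float)
    span = curve.max() - curve.min()
    return (curve - curve.min()) / span if span > 0 else np.zeros_like(curve)

def story_intensity(video_tempo, audio_tempo, w_video=0.5, w_audio=0.5):
    """Fuse normalized video and audio tempo curves into a story-intensity curve.
    The weighted sum stands in for the thesis's decision rules (hypothetical)."""
    return w_video * normalize(video_tempo) + w_audio * normalize(audio_tempo)

def extract_highlights(intensity, threshold=0.6):
    """Return (start, end) shot-index ranges where intensity stays above threshold."""
    highlights, start = [], None
    for i, value in enumerate(intensity):
        if value >= threshold and start is None:
            start = i
        elif value < threshold and start is not None:
            highlights.append((start, i - 1))
            start = None
    if start is not None:
        highlights.append((start, len(intensity) - 1))
    return highlights

# Toy example: ten shots with made-up per-shot tempo measurements.
video_tempo = [0.2, 0.3, 0.8, 0.9, 0.7, 0.2, 0.1, 0.6, 0.9, 0.3]
audio_tempo = [0.1, 0.4, 0.7, 0.8, 0.9, 0.3, 0.2, 0.5, 0.8, 0.2]
print(extract_highlights(story_intensity(video_tempo, audio_tempo)))
# -> [(2, 4), (8, 8)] for this toy input
```

    In the thesis itself the fusion is performed by decision rules rather than a fixed weighted sum, so the weights and threshold above merely stand in for that mechanism.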

    Table of Contents:
    Chinese Abstract; Abstract; Acknowledgement; List of Tables; List of Figures
    Chapter 1 Introduction
      1.1 Motivation
      1.2 An Overview of Film Grammar
      1.3 Story Intensity Representation
      1.4 Highlights of the Movies
      1.5 Highlights of the Sports
      1.6 Thesis Structure
    Chapter 2 Related Research
      2.1 An Overview of Related Research
        2.1.1 Simultaneous or Sequential Fusion
        2.1.2 Statistical or Knowledge-based Fusion
      2.2 Visual Feature
        2.2.1 Motion Estimation
        2.2.2 Shot Change Detection
        2.2.3 Transform RGB into HSV
        2.2.4 Transform RGB into YUV
        2.2.5 Edge Detection
      2.3 Audio Feature
        2.3.1 Energy
        2.3.2 Zero Crossing Rate
    Chapter 3 The Proposed Approach
      3.1 An Overview of Proposed Algorithm
      3.2 Proposed Algorithm
        3.2.1 Preprocessing
          3.2.1.1 Shot Detection
          3.2.1.2 Keyframe Selection
          3.2.1.3 Conclusion
        3.2.2 Low-Level Feature Extraction
          3.2.2.1 Histogram Calculator
          3.2.2.2 Motion Vector Calculator
          3.2.2.3 Zero-Crossing Rate Calculator
          3.2.2.4 Energy Calculator
          3.2.2.5 Video Threshold Determining Unit
          3.2.2.6 Audio Threshold Determining Unit
          3.2.2.7 Conclusion
        3.2.3 Audiovisual Tempo Analysis
          3.2.3.1 Video Tempo Generator
          3.2.3.2 Audio Tempo Generator
        3.2.4 Multimodal Data Fusion
          3.2.4.1 Story Tempo Generator
          3.2.4.2 Conclusion
    Chapter 4 Experimental Results and Discussion
      4.1 Movies
        4.1.1 Banlieue 13
        4.1.2 Kung Fu Hustle
        4.1.3 Discussion
      4.2 Sports
        4.2.1 Basketball
        4.2.2 Football
        4.2.3 Soccer
        4.2.4 Discussion
    Chapter 5 Conclusions and Future Work
      5.1 Conclusions
      5.2 Future Work
    References

    On-campus access: available to the public from 2007-08-18
    Off-campus access: available to the public from 2008-08-18