
Graduate Student: Huang, Jhih-Hong (黃志鴻)
Thesis Title: A Novel Video Annotation by Integrating Visual Features and Frequent Patterns
Advisor: Tseng, Shin-Mu (曾新穆)
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Computer Science and Information Engineering
Year of Publication: 2006
Academic Year of Graduation: 94 (ROC calendar)
Language: Chinese
Pages: 61
Keywords: Video Annotation, Visual-Based Annotation, Frequent Patterns, Association Rule, Data Mining, Sequential Patterns
  •   The main purpose of multimedia retrieval is to obtain the multimedia data a user wants efficiently. From a human perspective, however, prior studies relied only on low-level multimedia features, which cannot convey the semantic concepts that multimedia represents and are therefore insufficient to help users obtain the data they want. In general, annotating multimedia is a good solution. Yet a video embodies richer semantic concepts beyond its visual features, which makes video annotation a challenging research topic. In this study, we propose a video annotation method that integrates visual features and frequent patterns. Our method addresses two main tasks: 1) constructing three prediction models, namely the statistical model ModelCRM and the rule-based models Modelseq and Modelasso; and 2) fusing these models to annotate unannotated videos automatically. Through this fusion we consider visual features and semantic concepts simultaneously to improve prediction accuracy, and experimental analysis shows that the proposed method achieves higher video annotation accuracy than existing methods.
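The rule-based models described above mine frequent patterns from the concept labels of annotated shots. The following is a minimal illustration of support counting over toy transactions; the data, the `min_support` value, and the restriction to itemsets of size one and two are assumptions for illustration, not the thesis's actual Modelasso construction:

```python
from itertools import combinations

# Toy transactions: the concept labels annotated on each training shot.
shots = [
    {"sky", "building", "car"},
    {"sky", "building"},
    {"sky", "road"},
    {"building", "car"},
]

def frequent_itemsets(transactions, min_support):
    """Return itemsets whose support (fraction of transactions) >= min_support."""
    items = sorted(set().union(*transactions))
    result = {}
    for size in (1, 2):  # size-1 and size-2 itemsets only, for brevity
        for combo in combinations(items, size):
            support = sum(set(combo) <= t for t in transactions) / len(transactions)
            if support >= min_support:
                result[combo] = support
    return result

patterns = frequent_itemsets(shots, min_support=0.5)
```

A frequent itemset such as `("building", "sky")` can then back an annotation rule: shots labeled with one concept are candidates for the other.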

      The major purpose of multimedia data retrieval is to obtain the desired multimedia data effectively and efficiently. From a human perspective, unfortunately, the low-level visual features used in traditional studies carry too little semantic support to help users obtain accurate results. Generally speaking, annotation is a good solution for enhancing the accuracy of multimedia data retrieval. Video annotation, however, has been considered a challenging research topic, since videos carry complex scenic semantics in addition to visual features. In this paper, we propose a novel video annotation method that integrates the visual features and the frequent patterns existing in videos. The proposed method consists of two main phases: 1) construction of three kinds of prediction models, namely association, sequential, and statistical models, from annotated videos, and 2) fusion of these models to annotate unknown videos automatically. The main advantage of the proposed method is that both visual features and semantic patterns are considered simultaneously through the fusion approach, so as to enhance annotation accuracy. Empirical evaluations show that our approach is very promising in enhancing annotation accuracy in terms of precision and recall.
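The fusion phase described in the abstract combines per-model concept probabilities into one ranking. A minimal sketch, assuming each model emits a concept-to-probability mapping and that the models are combined by weighted averaging; the model outputs and equal weights here are illustrative assumptions, not the thesis's exact scheme:

```python
def fuse_predictions(prob_lists, weights=None):
    """Combine per-model concept->probability dicts into one ranked list."""
    if weights is None:
        # Default: weight all models equally (an assumption for this sketch).
        weights = [1.0 / len(prob_lists)] * len(prob_lists)
    fused = {}
    for probs, w in zip(prob_lists, weights):
        for concept, p in probs.items():
            fused[concept] = fused.get(concept, 0.0) + w * p
    # Rank concepts by fused score, highest first.
    return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical outputs of the three models for one shot.
model_crm  = {"sky": 0.6, "building": 0.3, "car": 0.1}   # statistical model
model_seq  = {"sky": 0.5, "road": 0.4, "building": 0.1}  # sequential-rule model
model_asso = {"building": 0.7, "sky": 0.2, "car": 0.1}   # association-rule model

ranking = fuse_predictions([model_crm, model_seq, model_asso])
```

The top-ranked concepts in `ranking` would then be attached to the shot as annotations.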

    Abstract (English)
    Abstract (Chinese)
    Acknowledgements
    Table of Contents
    List of Tables
    List of Figures
    Chapter 1  Introduction
      1.1  Research Purpose
        1.1.1  Background
        1.1.2  Problem Description
        1.1.3  Research Goals
      1.2  Overview of the Proposed Method
      1.3  Contributions
      1.4  Thesis Organization
    Chapter 2  Related Work
      2.1  Video Composition
      2.2  Content-Based Video Retrieval
      2.3  Structural Video Indexing
      2.4  Video Annotation with Statistical Probability Models
      2.5  Video Summarization Using Sequential Associations between Shots
    Chapter 3  Proposed Method
      3.1  Model Overview
      3.2  Shot Segmentation and Keyframe Extraction
      3.3  Low-Level Feature Extraction
      3.4  Method Architecture
      3.5  Training Phase
        3.5.1  Construction of the Statistical Model ModelCRM
        3.5.2  Construction of the Rule-Based Models Modelseq and Modelasso
          3.5.2.1  Encoding
          3.5.2.2  Building Modelseq
          3.5.2.3  Building Modelasso
      3.6  Prediction Phase
        3.6.1  Application of the Statistical Model ModelCRM
        3.6.2  Application of the Rule-Based Models Modelseq and Modelasso
          3.6.2.1  Encoding
          3.6.2.2  Prediction with Modelseq
          3.6.2.3  Prediction with Modelasso
        3.6.3  Prediction with the Combined Probability List
    Chapter 4  Experimental Analysis
      4.1  Experimental Design
      4.2  Experimental Setup
        4.2.1  Rectangle Partitioning for ModelCRM
        4.2.2  Number of Clusters for the K-means Algorithm
        4.2.3  Minimum Support for Modelseq; Minimum Support and Confidence for Modelasso
      4.3  Experimental Results
      4.4  Experimental Examples
      4.5  Summary of Experiments
    Chapter 5  Conclusions
      5.1  Conclusions
      5.2  Future Work
    References
    About the Author
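The encoding step in the outline (Sections 3.5.2.1 and 3.6.2.1) turns keyframe feature vectors into discrete symbols so that symbolic pattern mining can run over them. A minimal sketch of one such encoding, assuming a simple K-means clustering where each keyframe is mapped to its nearest cluster ID; the toy 2-D features and cluster count are illustrative assumptions, not the thesis's actual features or parameters:

```python
import random

def dist2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's K-means: returns k centroids for the given points."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        # Assign each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            idx = min(range(k), key=lambda i: dist2(p, centroids[i]))
            clusters[idx].append(p)
        # Recompute each centroid as the mean of its cluster.
        for i, c in enumerate(clusters):
            if c:
                centroids[i] = tuple(sum(coord) / len(c) for coord in zip(*c))
    return centroids

def encode(points, centroids):
    """Map each feature vector to the ID of its nearest centroid."""
    return [min(range(len(centroids)), key=lambda i: dist2(p, centroids[i]))
            for p in points]

# Toy "keyframe features": two well-separated groups in 2-D.
features = [(0.1, 0.2), (0.2, 0.1), (0.15, 0.15), (0.9, 0.8), (0.8, 0.9)]
centroids = kmeans(features, k=2)
symbols = encode(features, centroids)
```

The resulting symbol IDs, interleaved with concept labels, form the sequences and transactions that Modelseq and Modelasso mine.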


    Full text available on campus: 2007-08-02
    Full text available off campus: 2007-08-02