| Graduate Student: | Huang, Jhih-Hong (黃志鴻) |
|---|---|
| Thesis Title: | 整合視覺特徵與頻繁項目集之視訊註釋方式 (A Novel Video Annotation by Integrating Visual Features and Frequent Patterns) |
| Advisor: | Tseng, Shin-Mu (曾新穆) |
| Degree: | Master |
| Department: | Department of Computer Science and Information Engineering, College of Electrical Engineering and Computer Science |
| Year of Publication: | 2006 |
| Graduation Academic Year: | 94 (ROC calendar) |
| Language: | Chinese |
| Pages: | 61 |
| Keywords (Chinese, translated): | video annotation, sequential patterns, data mining, association rules, frequent patterns, visual-based annotation |
| Keywords (English): | Video Annotation, Visual-Based Annotation, Frequent Patterns, Association Rule, Data Mining, Sequential Patterns |
The main purpose of multimedia retrieval is to obtain the multimedia data users want effectively and efficiently. From a human point of view, previous studies used only low-level multimedia features, which cannot convey the semantic concepts that multimedia represents and are therefore insufficient to help users obtain the data they want. Generally speaking, annotating multimedia is a good solution. However, beyond visual features, videos carry many more semantic concepts, which makes video annotation a challenging research topic. In this study, we propose a method that integrates visual features and frequent patterns to annotate videos. Our method consists of two main parts: 1) constructing three prediction models, namely the statistical model Model_CRM and the rule-based models Model_seq and Model_asso; and 2) fusing these models to automatically annotate videos that have not yet been annotated. By considering visual features and semantic concepts simultaneously through fusion, we improve prediction accuracy. Experimental analysis shows that the proposed method achieves higher video annotation accuracy than existing methods.
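The rule-based models above are built by mining frequent patterns from already-annotated videos. As a minimal sketch of the underlying idea, the following Apriori-style example mines association rules between co-occurring annotation concepts; the concepts, thresholds, and brute-force search are illustrative assumptions, not the thesis's actual implementation:

```python
from itertools import combinations

# Each transaction: the set of concept annotations attached to one video shot.
# These example concepts are illustrative only.
transactions = [
    {"sky", "outdoor", "building"},
    {"sky", "outdoor", "water"},
    {"outdoor", "building"},
    {"sky", "outdoor", "building"},
]

MIN_SUPPORT = 0.5     # fraction of shots a pattern must appear in
MIN_CONFIDENCE = 0.8  # minimum rule confidence

def support(itemset):
    """Fraction of transactions containing every item of the itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)

# 1) Find frequent itemsets (brute-force enumeration is fine at this toy scale).
items = set().union(*transactions)
frequent = [
    frozenset(c)
    for k in range(1, len(items) + 1)
    for c in combinations(sorted(items), k)
    if support(set(c)) >= MIN_SUPPORT
]

# 2) Derive association rules X -> Y with confidence = sup(X ∪ Y) / sup(X).
rules = []
for itemset in frequent:
    if len(itemset) < 2:
        continue
    for k in range(1, len(itemset)):
        for ante in combinations(sorted(itemset), k):
            ante = frozenset(ante)
            conf = support(set(itemset)) / support(set(ante))
            if conf >= MIN_CONFIDENCE:
                rules.append((set(ante), set(itemset - ante), conf))

for ante, cons, conf in rules:
    print(sorted(ante), "->", sorted(cons), f"(conf={conf:.2f})")
```

A mined rule such as building → outdoor could then suggest additional concepts for a shot whose visual features already predicted one of the concepts.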
The major purpose of multimedia data retrieval is to retrieve relevant multimedia data effectively and efficiently. From a human perspective, unfortunately, the low-level visual features used in traditional studies provide too little semantic support to help users obtain accurate results. Generally speaking, annotation can be a solution for enhancing the accuracy of multimedia data retrieval. Video annotation has been considered a challenging research topic, since videos carry complex scenic semantics in addition to visual features. In this paper, we propose a novel video annotation method that integrates the visual features and frequent patterns existing in videos. Our proposed method consists of two main phases: 1) construction of three kinds of prediction models, namely association, sequential, and statistical models, from annotated videos, and 2) fusion of these models for annotating unknown videos automatically. The main advantage of the proposed method is that both visual features and semantic patterns are considered simultaneously through a fusion approach, so as to enhance annotation accuracy. Empirical evaluations show that our approach is very promising in enhancing annotation accuracy in terms of precision and recall.
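The fusion phase can be illustrated with a minimal weighted-score combination over per-concept predictions. The scores, weights, and threshold below are hypothetical placeholders; the thesis's actual fusion strategy and model outputs may differ:

```python
# Hypothetical per-concept confidence scores produced by the three models
# for one unannotated shot (the thesis names its models Model_CRM,
# Model_seq, and Model_asso; these values are invented for illustration).
statistical_scores = {"outdoor": 0.7, "building": 0.4, "water": 0.1}
sequential_scores  = {"outdoor": 0.6, "building": 0.5}
association_scores = {"outdoor": 0.8, "water": 0.3}

# Assumed fusion weights; in practice they would be tuned on validation data.
weighted_models = [
    (0.4, statistical_scores),
    (0.3, sequential_scores),
    (0.3, association_scores),
]

def fuse(weighted_models, threshold=0.3):
    """Weighted linear fusion: a concept is kept as an annotation for the
    shot if its combined score reaches the threshold."""
    combined = {}
    for w, scores in weighted_models:
        for concept, s in scores.items():
            combined[concept] = combined.get(concept, 0.0) + w * s
    return {c: s for c, s in combined.items() if s >= threshold}

annotations = fuse(weighted_models)
print(annotations)  # concepts surviving fusion, with combined scores
```

The design point the abstract makes is that fusion lets visually driven scores (the statistical model) and semantically driven scores (the rule-based models) compensate for each other's misses.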