
Graduate Student: Chang, Sheng-Xiong (張盛雄)
Thesis Title: Unsupervised Alignment of Video and Text Using Visual Pattern and Textual Concept Mapping (Chinese title: 應用視覺樣版和語意概念的關係於影像和文字之非監督式對應)
Advisor: Wu, Chung-Hsien (吳宗憲)
Degree: Master
Department: Department of Computer Science and Information Engineering, College of Electrical Engineering and Computer Science
Year of Publication: 2009
Academic Year of Graduation: 97 (ROC calendar, i.e., 2008-2009)
Language: Chinese
Number of Pages: 65
Keywords: concept-based alignment, textual concept, visual pattern
    With advances in mass communication and the spread of the Internet, watching multimedia news (video, audio, and text) online has become one of the main ways people get the news. However, limits on hardware and network bandwidth still make viewing complete news stories a technical bottleneck, so processing news to provide fast previews is highly desirable. In a news story, the anchor's script is already, to some degree, a summary prepared by media professionals, but no corresponding mechanism exists for the news video itself.
    In this thesis, we use the anchor sentences to find suitable field shots, building a text-to-video alignment mechanism that lets users browse news quickly. The work comprises three research topics: 1) textual concept mapping, which maps the anchor sentences to semantic concepts using HowNet; 2) visual pattern extraction, which segments each key frame, converts the resulting regions into visual words, and represents the frame as visual patterns composed of those visual words; and 3) concept-based alignment, which trains a probabilistic model between textual concepts and visual patterns while also taking the internal ordering within each side into account, and then uses the trained model to align sentences with shots.
    A total of 2,842 news stories from the Public Television Service (MATBN) broadcast news corpus, about 90 seconds long on average, were used for training, from which the proposed method produced 99,440 parallel concept/pattern pairs. For evaluation, three stories were randomly selected from each of 12 news categories (36 stories in total) as test data, and three experiments were conducted: mapping terms to textual concepts or not, filtering regions during visual pattern extraction or not, and considering the relations between adjacent shots and sentences during alignment or not. The results show that the proposed method clearly outperforms approaches that do not use these mechanisms. In the best-case evaluation, with five reference answers labeled per item, the system reaches 67% satisfaction, indicating that the proposed alignment method performs reasonably well when the text and shot content are required to match.
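    To make stages 2) and 3) of the pipeline above more concrete, two minimal sketches follow. They are illustrations under stated assumptions, not the implementation described in the thesis: the thesis additionally segments key frames, filters regions, and models the ordering of concepts and patterns, all of which these sketches omit. The first sketch quantizes SIFT descriptors of key frames into visual words; the use of OpenCV and scikit-learn, and the frame paths, are assumptions made only for illustration.

```python
# Sketch: key frames -> visual words (bag-of-features), assuming OpenCV SIFT and
# k-means clustering. The thesis also segments frames and filters regions before
# quantization; this toy version skips those steps.
import cv2
import numpy as np
from sklearn.cluster import KMeans

def sift_descriptors(image_paths):
    """Collect SIFT descriptors from a list of key-frame image files (paths are hypothetical)."""
    sift = cv2.SIFT_create()
    all_desc = []
    for path in image_paths:
        img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
        if img is None:
            continue  # skip unreadable frames
        _, desc = sift.detectAndCompute(img, None)
        if desc is not None:
            all_desc.append(desc)
    return all_desc

def build_codebook(desc_list, n_words=500):
    """Cluster all descriptors into a visual-word codebook (desc_list must be non-empty)."""
    data = np.vstack(desc_list)
    return KMeans(n_clusters=n_words, n_init=10, random_state=0).fit(data)

def to_visual_words(desc, codebook):
    """Quantize one frame's descriptors into a sequence of visual-word ids."""
    return codebook.predict(desc).tolist()
```

    The second sketch estimates a lexical translation table t(visual pattern | textual concept) with EM in the style of IBM Model 1 and then aligns a sentence's concepts to a shot's patterns greedily. The toy corpus and all identifiers are hypothetical, and the thesis's concept-based model is richer than this simplification.

```python
# Sketch: Model-1-style EM for t(visual pattern | textual concept), trained on
# parallel (concept sequence, visual-pattern sequence) pairs. Purely illustrative;
# the thesis's alignment model also accounts for ordering, which is omitted here.
from collections import defaultdict

NULL = "<null>"  # lets a pattern align to "no concept"

def train_alignment(pairs, iterations=10):
    """EM estimation of t(pattern | concept) from parallel data."""
    concepts = {c for cs, _ in pairs for c in cs} | {NULL}
    patterns = {p for _, ps in pairs for p in ps}
    # uniform initialization over all (pattern, concept) combinations
    t = {(p, c): 1.0 / len(patterns) for p in patterns for c in concepts}
    for _ in range(iterations):
        count = defaultdict(float)
        total = defaultdict(float)
        for cs, ps in pairs:
            src = list(cs) + [NULL]
            for p in ps:
                norm = sum(t[(p, c)] for c in src)
                for c in src:
                    frac = t[(p, c)] / norm
                    count[(p, c)] += frac
                    total[c] += frac
        t = {(p, c): count[(p, c)] / total[c] if total[c] > 0 else 0.0
             for (p, c) in t}
    return t

def align(concepts_seq, patterns_seq, t):
    """Greedy alignment: each visual pattern picks its most probable concept."""
    src = list(concepts_seq) + [NULL]
    return [(p, max(src, key=lambda c: t[(p, c)])) for p in patterns_seq]

if __name__ == "__main__":
    # toy parallel corpus: concept sequences vs. visual-pattern id sequences
    corpus = [
        (["fire", "building"], ["vp_flame", "vp_smoke", "vp_facade"]),
        (["fire", "forest"], ["vp_flame", "vp_trees"]),
        (["election", "building"], ["vp_podium", "vp_facade"]),
    ]
    model = train_alignment(corpus)
    print(align(["fire", "building"], ["vp_flame", "vp_facade"], model))
```

    Running the second script prints a pattern-to-concept alignment for the toy example, e.g. vp_flame mapped to "fire" and vp_facade to "building".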

    Chapter 1 Introduction
    1.1 Research Background and Motivation
    1.2 Research Objectives
    1.3 Literature Review
    1.4 Problem Description and Overview of the Proposed Method
    1.5 Thesis Organization
    Chapter 2 Overview of Related Techniques
    2.1 The HowNet System
    2.1.1 Introduction to HowNet
    2.1.2 Principles of HowNet
    2.1.3 Record Format of the HowNet Knowledge Dictionary
    2.1.4 Tags in the Knowledge Dictionary
    2.1.5 Relations Described in HowNet
    2.2 Image Segmentation System
    2.3 Scale-Invariant Feature Transform (SIFT)
    2.4 Visual Words
    Chapter 3 System Architecture
    Chapter 4 Textual Concept Mapping
    4.1 Keyword Collection
    4.2 Textual Concepts
    4.2.1 Mapping Rules
    4.2.2 Handling of Undefined Words
    Chapter 5 Visual Pattern Extraction
    5.1 Key Frame Extraction
    5.1.1 Scene Detection
    5.1.2 Frame Extraction
    5.2 Visual Pattern Extraction Steps
    5.2.1 Frame Region Selection
    5.2.2 Conversion to Visual Words
    5.2.3 Visual Pattern Format and Conversion
    Chapter 6 Concept-Based Text/Shot Alignment
    6.1 Parallel Corpus Collection
    6.2 Concept-Based Alignment Probability Model
    Chapter 7 Experimental Results and Discussion
    7.1 Experimental Setup and Design
    7.2 Experimental Results and Analysis
    Chapter 8 Conclusion and Future Work
    8.1 Conclusion
    8.2 Future Research Directions
    References

    Full-text availability: on campus, available immediately; off campus, available from 2009-08-17.