
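Graduate student: 陳彥良
Chen, Yen-Liang
Thesis title: 嶄新的影片文字擷取法
A Novel Method for Text Extraction from Video
Advisor: 郭淑美
Guo, Shu-Mei
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Computer Science and Information Engineering
Year of publication: 2007
Academic year of graduation: 95 (ROC calendar)
Language: English
Number of pages: 59
Keywords (translated from Chinese): optical character recognition, graph cut, text extraction, text detection
Keywords (English): text detection, text extraction, graph cut, optical character recognition, OCR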
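  • In this thesis, we propose a novel, efficient, and accurate method for automatic video text extraction. Because text conveys ideas easily, text in video is commonly used to convey information about the video's content. However, video text usually appears in varied colors, sizes, and fonts, which makes it hard for optical character recognition (OCR) systems to recognize. Extracting the text regions before recognition therefore improves the recognition result. Even so, the variety of text appearances makes it difficult to find features common to all text regions. We observe that text must differ in color from its background, since otherwise it would not be visible. We therefore exploit the ability of the Laplacian of Gaussian (LoG) filter to locate zero crossings in an image. After the text is detected and localized, a statistical method decides whether the text is brighter or darker than its background. The LoG filter then finds the two edges on either side of each zero crossing at the text-background boundary: the inner edge lies in the text region and the outer edge lies in the background region. The color information at the inner and outer edges is sampled to build the observation models automatically, and the models are passed to the graph cut algorithm, which finds the optimal segmentation separating text from background. Comparisons with several of the better recent methods show that our method extracts text from complex backgrounds more clearly and more efficiently.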
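The inner/outer edge sampling described in this abstract can be illustrated on a single grayscale scanline. The following is a minimal sketch, not the thesis implementation: it assumes a 1-D signal instead of a color image, takes the text polarity as a given flag rather than estimating it statistically, and all function names (`log_kernel`, `convolve`, `sample_sides`) and the `min_strength` threshold are hypothetical. It relies on the fact that the LoG response is positive on the darker side of an edge and negative on the brighter side.

```python
import math

def log_kernel(sigma, radius):
    """Sampled 1-D Laplacian of Gaussian (second derivative of a Gaussian)."""
    k = [((x * x - sigma * sigma) / sigma ** 4) * math.exp(-x * x / (2 * sigma * sigma))
         for x in range(-radius, radius + 1)]
    mean = sum(k) / len(k)
    return [v - mean for v in k]  # shift to zero sum so flat regions respond with ~0

def convolve(signal, kernel):
    """Same-length filtering with edge replication (the kernel is symmetric)."""
    r = len(kernel) // 2
    n = len(signal)
    out = []
    for i in range(n):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = min(max(i + j - r, 0), n - 1)
            acc += w * signal[idx]
        out.append(acc)
    return out

def sample_sides(signal, response, text_is_bright, min_strength=1.0):
    """Collect one text (inner) and one background (outer) sample per zero crossing."""
    inner, outer = [], []
    for i in range(len(response) - 1):
        a, b = response[i], response[i + 1]
        # A zero crossing is a sign change between neighbours; the strength
        # test (an assumed heuristic) ignores near-zero responses in flat areas.
        if a * b < 0 and min(abs(a), abs(b)) >= min_strength:
            # The positive LoG response sits on the darker side of the edge.
            dark, bright = (i, i + 1) if a > 0 else (i + 1, i)
            text, back = (bright, dark) if text_is_bright else (dark, bright)
            inner.append(signal[text])
            outer.append(signal[back])
    return inner, outer

# A bright stroke (200) on a dark background (20).
scanline = [20] * 6 + [200] * 4 + [20] * 6
response = convolve(scanline, log_kernel(sigma=1.0, radius=3))
inner, outer = sample_sides(scanline, response, text_is_bright=True)
print(inner, outer)
```

In this toy case the two zero crossings fall at the left and right borders of the stroke, so the inner samples come from stroke pixels and the outer samples from the adjacent background pixels.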

    Text in images carries important information about the image because it conveys ideas easily. However, such text is hard for Optical Character Recognition (OCR) systems to recognize because it often appears in various colors, sizes, and fonts. Good text extraction before recognition therefore improves the recognition result, but the variety of text appearances still makes it difficult to identify constant features for extraction. In this thesis, we propose a novel, efficient, and accurate method for automatic video text extraction. We observe that text must still differ in color from its background to be visible, so we exploit the zero-crossing property of Laplacian of Gaussian (LoG) edge detection. After text detection, localization, and a simple statistical method that decides text polarity, LoG detection locates text pixels and background pixels on the two sides of each zero crossing between text and background. Those pixels are sampled to construct color distributions for text and background, and the distributions are then passed to the graph cut algorithm, which finds the globally optimal segmentation of text and background. Our results show that text is extracted clearly and efficiently from complex backgrounds. Comparisons with other recent methods also show that our method produces better results with higher performance.
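The final step, segmenting text from background with a graph cut driven by the sampled distributions, can be sketched on a scanline as well. This is an illustrative sketch under stated assumptions, not the thesis's graph construction: the data cost is the squared distance to each sample mean (standing in for the color distributions), the smoothness weight `smooth` is a constant, the maximum flow is found with a plain breadth-first augmenting-path search, and the names `min_cut_labels`, `text_cost`, and `back_cost` are hypothetical.

```python
from collections import deque

def min_cut_labels(text_cost, back_cost, smooth):
    """Label each pixel text (True) or background (False) via an s-t min cut."""
    n = len(text_cost)
    s, t = n, n + 1                      # source = text, sink = background
    cap = [dict() for _ in range(n + 2)]

    def add_edge(u, v, c_uv, c_vu=0.0):
        cap[u][v] = cap[u].get(v, 0.0) + c_uv
        cap[v][u] = cap[v].get(u, 0.0) + c_vu

    for p in range(n):
        add_edge(s, p, back_cost[p])     # cut when p is labelled background
        add_edge(p, t, text_cost[p])     # cut when p is labelled text
        if p + 1 < n:
            add_edge(p, p + 1, smooth, smooth)  # penalty for a label change

    def reachable():
        """BFS over edges with remaining capacity; returns the parent map."""
        parent = {s: None}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v, c in cap[u].items():
                if c > 1e-12 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        return parent

    while True:
        parent = reachable()
        if t not in parent:
            break                        # no augmenting path left: flow is maximal
        path, v = [], t
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        flow = min(cap[u][v] for u, v in path)
        for u, v in path:
            cap[u][v] -= flow
            cap[v][u] += flow
    # Pixels still reachable from the source lie on the text side of the cut.
    return [p in parent for p in range(n)]

scanline = [20, 25, 30, 190, 200, 90, 195, 22, 18]
mu_text, mu_back = 200.0, 20.0           # assumed means of inner/outer samples
text_cost = [(v - mu_text) ** 2 for v in scanline]
back_cost = [(v - mu_back) ** 2 for v in scanline]
print(min_cut_labels(text_cost, back_cost, smooth=5000.0))
print(min_cut_labels(text_cost, back_cost, smooth=0.0))
```

With `smooth=0.0` each pixel is labelled independently, so the mid-gray pixel (90) inside the stroke falls to the background. With `smooth=5000.0` cutting it away from its two text neighbours costs more than keeping it, so the stroke stays connected.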

    Abstract
    Table of Contents
    List of Tables
    List of Figures
    Chapter 1 Introduction
        1.1 Previous Works
        1.2 Thesis Organization
    Chapter 2 Text Detection and Localization
        2.1 Edge Detection and Dilation
        2.2 Text Block Stylization
            2.2.1 Margin deletion
            2.2.2 Vertical edge projection
            2.2.3 Horizontal contrast projection
            2.2.4 Border extension
    Chapter 3 Text Extraction through Graph Cut
        3.1 Polarity
        3.2 Graph Cut
            3.2.1 Flow networks
            3.2.2 Cuts
            3.2.3 Energy minimization and the max flow/min cut theorem
            3.2.4 Graph building
        3.3 Text Extraction
            3.3.1 Observation models
            3.3.2 Graph weight decision
    Chapter 4 Experimental Results
        4.1 Text Extraction Results
        4.2 Complexity Analysis
        4.3 Text Image Enhancement
        4.4 Text Image with Noise
    Chapter 5 Conclusions and Future Works
    References

    On campus: publicly available from 2010-07-12
    Off campus: publicly available from 2010-07-12