| Graduate Student: | 楊奕廷 Yang, Yi-Ting |
|---|---|
| Thesis Title: | 基於針對漫畫資料增強之漫畫文字偵測及其漫畫情緒分析應用 Manga Text Detection with Manga-Specific Data Augmentation and Its Applications on Emotion Analysis |
| Advisor: | 朱威達 Chu, Wei-Ta |
| Degree: | Master |
| Department: | College of Electrical Engineering and Computer Science - Department of Computer Science and Information Engineering |
| Publication Year: | 2022 |
| Graduating Academic Year: | 110 |
| Language: | English |
| Number of Pages: | 38 |
| Keywords (Chinese): | 漫畫分析 (manga analysis), 文字偵測 (text detection), 對抗式生成網路 (generative adversarial networks), 情緒分析 (emotion analysis) |
| Keywords (English): | Manga, Text Detection, Generative Adversarial Networks, Emotion Analysis |
We target the detection of text rendered in atypical font styles and embedded in cluttered backgrounds in Japanese comics (manga). To enable the detection model to find such atypical text, we augment the training data with a proposed manga-specific data augmentation method: a generative adversarial network generates atypical text regions, which are then blended into manga pages to greatly increase the volume and diversity of the training data. We verify the importance of this manga-specific augmentation and show that a text detection model fine-tuned on the augmented dataset significantly outperforms the state of the art. Furthermore, with the help of manga text detection, we fuse global visual features with local text features to enable more accurate emotion analysis. We believe this is the first work specifically aimed at detecting atypical text such as onomatopoeia in manga, and that more advanced manga understanding can be achieved with the aid of the proposed model.
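The blending step described in the abstract (compositing GAN-generated text regions into manga pages to produce new training samples) can be illustrated with a minimal sketch. The thesis record contains no code, so everything below is an assumption: `blend_patch` is a hypothetical helper, and the min-based compositing merely mimics how dark ink might be overlaid on a lighter page; the actual thesis pipeline may differ.

```python
# Illustrative sketch only; not the thesis's confirmed implementation.
import random
import numpy as np

def blend_patch(page: np.ndarray, patch: np.ndarray, alpha: float = 0.9):
    """Alpha-blend a generated text patch into a random location on a manga page.

    page  -- H x W grayscale manga page, uint8 (assumed larger than the patch)
    patch -- h x w grayscale text region produced by the generator, uint8
    Returns the augmented page and the (x, y, w, h) box usable as a detector label.
    """
    H, W = page.shape
    h, w = patch.shape
    x = random.randint(0, W - w)
    y = random.randint(0, H - h)
    out = page.astype(np.float32).copy()
    region = out[y:y + h, x:x + w]
    # Darker patch pixels (ink) dominate the page; lighter ones keep the background.
    out[y:y + h, x:x + w] = alpha * np.minimum(region, patch) + (1 - alpha) * region
    return out.astype(np.uint8), (x, y, w, h)
```

Similarly, the fusion of global visual features with local text features for emotion analysis could look like the following hedged sketch. The `EmotionFusionHead` module, the feature dimensions, and the mean-pooling strategy are illustrative assumptions, not the thesis's confirmed architecture.

```python
# Hedged sketch of fusing a page-level embedding with detected-text features.
import torch
import torch.nn as nn

class EmotionFusionHead(nn.Module):
    def __init__(self, global_dim: int = 1280, text_dim: int = 256, num_emotions: int = 8):
        super().__init__()
        self.classifier = nn.Sequential(
            nn.Linear(global_dim + text_dim, 512),
            nn.ReLU(inplace=True),
            nn.Linear(512, num_emotions),  # multi-label logits, one per emotion
        )

    def forward(self, global_feat: torch.Tensor, text_feats: torch.Tensor) -> torch.Tensor:
        # global_feat: (B, global_dim) page-level visual embedding
        # text_feats:  (B, N, text_dim) features of N detected text regions
        pooled = text_feats.mean(dim=1)             # average-pool the local text features
        fused = torch.cat([global_feat, pooled], dim=1)
        return self.classifier(fused)               # pair with a multi-label loss, e.g. BCE
```

Mean-pooling over the detected text regions keeps the head agnostic to how many text boxes the detector returns; any permutation-invariant pooling would serve the same purpose.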