Graduate Student: 張慈殷 (Zhang, Ci-Yin)
Thesis Title: 以對比式學習進行遮蔽感知的漫畫人物重識別 (Occlusion-Aware Manga Character Re-identification with Contrastive Learning)
Advisor: 朱威達 (Chu, Wei-Ta)
Degree: Master
Department: Department of Computer Science and Information Engineering, College of Electrical Engineering and Computer Science
Year of Publication: 2023
Academic Year: 111
Language: English
Pages: 30
Keywords: Contrastive Learning, Manga Character Re-identification
Manga, an art form that combines images and text, has attracted increasing attention in recent years. Existing methods for manga character re-identification (Manga ReID) rely primarily on facial information, overlooking the distinctive characteristics of characters' bodies and failing to address challenges common in manga, such as occlusion by speech balloons and incomplete body regions.
To tackle these issues, we propose Occlusion-Aware Manga Character Re-identification with Contrastive Learning (OAM-ReID), which leverages the annotated character-body data in the Manga109 dataset [1] for training. By synthesizing training data with speech-balloon occlusions and incomplete bodies, our method extracts discriminative feature representations across diverse manga scenarios, effectively addressing challenges overlooked by previous approaches. Experimental results demonstrate the strong performance of our method on the Manga ReID task.
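The two ingredients named in the abstract — synthesizing occluded/incomplete training views, and contrastive learning over them — can be sketched as follows. This is a minimal illustration, not the thesis's actual pipeline: the function names, the ellipse-shaped white "balloon", the vertical-crop model of incomplete bodies, and the InfoNCE form of the contrastive loss are all assumptions chosen for clarity.

```python
import numpy as np

rng = np.random.default_rng(0)

def synth_balloon_occlusion(img, rng):
    """Paste a white filled ellipse (a stand-in for a speech balloon)
    at a random position, mimicking balloon occlusion of a body crop."""
    h, w = img.shape[:2]
    cy, cx = rng.integers(0, h), rng.integers(0, w)  # ellipse center
    ry, rx = h // 4, w // 4                          # ellipse radii
    ys, xs = np.ogrid[:h, :w]
    mask = ((ys - cy) / ry) ** 2 + ((xs - cx) / rx) ** 2 <= 1.0
    out = img.copy()
    out[mask] = 255  # manga balloons are white; fill only, no border here
    return out

def synth_partial_body(img, rng, keep=0.6):
    """Keep a random vertical fraction of the body crop, simulating a
    character cut off by a panel border."""
    h = img.shape[0]
    kept = max(1, int(h * keep))
    top = rng.integers(0, h - kept + 1)
    return img[top:top + kept]

def info_nce(anchor, positive, negatives, tau=0.07):
    """InfoNCE loss for one anchor embedding: pull the positive (another
    view of the same character) close, push negatives (other characters)
    away, with temperature tau."""
    def cos(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
    logits = np.array([cos(anchor, positive)]
                      + [cos(anchor, n) for n in negatives]) / tau
    logits -= logits.max()               # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum()
    return -np.log(probs[0])             # positive sits at index 0
```

In a real training loop the two `synth_*` functions would generate occluded/partial views of the same character as positives, so the encoder is explicitly rewarded for matching identities despite missing pixels; the loss decreases as the anchor aligns with its positive and separates from negatives.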
[1] Kiyoharu Aizawa, Azuma Fujimoto, Atsushi Otsubo, Toru Ogawa, Yusuke Matsui, Koki Tsubota, and Hikaru Ikuta. Building a manga dataset "Manga109" with annotations for multimedia applications. IEEE MultiMedia, 27(2):8–18, 2020.
[2] Peixian Chen, Wenfeng Liu, Pingyang Dai, Jianzhuang Liu, Qixiang Ye, Mingliang Xu, Qi'an Chen, and Rongrong Ji. Occlude them all: Occlusion-aware attention network for occluded person re-ID. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 11833–11842, 2021.
[3] Wei-Ta Chu and Wei-Chung Cheng. Manga-specific features and latent style model for manga style analysis. In 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 1332–1336. IEEE, 2016.
[4] Wei-Ta Chu and Wei-Wei Li. Manga face detection based on deep neural networks fusing global and local information. Pattern Recognition, 86:62–72, 2019.
[5] Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. ImageNet: A large-scale hierarchical image database. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 248–255. IEEE, 2009.
[6] Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, and Neil Houlsby. An image is worth 16x16 words: Transformers for image recognition at scale. In Proceedings of the International Conference on Learning Representations, 2021.
[7] Yixiao Ge, Feng Zhu, Dapeng Chen, Rui Zhao, et al. Self-paced contrastive learning with hybrid memory for domain adaptive object re-ID. In Advances in Neural Information Processing Systems, 33:11309–11321, 2020.
[8] Shuting He, Hao Luo, Pichao Wang, Fan Wang, Hao Li, and Wei Jiang. TransReID: Transformer-based object re-identification. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 15013–15022, 2021.
[9] Huaibo Huang, Aihua Zheng, Chenglong Li, Ran He, et al. Parallel augmentation and dual enhancement for occluded person re-identification. arXiv preprint arXiv:2210.05438, 2022.
[10] Yusuke Matsui, Kota Ito, Yuji Aramaki, Azuma Fujimoto, Toru Ogawa, Toshihiko Yamasaki, and Kiyoharu Aizawa. Sketch-based manga retrieval using Manga109 dataset. Multimedia Tools and Applications, 76:21811–21838, 2017.
[11] Jiaxu Miao, Yu Wu, Ping Liu, Yuhang Ding, and Yi Yang. Pose-guided feature alignment for occluded person re-identification. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 542–551, 2019.
[12] Rei Narita, Koki Tsubota, Toshihiko Yamasaki, and Kiyoharu Aizawa. Sketch-based manga retrieval using deep features. In 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR), volume 3, pages 49–53. IEEE, 2017.
[13] Nhu-Van Nguyen, Christophe Rigaud, Arnaud Revel, and Jean-Christophe Burie. Manga-MMTL: Multimodal multi-task transfer learning for manga character analysis. In Proceedings of the International Conference on Document Analysis and Recognition, pages 410–425. Springer, 2021.
[14] Xufang Pang, Ying Cao, Rynson W. H. Lau, and Antoni B. Chan. A robust panel extraction method for manga. In Proceedings of the 22nd ACM International Conference on Multimedia, pages 1125–1128, 2014.
[15] Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, Alban Desmaison, Andreas Kopf, Edward Yang, Zachary DeVito, Martin Raison, Alykhan Tejani, Sasank Chilamkurthy, Benoit Steiner, Lu Fang, Junjie Bai, and Soumith Chintala. PyTorch: An imperative style, high-performance deep learning library. In Advances in Neural Information Processing Systems, pages 8024–8035. Curran Associates, Inc., 2019.
[16] Xuebin Qin, Zichen Zhang, Chenyang Huang, Masood Dehghan, Osmar R. Zaiane, and Martin Jagersand. U2-Net: Going deeper with nested U-structure for salient object detection. Pattern Recognition, 106:107404, 2020.
[17] Edwin Arkel Rios, Wen-Huang Cheng, and Bo-Cheng Lai. DAF:re: A challenging, crowd-sourced, large-scale, long-tailed dataset for anime character recognition. arXiv preprint arXiv:2101.08674, 2021.
[18] Ergys Ristani, Francesco Solera, Roger Zou, Rita Cucchiara, and Carlo Tomasi. Performance measures and a data set for multi-target, multi-camera tracking. In Proceedings of the European Conference on Computer Vision, pages 17–35. Springer, 2016.
[19] Hao Su, Jianwei Niu, Xuefeng Liu, Qingfeng Li, Jiahe Cui, and Ji Wan. MangaGAN: Unpaired photo-to-manga translation based on the methodology of manga drawing. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 35, pages 2611–2619, 2021.
[20] Kodai Takashima and Takehisa Onisawa. Generation of scene frame of manga from narrative text. In Proceedings of the Kansei Engineering and Emotion Research International Conference, pages 2223–2233, 2010.
[21] Tao Wang, Hong Liu, Wenhao Li, Miaoju Ban, Tuanyu Guo, and Yidi Li. Feature completion transformer for occluded person re-identification. arXiv preprint arXiv:2303.01656, 2023.
[22] Zhikang Wang, Feng Zhu, Shixiang Tang, Rui Zhao, Lihuo He, and Jiangning Song. Feature erasing and diffusion network for occluded person re-identification. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 4754–4763, 2022.
[23] Hideaki Yanagisawa, Takuro Yamashita, and Hiroshi Watanabe. Manga character clustering with DBSCAN using fine-tuned CNN model. In International Workshop on Advanced Image Technology (IWAIT) 2019, volume 11049, pages 305–310. SPIE, 2019.
[24] Yi-Ting Yang and Wei-Ta Chu. Manga text detection with manga-specific data augmentation and its applications on emotion analysis. In International Conference on Multimedia Modeling, pages 29–40. Springer, 2023.
[25] Mang Ye, Jianbing Shen, Gaojie Lin, Tao Xiang, Ling Shao, and Steven C. H. Hoi. Deep learning for person re-identification: A survey and outlook. IEEE Transactions on Pattern Analysis and Machine Intelligence, 44(6):2872–2893, 2021.
[26] Lvmin Zhang, Yi Ji, Xin Lin, and Chunping Liu. Style transfer for anime sketches with enhanced residual U-net and auxiliary classifier GAN. In 2017 4th IAPR Asian Conference on Pattern Recognition (ACPR), pages 506–511. IEEE, 2017.
[27] Zhimin Zhang, Zheng Wang, and Wei Hu. Unsupervised manga character re-identification via face-body and spatial-temporal associated clustering. arXiv preprint arXiv:2204.04621, 2022.
[28] Yi Zheng, Yifan Zhao, Mengyuan Ren, He Yan, Xiangju Lu, Junhui Liu, and Jia Li. Cartoon face recognition: A benchmark dataset. In Proceedings of the 28th ACM International Conference on Multimedia, pages 2264–2272, 2020.
[29] Zhun Zhong, Liang Zheng, Guoliang Kang, Shaozi Li, and Yi Yang. Random erasing data augmentation. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 34, pages 13001–13008, 2020.