
Author: 張慈殷 (Zhang, Ci-Yin)
Title: Occlusion-Aware Manga Character Re-identification with Contrastive Learning (以對比式學習進行遮蔽感知的漫畫人物重識別)
Advisor: 朱威達 (Chu, Wei-Ta)
Degree: Master
Department: College of Electrical Engineering and Computer Science, Department of Computer Science and Information Engineering
Publication Year: 2023
Graduation Academic Year: 111
Language: English
Pages: 30
Keywords: Contrastive Learning, Manga Character Re-identification

    Manga, as an art form combining images and text, has gained increasing attention in recent years. Existing methods for manga character re-identification (Manga ReID) rely primarily on facial information, overlooking the distinctive characteristics of characters' bodies and failing to address challenges common in manga, such as occlusion by speech balloons and incompletely drawn bodies.
    To tackle these issues, we propose Occlusion-Aware Manga Character Re-identification with Contrastive Learning (OAM-ReID), which leverages the annotated character-body data of the Manga109 dataset [1] for training. By synthesizing training data with speech-balloon occlusions and incomplete bodies, our method extracts discriminative feature representations across a variety of manga scenarios, effectively addressing the challenges overlooked by previous approaches. Experimental results demonstrate the outstanding performance of our method on the Manga ReID task.
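    The synthetic-occlusion idea in the abstract can be illustrated with a minimal sketch: paste a white ellipse onto a character crop to stand in for a speech balloon, and blank out the bottom of the crop to simulate an incomplete body. This is only an illustrative approximation of the augmentation described, not the thesis's actual implementation; the function name and parameters such as `balloon_frac` and `crop_frac` are invented for this sketch.

    ```python
    import numpy as np

    def synthesize_occlusion(img, rng, balloon_frac=0.3, crop_frac=0.25):
        """Simulate speech-balloon occlusion and an incomplete body on a
        character crop (H x W x 3 uint8 array). Returns the augmented
        image and the boolean balloon mask."""
        h, w = img.shape[:2]
        out = img.copy()

        # 1) Paste a white ellipse (a stand-in for a speech balloon)
        #    at a random center.
        cy, cx = rng.integers(0, h), rng.integers(0, w)
        ry, rx = max(int(h * balloon_frac / 2), 1), max(int(w * balloon_frac / 2), 1)
        ys, xs = np.ogrid[:h, :w]
        mask = ((ys - cy) / ry) ** 2 + ((xs - cx) / rx) ** 2 <= 1.0
        out[mask] = 255

        # 2) Blank the bottom of the crop, simulating a partially
        #    drawn body cut off by the panel border.
        cut = int(h * rng.uniform(0, crop_frac))
        if cut > 0:
            out[h - cut:] = 255
        return out, mask

    rng = np.random.default_rng(0)
    img = rng.integers(0, 256, size=(128, 64, 3), dtype=np.uint8)
    aug, mask = synthesize_occlusion(img, rng)
    ```

    In a training pipeline, the original and augmented crops of the same character would form a positive pair for the contrastive objective, encouraging features that are robust to such occlusions.
    
    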

    Abstract (Chinese)
    Abstract
    Table of Contents
    List of Tables
    List of Figures
    Chapter 1. Introduction
      1.1. Overview
      1.2. Motivation
      1.3. Contributions
    Chapter 2. Related Works
      2.1. Manga-related Works
      2.2. Manga Character Re-identification
      2.3. Person Re-identification
      2.4. Contrastive Learning
    Chapter 3. Occlusion-Aware Manga Character Re-identification
      3.1. Overall Framework
      3.2. Multiple Augmentation Module (MAM)
        3.2.1. Occlusion by Speech Balloon
      3.3. Vision Transformer Feature Extractor (ViT)
      3.4. Occlusion Estimation Module (OEM)
      3.5. Memory Bank Module (MBM)
        3.5.1. Memory Bank Initialization
        3.5.2. Memory Bank Update
        3.5.3. Contrastive Loss
      3.6. Loss Function
    Chapter 4. Experimental Results
      4.1. Datasets
      4.2. Evaluation Protocol
      4.3. Implementation Details
      4.4. Performance Comparison
      4.5. Ablation Study
      4.6. Visual Samples
    Chapter 5. Conclusion
      5.1. Conclusion
      5.2. Future Work
    References

    [1] Kiyoharu Aizawa, Azuma Fujimoto, Atsushi Otsubo, Toru Ogawa, Yusuke Matsui, Koki Tsubota, and Hikaru Ikuta. Building a manga dataset "Manga109" with annotations for multimedia applications. IEEE MultiMedia, 27(2):8–18, 2020.
    [2] Peixian Chen, Wenfeng Liu, Pingyang Dai, Jianzhuang Liu, Qixiang Ye, Mingliang Xu, Qi'an Chen, and Rongrong Ji. Occlude them all: Occlusion-aware attention network for occluded person re-ID. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 11833–11842, 2021.
    [3] Wei-Ta Chu and Wei-Chung Cheng. Manga-specific features and latent style model for manga style analysis. In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 1332–1336, 2016.
    [4] Wei-Ta Chu and Wei-Wei Li. Manga face detection based on deep neural networks fusing global and local information. Pattern Recognition, 86:62–72, 2019.
    [5] Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. ImageNet: A large-scale hierarchical image database. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 248–255, 2009.
    [6] Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, and Neil Houlsby. An image is worth 16x16 words: Transformers for image recognition at scale. In International Conference on Learning Representations, 2021.
    [7] Yixiao Ge, Feng Zhu, Dapeng Chen, Rui Zhao, et al. Self-paced contrastive learning with hybrid memory for domain adaptive object re-ID. In Advances in Neural Information Processing Systems, 33:11309–11321, 2020.
    [8] Shuting He, Hao Luo, Pichao Wang, Fan Wang, Hao Li, and Wei Jiang. TransReID: Transformer-based object re-identification. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 15013–15022, 2021.
    [9] Huaibo Huang, Aihua Zheng, Chenglong Li, Ran He, et al. Parallel augmentation and dual enhancement for occluded person re-identification. arXiv preprint arXiv:2210.05438, 2022.
    [10] Yusuke Matsui, Kota Ito, Yuji Aramaki, Azuma Fujimoto, Toru Ogawa, Toshihiko Yamasaki, and Kiyoharu Aizawa. Sketch-based manga retrieval using Manga109 dataset. Multimedia Tools and Applications, 76:21811–21838, 2017.
    [11] Jiaxu Miao, Yu Wu, Ping Liu, Yuhang Ding, and Yi Yang. Pose-guided feature alignment for occluded person re-identification. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 542–551, 2019.
    [12] Rei Narita, Koki Tsubota, Toshihiko Yamasaki, and Kiyoharu Aizawa. Sketch-based manga retrieval using deep features. In 14th IAPR International Conference on Document Analysis and Recognition (ICDAR), volume 3, pages 49–53, 2017.
    [13] Nhu-Van Nguyen, Christophe Rigaud, Arnaud Revel, and Jean-Christophe Burie. Manga-MMTL: Multimodal multi-task transfer learning for manga character analysis. In Proceedings of the International Conference on Document Analysis and Recognition, pages 410–425. Springer, 2021.
    [14] Xufang Pang, Ying Cao, Rynson W. H. Lau, and Antoni B. Chan. A robust panel extraction method for manga. In Proceedings of the 22nd ACM International Conference on Multimedia, pages 1125–1128, 2014.
    [15] Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, Alban Desmaison, Andreas Kopf, Edward Yang, Zachary DeVito, Martin Raison, Alykhan Tejani, Sasank Chilamkurthy, Benoit Steiner, Lu Fang, Junjie Bai, and Soumith Chintala. PyTorch: An imperative style, high-performance deep learning library. In Advances in Neural Information Processing Systems, pages 8024–8035, 2019.
    [16] Xuebin Qin, Zichen Zhang, Chenyang Huang, Masood Dehghan, Osmar R. Zaiane, and Martin Jagersand. U2-Net: Going deeper with nested U-structure for salient object detection. Pattern Recognition, 106:107404, 2020.
    [17] Edwin Arkel Rios, Wen-Huang Cheng, and Bo-Cheng Lai. DAF:re: A challenging, crowd-sourced, large-scale, long-tailed dataset for anime character recognition. arXiv preprint arXiv:2101.08674, 2021.
    [18] Ergys Ristani, Francesco Solera, Roger Zou, Rita Cucchiara, and Carlo Tomasi. Performance measures and a data set for multi-target, multi-camera tracking. In Proceedings of the European Conference on Computer Vision, pages 17–35. Springer, 2016.
    [19] Hao Su, Jianwei Niu, Xuefeng Liu, Qingfeng Li, Jiahe Cui, and Ji Wan. MangaGAN: Unpaired photo-to-manga translation based on the methodology of manga drawing. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 35, pages 2611–2619, 2021.
    [20] Kodai Takashima and Takehisa Onisawa. Generation of scene frame of manga from narrative text. In Proceedings of the Kansei Engineering and Emotion Research International Conference, pages 2223–2233, 2010.
    [21] Tao Wang, Hong Liu, Wenhao Li, Miaoju Ban, Tuanyu Guo, and Yidi Li. Feature completion transformer for occluded person re-identification. arXiv preprint arXiv:2303.01656, 2023.
    [22] Zhikang Wang, Feng Zhu, Shixiang Tang, Rui Zhao, Lihuo He, and Jiangning Song. Feature erasing and diffusion network for occluded person re-identification. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 4754–4763, 2022.
    [23] Hideaki Yanagisawa, Takuro Yamashita, and Hiroshi Watanabe. Manga character clustering with DBSCAN using fine-tuned CNN model. In International Workshop on Advanced Image Technology (IWAIT), volume 11049, pages 305–310. SPIE, 2019.
    [24] Yi-Ting Yang and Wei-Ta Chu. Manga text detection with manga-specific data augmentation and its applications on emotion analysis. In International Conference on Multimedia Modeling, pages 29–40. Springer, 2023.
    [25] Mang Ye, Jianbing Shen, Gaojie Lin, Tao Xiang, Ling Shao, and Steven C. H. Hoi. Deep learning for person re-identification: A survey and outlook. IEEE Transactions on Pattern Analysis and Machine Intelligence, 44(6):2872–2893, 2021.
    [26] Lvmin Zhang, Yi Ji, Xin Lin, and Chunping Liu. Style transfer for anime sketches with enhanced residual U-Net and auxiliary classifier GAN. In 4th IAPR Asian Conference on Pattern Recognition (ACPR), pages 506–511, 2017.
    [27] Zhimin Zhang, Zheng Wang, and Wei Hu. Unsupervised manga character re-identification via face-body and spatial-temporal associated clustering. arXiv preprint arXiv:2204.04621, 2022.
    [28] Yi Zheng, Yifan Zhao, Mengyuan Ren, He Yan, Xiangju Lu, Junhui Liu, and Jia Li. Cartoon face recognition: A benchmark dataset. In Proceedings of the 28th ACM International Conference on Multimedia, pages 2264–2272, 2020.
    [29] Zhun Zhong, Liang Zheng, Guoliang Kang, Shaozi Li, and Yi Yang. Random erasing data augmentation. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 34, pages 13001–13008, 2020.

    Full-text access: immediately available both on and off campus.