
Graduate Student: Huang, Tzu-Han (黃姿涵)
Thesis Title: A Discrete Cosine Transform Network for Person Re-identification
(用於行人重識別的離散餘弦變換網絡)
Advisor: Tai, Shen-Chuan (戴顯權)
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Electrical Engineering
Year of Publication: 2023
Graduation Academic Year: 112
Language: English
Number of Pages: 61
Chinese Keywords: 深度學習, 殘差學習, 注意力機制, 行人重識別
English Keywords: deep learning, residual learning, attention mechanism, person re-identification

    The rapid evolution of deep learning has brought numerous new challenges in image processing. Additionally, driven by increased security awareness, surveillance cameras are now widely installed in public spaces to deter security incidents. Consequently, person re-identification has attracted growing attention.
    The primary objective of person re-identification is to track individuals across different camera views, i.e., to identify the same person in non-overlapping camera perspectives. This task faces many challenges, including occlusion, image blurriness, lighting variations, pose changes, and differing viewing angles, all of which complicate recognition. These challenges can be categorized into intra-class and inter-class variations: intra-class variations arise from large differences within the same identity due to diverse camera perspectives, while inter-class variations result from small distinctions between different people wearing similar clothing.
    To tackle these challenges, this thesis introduces an enhanced network architecture for person re-identification. It incorporates a non-local attention mechanism and frequency-domain residual learning to extract more robust features. It then employs a transformer block to explore correlations among individuals, addressing inter-class variation. The proposed method achieves strong performance on the Market-1501, DukeMTMC-reID, and CUHK03 datasets, and the experimental results validate its effectiveness.
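The thesis's exact "lightweight" non-local attention variant is not specified in this record, but the generic non-local operation it builds on computes, for every spatial position, a softmax-weighted aggregation over all other positions, followed by a residual connection. A minimal NumPy sketch under those assumptions (single feature map, identity embeddings, scale factor chosen for illustration):

```python
import numpy as np

def nonlocal_attention(x, scale=None):
    """Minimal non-local (self-attention) block over one feature map.

    x: array of shape (C, H, W). Every spatial position attends to every
    other position, capturing long-range dependencies in a single step,
    unlike stacked local convolutions.
    """
    c, h, w = x.shape
    flat = x.reshape(c, h * w)                 # (C, N), N = H*W positions
    affinity = flat.T @ flat                   # (N, N) pairwise dot products
    if scale is None:
        scale = np.sqrt(c)                     # temperature, as in scaled attention
    a = affinity / scale
    a -= a.max(axis=1, keepdims=True)          # numerically stable softmax
    weights = np.exp(a)
    weights /= weights.sum(axis=1, keepdims=True)
    attended = flat @ weights.T                # each position aggregates all others
    return attended.reshape(c, h, w) + x       # residual connection
```

The residual connection lets the block be inserted into a pretrained backbone without disturbing its initial behavior; the thesis's lightweight version presumably reduces the cost of the N-by-N affinity matrix, but that detail is not given here.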

    Acknowledgements iv
    Contents v
    List of Tables vii
    List of Figures viii
    Chapter 1 Introduction 1
    Chapter 2 Related Work 5
      2.1 Traditional Methods 5
      2.2 Deep Learning-based Methods 6
        2.2.1 Bag of Tricks and A Strong Baseline for Deep Person Re-identification (BoT) 6
        2.2.2 Attention Generalized Mean Pooling with Weighted Triplet Loss (AGW) 8
        2.2.3 Transformer-based Object Re-identification (TransReID) 10
      2.3 Attention 15
        2.3.1 Squeeze-and-Excitation Block (SE block) 15
        2.3.2 Frequency Channel Attention Networks (FcaNet) 16
    Chapter 3 The Proposed Algorithm 18
      3.1 Data Preprocessing 20
      3.2 Proposed Network Architecture 20
        3.2.1 Lightweight Non-local Attention 21
        3.2.2 Improved DCT Residual Block 22
        3.2.3 Transformer 24
      3.3 Loss Function 24
        3.3.1 Identity Loss 24
        3.3.2 Triplet Loss 25
        3.3.3 Center Loss 27
        3.3.4 Total Loss 28
    Chapter 4 Experiment 29
      4.1 Experimental Dataset 29
        4.1.1 Market-1501 30
        4.1.2 DukeMTMC-reID 30
        4.1.3 CUHK03 31
      4.2 Evaluation Criteria 32
        4.2.1 Rank-M 32
        4.2.2 mAP 33
      4.3 Implementation Details 34
        4.3.1 Experimental Environment 34
        4.3.2 Training Strategy 34
      4.4 Experimental Results 35
      4.5 Ablation Studies 41
    Chapter 5 Conclusion and Future Work 46
      5.1 Conclusion 46
      5.2 Future Work 46
    References 47
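Section 3.3 lists the three losses combined in training (identity, triplet, and center loss). The thesis's actual weights and margin are not given in this record, so the values below are illustrative assumptions; the sketch only shows how the three terms are typically composed:

```python
import numpy as np

def identity_loss(logits, label):
    # Softmax cross-entropy over identity classes (classification loss).
    z = logits - logits.max()                  # numerically stable log-softmax
    log_probs = z - np.log(np.exp(z).sum())
    return -log_probs[label]

def triplet_loss(anchor, positive, negative, margin=0.3):
    # Hinge on the gap between positive and negative distances;
    # margin=0.3 is a common choice, not necessarily the thesis's value.
    d_pos = np.linalg.norm(anchor - positive)
    d_neg = np.linalg.norm(anchor - negative)
    return max(d_pos - d_neg + margin, 0.0)

def center_loss(feature, class_center):
    # Pulls each feature toward its class center, tightening intra-class clusters.
    return 0.5 * np.sum((feature - class_center) ** 2)

def total_loss(logits, label, anchor, positive, negative, center, beta=5e-4):
    # Weighted sum of the three terms; beta is an assumed small weight,
    # since center loss is usually scaled down relative to the others.
    return (identity_loss(logits, label)
            + triplet_loss(anchor, positive, negative)
            + beta * center_loss(anchor, center))
```

Identity loss separates different people, triplet loss enforces a relative distance margin, and center loss reduces intra-class variation, which matches the intra-/inter-class framing in the abstract.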


    Full text available: on campus from 2025-02-01; off campus from 2025-02-01.