
Graduate Student: 曾子硯 (Tseng, Tzu-Yen)
Thesis Title: 使用圖像修復進行半自動數據標記以實現耳部穴道定位
Semi-automatic data annotation through in-painting for automated localization of ear acupoints
Advisor: 藍崑展 (Lan, Kun-Chan)
Degree: Master
Department: College of Electrical Engineering and Computer Science, Department of Computer Science and Information Engineering
Year of Publication: 2024
Academic Year of Graduation: 112 (ROC calendar)
Language: English
Number of Pages: 129
Keywords (Chinese): 圖像修復、耳朵偵測、關鍵點檢測、自動穴道定位、半自動數據標記
Keywords (English): Image inpainting, Ear detection, Keypoint detection, Automatic acupoint localization, Semi-automatic annotation

    Keypoint detection is crucial in computer vision for identifying and localizing significant points in images, with applications ranging from human and animal pose estimation to hand keypoint detection. Recently, deep learning models for keypoint detection have demonstrated high accuracy across these domains. However, training such models requires extensive annotated data, which is predominantly produced through labor-intensive manual annotation. Hence, semi-automatic annotation methods have gained traction. In this work, we propose a tracking-based semi-automatic data annotation framework. We attach artificial markers and use them as features so that a tracker can follow each keypoint more precisely. After recording the coordinates of all markers, we apply an inpainting model to remove the artificial markers from the images. Our method substantially reduces the time required for data annotation.
    Ear acupuncture is one of the key treatments in Traditional Chinese Medicine (TCM). In this research, we treat ear acupoints as keypoints to demonstrate the proposed approach. We trained a keypoint detection model that automatically localizes ear acupoints, achieving an estimation error of 1.33 mm.
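
    The annotation pipeline described above can be read as: initialize one tracker per artificial marker, record each marker's center as a keypoint label in every frame, then erase the markers by inpainting the tracked regions. The sketch below is illustrative only, not the thesis's implementation: it assumes opencv-contrib-python for the CSRT tracker and substitutes OpenCV's classical Navier-Stokes inpainting for the MAT inpainting model actually used; the function name annotate_video and the marker_radius parameter are our own hypothetical choices.

    import cv2
    import numpy as np

    def annotate_video(frames, initial_boxes, marker_radius=6):
        # One CSRT tracker per artificial marker, initialized on the first frame.
        trackers = []
        for box in initial_boxes:
            tracker = cv2.legacy.TrackerCSRT_create()
            tracker.init(frames[0], box)
            trackers.append(tracker)

        annotations, cleaned = [], []
        for frame in frames:
            keypoints = []
            mask = np.zeros(frame.shape[:2], dtype=np.uint8)
            for tracker in trackers:
                ok, box = tracker.update(frame)
                if not ok:
                    keypoints.append(None)  # tracking failure: flag for manual correction
                    continue
                x, y, w, h = box
                cx, cy = int(x + w / 2), int(y + h / 2)
                keypoints.append((cx, cy))  # marker center becomes the keypoint label
                cv2.circle(mask, (cx, cy), marker_radius, 255, -1)  # region to erase
            annotations.append(keypoints)
            # Remove the markers so the training images carry no artificial cues.
            cleaned.append(cv2.inpaint(frame, mask, 3, cv2.INPAINT_NS))
        return annotations, cleaned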
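
    As for the reported 1.33 mm figure, a plausible reading (an assumption on our part; the thesis details the pixel-to-millimeter conversion in Section 4.4.1) is the mean Euclidean distance between predicted and ground-truth acupoint positions, scaled by a per-image millimeters-per-pixel factor derived from a reference of known physical size. The helper below is a hypothetical sketch under that assumption.

    import numpy as np

    def mean_localization_error_mm(pred_px, gt_px, mm_per_px):
        # pred_px, gt_px: (N, 2) arrays of predicted / ground-truth pixel coordinates.
        diffs = np.asarray(pred_px, dtype=float) - np.asarray(gt_px, dtype=float)
        return float(np.mean(np.linalg.norm(diffs, axis=1) * mm_per_px))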

    CONTENTS
      Abstract (Chinese)
      ABSTRACT
      Acknowledgements
      LIST OF FIGURES
      LIST OF TABLES
      1. INTRODUCTION
        1.1 The importance of keypoint detection
        1.2 The importance of semi-automatic data annotation and the problems with previous semi-automatic annotation work
        1.3 The importance of our proposed semi-automatic data annotation
        1.4 The importance of ear acupuncture and the issues with traditional ear acupuncture
        1.5 Contribution
      2. RELATED WORK
        2.1 Prior work on real-time object detection based on deep learning
        2.2 Prior work on keypoint detection based on deep learning
        2.3 Prior work on semi-automatic annotation
        2.4 Prior work on image inpainting based on deep learning
        2.5 Prior work on acupoint localization
      3. METHOD
        3.1 Architecture
        3.2 Data collection of ear images
        3.3 Training data for the keypoint detector
          3.3.1 Image inpainting
          3.3.2 Ear detection
          3.3.3 Keypoint detection training data preparation
        3.4 Keypoint detector training and online phase
          3.4.1 Keypoint detector training
          3.4.2 Model deployment
          3.4.3 Online phase: ear detection
          3.4.4 Online phase: keypoint detection
          3.4.5 Acupoint estimation
      4. EXPERIMENTS
        4.1 Experimental setup and subject recruitment
        4.2 Ear detection
        4.3 Image inpainting
          4.3.1 Evaluation results for image inpainting
          4.3.2 Annotated ear images repaired with the image inpainting model
        4.4 Acupoint localization errors
          4.4.1 Converting pixels to millimeters
          4.4.2 Acupoint localization errors for different acupoints
          4.4.3 Acupoint localization errors for different subjects
        4.5 Acupoint localization errors with occlusion
        4.6 Speed of acupoint localization
        4.7 Ablation study
          4.7.1 Acupoint localization with and without bounding box
          4.7.2 Comparison of localization error under different FID scores of the MAT inpainting model
      5. PROTOTYPE
        5.1 The learning mode of the application
        5.2 Battlefield Acupuncture (BFA) system
      6. DISCUSSION
        6.1 Data annotation using feature matching
        6.2 Further discussion of localization errors under different nutritional status
        6.3 Comparison of dilation, the Navier-Stokes inpainting algorithm, and the MAT pretrained model
      7. LIMITATIONS AND FUTURE WORK
      8. CONCLUSION
      REFERENCES

    [1] "Importance of keypoint detection." https://www.mathworks.com/help/vision/keypoint-detection.html (accessed.
    [2] H. Chen, R. Feng, S. Wu, H. Xu, F. Zhou, and Z. Liu, "2D Human pose estimation: A survey," Multimedia systems, vol. 29, no. 5, pp. 3115-3138, 2023.
    [3] L. Jiang, C. Lee, D. Teotia, and S. Ostadabbas, "Animal pose estimation: A closer look at the state-of-the-art, existing gaps and opportunities," Computer Vision and Image Understanding, vol. 222, p. 103483, 2022.
    [4] Y. Li, X. Wang, W. Liu, and B. Feng, "Pose anchor: A single-stage hand keypoint detection network," IEEE Transactions on Circuits and Systems for Video Technology, vol. 30, no. 7, pp. 2104-2113, 2019.
    [5] W. Yang, S. Li, W. Ouyang, H. Li, and X. Wang, "Learning feature pyramids for human pose estimation," in proceedings of the IEEE international conference on computer vision, 2017, pp. 1281-1290.
    [6] C. Sagonas, G. Tzimiropoulos, S. Zafeiriou, and M. Pantic, "A semi-automatic methodology for facial landmark annotation," in Proceedings of the IEEE conference on computer vision and pattern recognition workshops, 2013, pp. 896-903.
    [7] P. A. Bromiley, A. C. Schunke, H. Ragheb, N. A. Thacker, and D. Tautz, "Semi-automatic landmark point annotation for geometric morphometrics," Frontiers in Zoology, vol. 11, pp. 1-21, 2014.
    [8] J. D. White et al., "MeshMonk: Open-source large-scale intensive 3D phenotyping," Scientific reports, vol. 9, no. 1, p. 6085, 2019.
    [9] E. Bermejo et al., "Automatic landmark annotation in 3D surface scans of skulls: Methodological proposal and reliability study," Computer Methods and Programs in Biomedicine, vol. 210, p. 106380, 2021.
    [10] S. B. Krah, J. Brauer, W. H羹bner, and M. Arens, "Supporting annotation of anatomical landmarks using automatic scale selection," in Articulated Motion and Deformable Objects: 8th International Conference, AMDO 2014, Palma de Mallorca, Spain, July 16-18, 2014. Proceedings 8, 2014: Springer, pp. 61-70.
    [11] N. Van der Aa, X. Luo, G.-J. Giezeman, R. T. Tan, and R. C. Veltkamp, "Umpm benchmark: A multi-person dataset with synchronized video and motion capture data for evaluation of articulated human motion and interaction," in 2011 IEEE international conference on computer vision workshops (ICCV Workshops), 2011: IEEE, pp. 1264-1269.
    [12] G. Garcia-Hernando, S. Yuan, S. Baek, and T.-K. Kim, "First-person hand action benchmark with rgb-d videos and 3d hand pose annotations," in Proceedings of the IEEE conference on computer vision and pattern recognition, 2018, pp. 409-419.
    [13] H. Naik et al., "3D-POP-An automated annotation approach to facilitate markerless 2D-3D tracking of freely moving birds with marker-based motion capture," in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2023, pp. 21274-21284.
    [14] E. Wu, H. Nishioka, S. Furuya, and H. Koike, "Marker-removal networks to collect precise 3D hand data for RGB-based estimation and its application in piano," in Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2023, pp. 2977-2986.
    [15] A. Mathis et al., "Markerless tracking of user-defined features with deep learning," arXiv preprint arXiv:1804.03142, 2018.
    [16] S. Hampali, M. Rad, M. Oberweger, and V. Lepetit, "Honnotate: A method for 3d annotation of hand and object poses," in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2020, pp. 3196-3206.
    [17] J.-Y. Zhu, T. Park, P. Isola, and A. A. Efros, "Unpaired image-to-image translation using cycle-consistent adversarial networks," in Proceedings of the IEEE international conference on computer vision, 2017, pp. 2223-2232.
    [18] L. Gori and F. Firenzuoli, "Ear acupuncture in European traditional medicine," Evidence-Based Complementary and Alternative Medicine, vol. 4, pp. 13-16, 2007.
    [19] M. Murakami, L. Fox, and M. P. Dijkers, "Ear acupuncture for immediate pain relief? systematic review and meta-analysis of randomized controlled trials," Pain Medicine, vol. 18, no. 3, pp. 551-564, 2017.
    [20] R. C. Niemtzow, "Battlefield acupuncture," Medical Acupuncture, vol. 19, no. 4, pp. 225-228, 2007.
    [21] T. I. Usichenko et al., "Auricular acupuncture for pain relief after ambulatory knee surgery: a randomized trial," Cmaj, vol. 176, no. 2, pp. 179-183, 2007.
    [22] R. Kaur and S. Singh, "A comprehensive review of object detection with deep learning," Digital Signal Processing, p. 103812, 2022.
    [23] Z.-Q. Zhao, P. Zheng, S.-t. Xu, and X. Wu, "Object detection with deep learning: A review," IEEE transactions on neural networks and learning systems, vol. 30, no. 11, pp. 3212-3232, 2019.
    [24] R. Girshick, J. Donahue, T. Darrell, and J. Malik, "Rich feature hierarchies for accurate object detection and semantic segmentation," in Proceedings of the IEEE conference on computer vision and pattern recognition, 2014, pp. 580-587.
    [25] R. Girshick, "Fast r-cnn," in Proceedings of the IEEE international conference on computer vision, 2015, pp. 1440-1448.
    [26] S. Ren, K. He, R. Girshick, and J. Sun, "Faster r-cnn: Towards real-time object detection with region proposal networks," Advances in neural information processing systems, vol. 28, 2015.
    [27] K. He, G. Gkioxari, P. Doll獺r, and R. Girshick, "Mask r-cnn," in Proceedings of the IEEE international conference on computer vision, 2017, pp. 2961-2969.
    [28] W. Liu et al., "Ssd: Single shot multibox detector," in Computer Vision?CCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11??4, 2016, Proceedings, Part I 14, 2016: Springer, pp. 21-37.
    [29] J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, "You only look once: Unified, real-time object detection," in Proceedings of the IEEE conference on computer vision and pattern recognition, 2016, pp. 779-788.
    [30] C. Lyu et al., "Rtmdet: An empirical study of designing real-time object detectors," arXiv preprint arXiv:2212.07784, 2022.
    [31] A. Bochkovskiy, C.-Y. Wang, and H.-Y. M. Liao, "Yolov4: Optimal speed and accuracy of object detection," arXiv preprint arXiv:2004.10934, 2020.
    [32] Z. Yao, Y. Cao, S. Zheng, G. Huang, and S. Lin, "Cross-iteration batch normalization," in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2021, pp. 12331-12340.
    [33] S. Woo, J. Park, J.-Y. Lee, and I. S. Kweon, "Cbam: Convolutional block attention module," in Proceedings of the European conference on computer vision (ECCV), 2018, pp. 3-19.
    [34] S. Liu, L. Qi, H. Qin, J. Shi, and J. Jia, "Path aggregation network for instance segmentation," in Proceedings of the IEEE conference on computer vision and pattern recognition, 2018, pp. 8759-8768.
    [35] C.-Y. Wang, H.-Y. M. Liao, Y.-H. Wu, P.-Y. Chen, J.-W. Hsieh, and I.-H. Yeh, "CSPNet: A new backbone that can enhance learning capability of CNN," in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition workshops, 2020, pp. 390-391.
    [36] C.-Y. Wang, A. Bochkovskiy, and H.-Y. M. Liao, "YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, pp. 7464-7475.
    [37] C.-Y. Wang, I.-H. Yeh, and H.-Y. M. Liao, "You only learn one representation: Unified network for multiple tasks," arXiv preprint arXiv:2105.04206, 2021.
    [38] Z. Ge, S. Liu, F. Wang, Z. Li, and J. Sun, "Yolox: Exceeding yolo series in 2021," arXiv preprint arXiv:2107.08430, 2021.
    [39] Z. Ge, S. Liu, Z. Li, O. Yoshie, and J. Sun, "Ota: Optimal transport assignment for object detection," in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2021, pp. 303-312.
    [40] C. Li et al., "YOLOv6: A single-stage object detection framework for industrial applications," arXiv preprint arXiv:2209.02976, 2022.
    [41] K. Weng, X. Chu, X. Xu, J. Huang, and X. Wei, "Efficientrep: An efficient Repvgg-style convnets with hardware-aware neural network design," arXiv preprint arXiv:2302.00386, 2023.
    [42] X. Ding, X. Zhang, N. Ma, J. Han, G. Ding, and J. Sun, "Repvgg: Making vgg-style convnets great again," in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2021, pp. 13733-13742.
    [43] Z. Gevorgyan, "SIoU loss: More powerful learning for bounding box regression," arXiv preprint arXiv:2205.12740, 2022.
    [44] YOLO v8. https://github.com/ultralytics/ultralytics?tab=readme-ov-file (accessed.
    [45] Ultralytics. https://www.ultralytics.com/zh/ (accessed.
    [46] MMDetection official website. https://github.com/open-mmlab/mmdetection (accessed.
    [47] Projects under OpenMMLab. https://openmmlab.com/codebase (accessed.
    [48] T. Jiang et al., "RTMPose: Real-Time Multi-Person Pose Estimation based on MMPose," arXiv preprint arXiv:2303.07399, 2023.
    [49] J. Li et al., "Human pose regression with residual log-likelihood estimation," in Proceedings of the IEEE/CVF international conference on computer vision, 2021, pp. 11025-11034.
    [50] Y. Yuan et al., "Hrformer: High-resolution transformer for dense prediction," arXiv preprint arXiv:2110.09408, 2021.
    [51] K. Sun, B. Xiao, D. Liu, and J. Wang, "Deep high-resolution representation learning for human pose estimation," in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2019, pp. 5693-5703.
    [52] W. Mao et al., "Poseur: Direct human pose regression with transformers," in European Conference on Computer Vision, 2022: Springer, pp. 72-88.
    [53] B. Artacho and A. Savakis, "Omnipose: A multi-scale framework for multi-person pose estimation," arXiv preprint arXiv:2103.10180, 2021.
    [54] Y. Cai et al., "Learning delicate local representations for multi-person pose estimation," in Computer Vision?CCV 2020: 16th European Conference, Glasgow, UK, August 23??8, 2020, Proceedings, Part III 16, 2020: Springer, pp. 455-472.
    [55] Y. Xu, J. Zhang, Q. Zhang, and D. Tao, "Vitpose: Simple vision transformer baselines for human pose estimation," Advances in Neural Information Processing Systems, vol. 35, pp. 38571-38584, 2022.
    [56] H. Liu, F. Liu, X. Fan, and D. Huang, "Polarized self-attention: Towards high-quality pixel-wise regression," arXiv preprint arXiv:2107.00782, 2021.
    [57] Y. Li et al., "Simcc: A simple coordinate classification perspective for human pose estimation," in European Conference on Computer Vision, 2022: Springer, pp. 89-106.
    [58] I. Matthews and S. Baker, "Active appearance models revisited," International journal of computer vision, vol. 60, pp. 135-164, 2004.
    [59] K. Zhang, L. Zhang, and M.-H. Yang, "Fast compressive tracking," IEEE transactions on pattern analysis and machine intelligence, vol. 36, no. 10, pp. 2002-2015, 2014.
    [60] Z. Xu et al., "A Review of Image Inpainting Methods Based on Deep Learning," Applied Sciences, vol. 13, no. 20, p. 11189, 2023.
    [61] Z. Liu, P. Luo, X. Wang, and X. Tang, "Deep learning face attributes in the wild," in Proceedings of the IEEE international conference on computer vision, 2015, pp. 3730-3738.
    [62] S. Zhao et al., "Large scale image completion via co-modulated generative adversarial networks," arXiv preprint arXiv:2103.10428, 2021.
    [63] R. Suvorov et al., "Resolution-robust large mask inpainting with fourier convolutions," in Proceedings of the IEEE/CVF winter conference on applications of computer vision, 2022, pp. 2149-2159.
    [64] L. Chi, B. Jiang, and Y. Mu, "Fast fourier convolution," Advances in Neural Information Processing Systems, vol. 33, pp. 4479-4488, 2020.
    [65] M. Zhu et al., "Image inpainting by end-to-end cascaded refinement with mask awareness," IEEE Transactions on Image Processing, vol. 30, pp. 4855-4866, 2021.
    [66] T. Park, M.-Y. Liu, T.-C. Wang, and J.-Y. Zhu, "Semantic image synthesis with spatially-adaptive normalization," in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2019, pp. 2337-2346.
    [67] X. Wang, K. Yu, C. Dong, and C. C. Loy, "Recovering realistic texture in image super-resolution by deep spatial feature transform," in Proceedings of the IEEE conference on computer vision and pattern recognition, 2018, pp. 606-615.
    [68] Y. Zeng, J. Fu, H. Chao, and B. Guo, "Aggregated contextual transformations for high-resolution image inpainting," IEEE Transactions on Visualization and Computer Graphics, 2022.
    [69] J. Yu, Z. Lin, J. Yang, X. Shen, X. Lu, and T. S. Huang, "Free-form image inpainting with gated convolution," in Proceedings of the IEEE/CVF international conference on computer vision, 2019, pp. 4471-4480.
    [70] S. Iizuka, E. Simo-Serra, and H. Ishikawa, "Globally and locally consistent image completion," ACM Transactions on Graphics (ToG), vol. 36, no. 4, pp. 1-14, 2017.
    [71] C. Li and M. Wand, "Precomputed real-time texture synthesis with markovian generative adversarial networks," in Computer Vision?CCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11-14, 2016, Proceedings, Part III 14, 2016: Springer, pp. 702-716.
    [72] J. Johnson, A. Alahi, and L. Fei-Fei, "Perceptual losses for real-time style transfer and super-resolution," in Computer Vision?CCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11-14, 2016, Proceedings, Part II 14, 2016: Springer, pp. 694-711.
    [73] T. Miyato, T. Kataoka, M. Koyama, and Y. Yoshida, "Spectral normalization for generative adversarial networks," arXiv preprint arXiv:1802.05957, 2018.
    [74] K. Nazeri, E. Ng, T. Joseph, F. Z. Qureshi, and M. Ebrahimi, "Edgeconnect: Generative image inpainting with adversarial edge learning," arXiv preprint arXiv:1901.00212, 2019.
    [75] Z. Wan, J. Zhang, D. Chen, and J. Liao, "High-fidelity pluralistic image completion with transformers," in Proceedings of the IEEE/CVF international conference on computer vision, 2021, pp. 4692-4701.
    [76] W. Li, Z. Lin, K. Zhou, L. Qi, Y. Wang, and J. Jia, "Mat: Mask-aware transformer for large hole image inpainting," in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2022, pp. 10758-10768.
    [77] H. Jiang, J. Starkman, C.-H. Kuo, and M.-C. Huang, "Acu glass: quantifying acupuncture therapy using Google Glass," EAI Endorsed Transactions on Pervasive Health and Technology, vol. 2, no. 6, pp. e2-e2, 2016.
    [78] 吳家宇, "應用於視訊按摩機之背部阿是穴立體定位," 碩士, 電腦與通信工程研究所, 國立成功大學, 台南市, 2010. [Online]. Available: https://hdl.handle.net/11296/fmtuyn
    [79] K.-C. Lan, M.-C. Hu, Y.-Z. Chen, and J.-X. Zhang, "The application of 3D morphable model (3DMM) for real-time visualization of acupoints on a smartphone," IEEE Sensors Journal, vol. 21, no. 3, pp. 3289-3300, 2020.
    [80] X. X. Lu, "A review of solutions for perspective-n-point problem in camera pose estimation," in Journal of Physics: Conference Series, 2018, vol. 1087, no. 5: IOP Publishing, p. 052009.
    [81] 李冠陞, "利用機械手臂穴道按摩," 碩士, 資訊工程學系, 國立成功大學, 台南市, 2018. [Online]. Available: https://hdl.handle.net/11296/z2jfk8
    [82] 張俊翔, "使用擴增實境於智慧型手機上之足部穴道定位技術," 碩士, 醫學資訊研究所, 國立成功大學, 台南市, 2021. [Online]. Available: https://hdl.handle.net/11296/c98h6z
    [83] J. Kittler, P. Huber, Z.-H. Feng, G. Hu, and W. Christmas, "3D morphable face models and their applications," in Articulated Motion and Deformable Objects: 9th International Conference, AMDO 2016, Palma de Mallorca, Spain, July 13-15, 2016, Proceedings 9, 2016: Springer, pp. 185-206.
    [84] 洪浩祐, "透過機械手臂實現人體背部穴道自動定位," 碩士, 醫學資訊研究所, 國立成功大學, 台南市, 2021. [Online]. Available: https://hdl.handle.net/11296/c84uv5
    [85] H. Wang, L. Liu, Y. Wang, and S. Du, "Hand acupuncture point localization method based on a dual-attention mechanism and cascade network model," Biomedical Optics Express, vol. 14, no. 11, pp. 5965-5978, 2023.
    [86] X. Sun, J. Dong, Q. Li, D. Lu, and Z. Yuan, "Deep Learning-Based Auricular Point Localization for Auriculotherapy," IEEE Access, vol. 10, pp. 112898-112908, 2022.
    [87] M. Sandler, A. Howard, M. Zhu, A. Zhmoginov, and L.-C. Chen, "Mobilenetv2: Inverted residuals and linear bottlenecks," in Proceedings of the IEEE conference on computer vision and pattern recognition, 2018, pp. 4510-4520.
    [88] M. Zhang, J. P. Schulze, and D. Zhang, "E-faceatlasAR: extend atlas of facial acupuncture points with auricular maps in augmented reality for self-acupressure," Virtual Reality, vol. 26, no. 4, pp. 1763-1776, 2022.
    [89] E. E. Hansley, M. P. Segundo, and S. Sarkar, "Employing fusion of learned and handcrafted features for unconstrained ear recognition," Iet Biometrics, vol. 7, no. 3, pp. 215-223, 2018.
    [90] S. Schaefer, T. McPhail, and J. Warren, "Image deformation using moving least squares," in ACM SIGGRAPH 2006 Papers, 2006, pp. 533-540.
    [91] C. Zhang, S. Bengio, M. Hardt, B. Recht, and O. Vinyals, "Understanding deep learning (still) requires rethinking generalization," Communications of the ACM, vol. 64, no. 3, pp. 107-115, 2021.
    [92] A. Lukezic, T. Vojir, L. ?Cehovin Zajc, J. Matas, and M. Kristan, "Discriminative correlation filter with channel and spatial reliability," in Proceedings of the IEEE conference on computer vision and pattern recognition, 2017, pp. 6309-6318.
    [93] M. I. Ribeiro and P. Lima, "Introduction to kalman filtering," Institue of Super Tehnico, 2000.
    [94] Q. Bai, "Analysis of particle swarm optimization algorithm," Computer and information science, vol. 3, no. 1, p. 180, 2010.
    [95] I. J. Cox and S. L. Hingorani, "An efficient implementation of Reid's multiple hypothesis tracking algorithm and its evaluation for the purpose of visual tracking," IEEE Transactions on pattern analysis and machine intelligence, vol. 18, no. 2, pp. 138-150, 1996.
    [96] A. Balasundaram, S. A. Kumar, and S. M. Kumar, "Optical flow based object movement tracking," International Journal of Engineering and Advanced Technology, vol. 9, no. 1, pp. 3913-3916, 2019.
    [97] H. Grabner, M. Grabner, and H. Bischof, "Real-time tracking via on-line boosting," in Bmvc, 2006, vol. 1, no. 5: Citeseer, p. 6.
    [98] Z. Kalal, K. Mikolajczyk, and J. Matas, "Tracking-learning-detection," IEEE transactions on pattern analysis and machine intelligence, vol. 34, no. 7, pp. 1409-1422, 2011.
    [99] Z. Kalal, K. Mikolajczyk, and J. Matas, "Forward-backward error: Automatic detection of tracking failures," in 2010 20th international conference on pattern recognition, 2010: IEEE, pp. 2756-2759.
    [100] B. Babenko, M.-H. Yang, and S. Belongie, "Visual tracking with online multiple instance learning," in 2009 IEEE Conference on computer vision and Pattern Recognition, 2009: IEEE, pp. 983-990.
    [101] D. S. Bolme, J. R. Beveridge, B. A. Draper, and Y. M. Lui, "Visual object tracking using adaptive correlation filters," in 2010 IEEE computer society conference on computer vision and pattern recognition, 2010: IEEE, pp. 2544-2550.
    [102] J. F. Henriques, R. Caseiro, P. Martins, and J. Batista, "High-speed tracking with kernelized correlation filters," IEEE transactions on pattern analysis and machine intelligence, vol. 37, no. 3, pp. 583-596, 2014.
    [103] D. Held, S. Thrun, and S. Savarese, "Learning to track at 100 fps with deep regression networks," in Computer Vision?CCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11??4, 2016, Proceedings, Part I 14, 2016: Springer, pp. 749-765.
    [104] R. R. Patil, O. S. Vaidya, G. M. Phade, and S. T. Gandhe, "Qualified scrutiny for real-time object tracking framework," International Journal on Emerging Technologies, vol. 11, no. 3, pp. 313-319, 2020.
    [105] A. Brdjanin, N. Dardagan, D. Dzigal, and A. Akagic, "Single object trackers in opencv: A benchmark," in 2020 International Conference on INnovations in Intelligent SysTems and Applications (INISTA), 2020: IEEE, pp. 1-6.
    [106] MAT source code. https://github.com/fenglinglwb/MAT (accessed.
    [107] A. Dosovitskiy et al., "An image is worth 16x16 words: Transformers for image recognition at scale," arXiv preprint arXiv:2010.11929, 2020.
    [108] T. Karras, S. Laine, M. Aittala, J. Hellsten, J. Lehtinen, and T. Aila, "Analyzing and improving the image quality of stylegan," in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2020, pp. 8110-8119.
    [109] CelebA-HQ_512 pretrained model download. https://mycuhk-my.sharepoint.com/personal/1155137927_link_cuhk_edu_hk/_layouts/15/onedrive.aspx?ga=1&id=%2Fpersonal%2F1155137927%5Flink%5Fcuhk%5Fedu%5Fhk%2FDocuments%2FRelease%2FMAT%2Fmodels (accessed.
    [110] CelebA-HQ Dataset. https://github.com/tkarras/progressive_growing_of_gans#preparing-datasets-for-training (accessed.
    [111] W. Luo, Y. Li, R. Urtasun, and R. Zemel, "Understanding the effective receptive field in deep convolutional neural networks," Advances in neural information processing systems, vol. 29, 2016.
    [112] X. Ding, X. Zhang, J. Han, and G. Ding, "Scaling up your kernels to 31x31: Revisiting large kernel design in cnns," in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2022, pp. 11963-11975.
    [113] A. G. Howard et al., "Mobilenets: Efficient convolutional neural networks for mobile vision applications," arXiv preprint arXiv:1704.04861, 2017.
    [114] MMDetection Customize Dataset https://mmdetection.readthedocs.io/en/latest/advanced_guides/customize_dataset.html (accessed.
    [115] "Kullback–Leibler divergence." https://en.wikipedia.org/wiki/Kullback%E2%80%93Leibler_divergence (accessed.
    [116] MMPose github. https://github.com/open-mmlab/mmpose (accessed.
    [117] MMPose custom_dataset. https://mmpose.readthedocs.io/en/latest/advanced_guides/customize_datasets.html (accessed.
    [118] MMDeploy. https://github.com/open-mmlab/mmdeploy (accessed.
    [119] "NCNN." https://github.com/Tencent/ncnn (accessed.
    [120] PoseTracker-Android-Prototype. https://github.com/hanrui1sensetime/PoseTracker-Android-Prototype (accessed.
    [121] VOC2010. http://host.robots.ox.ac.uk/pascal/VOC/voc2010/ (accessed.
    [122] Indoor Scene Recognition. https://web.mit.edu/torralba/www/indoor.html (accessed.
    [123] Kaggle MIT Indoor Scenes. https://www.kaggle.com/datasets/itsahmad/indoor-scenes-cvpr-2019 (accessed.
    [124] M. Heusel, H. Ramsauer, T. Unterthiner, B. Nessler, and S. Hochreiter, "Gans trained by a two time-scale update rule converge to a local nash equilibrium," Advances in neural information processing systems, vol. 30, 2017.
    [125] C. Fang, Z. Wu, H. Zheng, J. Yang, C. Ma, and T. Zhang, "MCP: Multi-Chicken Pose Estimation Based on Transfer Learning," Animals, vol. 14, no. 12, p. 1774, 2024.
    [126] D. G. Low, "Distinctive image features from scale-invariant keypoints," Journal of Computer Vision, vol. 60, no. 2, pp. 91-110, 2004.
    [127] "Nutritional status." https://www.who.int/europe/news-room/fact-sheets/item/a-healthy-lifestyle---who-recommendations (accessed.
    [128] M. Bertalmio, A. L. Bertozzi, and G. Sapiro, "Navier-stokes, fluid dynamics, and image and video inpainting," in Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. CVPR 2001, 2001, vol. 1: IEEE, pp. I-I.
