
Graduate Student: Lu, Wei-Ru (呂威儒)
Thesis Title: An Interactive Smart Stadium System Integrating Facial Recognition and PTZ Camera Control Technologies (整合人臉辨識與 PTZ 攝影機控制技術的互動式智慧體育場系統)
Advisor: Chang, Ya-Ning (張亞寧)
Degree: Master
Department: Miin Wu School of Computing (敏求智慧運算學院) - MS Degree Program on Intelligent Technology Systems
Year of Publication: 2025
Academic Year of Graduation: 113 (ROC calendar)
Language: Chinese
Number of Pages: 120
Chinese Keywords (translated): sport stadiums, face recognition, pan-tilt-zoom (PTZ) camera control, YOLO, Deep SORT, ArcFace, Taiwanese Facial Dataset (TWFD), BRIEF feature matching
English Keywords: sport stadiums, face recognition, pan-tilt-zoom (PTZ) camera control, YOLO, Deep SORT, ArcFace, Taiwanese Facial Dataset (TWFD), BRIEF feature matching
Views: 16; Downloads: 0
  • This study focuses on combining face recognition technology with Pan-Tilt-Zoom (PTZ) camera control to develop a real-time interactive system for smart venues. To address the limitations of existing face recognition models built mainly on Western data, a localized Taiwanese facial dataset was constructed. Using deep learning algorithms, the system realizes a real-time "Who is the big star?" interactive game and, combined with automated camera control technology, dynamically tracks selected faces and displays them on venue screens.
    The research includes constructing a diverse dataset of Taiwanese celebrity faces, applying the YOLO algorithm for face detection and Deep SORT for multi-object tracking, and combining feature matching techniques to develop a dynamic control mechanism for the PTZ camera. Experimental results show that mixed training on multi-source datasets substantially improves the model's cross-group generalization; on the Taiwanese celebrity test set (TWFD), the AUC reaches 0.994. PTZ camera control tests show that the system automatically pans a target face to the center of the frame in about 2.87 seconds on average, roughly one-third of the time required for manual operation (8.3 seconds), and completes the framing with only a few movements, demonstrating good real-time performance and efficiency. By contrast, traditional manual tracking requires multiple adjustments to complete the same task, indicating that the system offers a clear control advantage and stability in real-world environments.
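The AUC figure quoted above can be illustrated with a small, self-contained sketch. This is not the thesis's evaluation code; `roc_auc` and the toy labels/scores below are hypothetical, showing only how the rank-based AUC over same-person and different-person verification pairs is defined:

```python
def roc_auc(labels, scores):
    """Rank-based AUC: the probability that a randomly chosen positive
    (same-person) pair scores higher than a negative (different-person) pair."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    # Count pairwise "wins"; ties count as half a win.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy similarity scores: 1 = same person, 0 = different person
labels = [1, 1, 1, 0, 0, 0]
scores = [0.91, 0.84, 0.77, 0.42, 0.65, 0.30]
print(roc_auc(labels, scores))  # → 1.0 (every positive outranks every negative)
```

A perfectly separating scorer yields AUC 1.0; the thesis reports 0.994 on TWFD, i.e. near-perfect ranking of genuine over impostor pairs.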
    The system demonstrates an innovative interactive entertainment application that can significantly increase audience engagement, with practical application potential at large events such as concerts and exhibitions. The study also opens a new direction for integrating multi-ethnic face recognition with smart venue technologies, and further discusses future possibilities for dataset expansion and commercial deployment.

    This study presents the development of a real-time interactive system for smart venues by integrating facial recognition technology with Pan-Tilt-Zoom (PTZ) camera control. In response to the limitations of existing models that rely heavily on Western-centric datasets, a localized facial dataset featuring Taiwanese individuals was constructed to improve recognition performance in regional contexts. Leveraging deep learning techniques, the system enables real-time "celebrity look-alike" interaction, dynamically identifying look-alike spectators and controlling PTZ cameras to display target faces on venue screens.
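As a rough illustration of how look-alike matching over face embeddings typically works (the thesis uses ArcFace embeddings; the function names and the toy 3-D vectors below are hypothetical stand-ins for its 512-D embeddings, not the actual implementation):

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def best_lookalike(query, gallery):
    """Return (name, similarity) of the gallery embedding closest to the query."""
    return max(((name, cosine_similarity(query, emb)) for name, emb in gallery.items()),
               key=lambda t: t[1])

# Toy 3-D embeddings standing in for a celebrity gallery
gallery = {"celebrity_a": [0.9, 0.1, 0.0], "celebrity_b": [0.0, 0.8, 0.6]}
print(best_lookalike([0.8, 0.2, 0.1], gallery))  # closest match is "celebrity_a"
```

In a deployment, the query embedding would come from a face crop detected in the live camera feed, and the gallery from the pre-computed celebrity dataset.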
    The system design includes the construction of a diverse facial dataset representing Taiwanese public figures of various genders and facial characteristics. It employs YOLO for real-time face detection, Deep SORT for multi-object tracking, and feature matching to drive adaptive PTZ camera movements. Experimental results show that training with a combined dataset significantly enhances cross-group generalization. Specifically, the model achieved an AUC of 0.994 on the TWFD test set. In PTZ camera tests, the system automatically centered the target face within an average of 2.87 seconds, approximately one-third the time required for manual control, using minimal movement to complete the framing task. This efficiency underscores the system's high responsiveness and operational stability in real-world deployments.
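The face-centering behavior described above can be sketched as a simple proportional controller that maps the pixel offset between a detected face and the frame center to pan/tilt steps. This is an illustrative sketch under assumed parameters (`gain`, `deadband` are hypothetical values, not the thesis's control law):

```python
def pan_tilt_step(face_cx, face_cy, frame_w, frame_h, gain=0.05, deadband=20):
    """Proportional step (in degrees) to move a detected face toward frame center.
    Convention here: positive pan = right, positive tilt = down.
    The deadband suppresses jitter once the face is nearly centered."""
    dx = face_cx - frame_w / 2   # horizontal pixel offset from center
    dy = face_cy - frame_h / 2   # vertical pixel offset from center
    pan = gain * dx if abs(dx) > deadband else 0.0
    tilt = gain * dy if abs(dy) > deadband else 0.0
    return pan, tilt

# Face detected left of and above center in a 1920x1080 frame:
print(pan_tilt_step(800, 400, 1920, 1080))  # pan < 0 (move left), tilt < 0 (move up)
```

Issuing such steps each frame until the offset falls inside the deadband yields the few-movement convergence behavior reported above; a real controller would also clamp steps to the camera's speed limits.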
    Overall, the proposed system demonstrates innovative applications in interactive entertainment, significantly improving audience engagement and offering practical feasibility for large-scale events such as concerts and exhibitions. Furthermore, this work contributes to the integration of multi-ethnic facial recognition and smart venue technologies, laying a foundation for future research in dataset expansion, cross-cultural adaptability, and commercial deployment.

    Chinese Abstract i
    SUMMARY ii
    Acknowledgments xvii
    Table of Contents 1
    List of Tables 4
    List of Figures 5
    CHAPTER 1 Introduction 6
      1.1 Research Background 6
      1.2 Research Motivation and Objectives 7
      1.3 Research Structure 8
    CHAPTER 2 Literature Review 10
      2.1 Current State of Smart Venues 10
        2.1.1 Enhancing Interactive Experiences 11
        2.1.2 Smart Navigation and Crowd Control 11
        2.1.3 Social Interaction Systems 11
        2.1.4 Data Analysis and Replay 11
        2.1.5 Security Management and Surveillance 12
      2.2 Object Detection Technology 13
        2.2.1 Overview and Evolution of Object Detection Algorithms 13
        2.2.2 Discussion of Face Detection 14
        2.2.3 YOLO 15
          • Image Grid Partitioning 16
          • CNN Feature Extraction and Regression 17
          • IOU (Intersection Over Union) 18
          • Non-Maximum Suppression (NMS) 20
      2.3 Face Recognition 21
        2.3.1 Fundamentals of CNN Deep Learning Architectures 22
        2.3.2 Feature Extraction with Deep Residual Networks (ResNet) 23
          • Residual Blocks and Skip Connections 24
          • Bottleneck Architecture and Stage Design 24
        2.3.3 Introduction to Feature Spaces and Distance Metrics 25
        2.3.4 Loss Functions 26
        2.3.5 Face Datasets 27
        2.3.6 Applications of Face Detection and Recognition in Entertainment 30
          • Real-Time Emotion Analysis and Enhanced Digital Interaction 30
          • Video Metadata Generation Techniques 30
        2.3.7 Research Combining PTZ Camera Control with Face Recognition 31
          • Real-Time Face Detection and Tracking Algorithms 31
          • Persistent People Tracking and Face Capture Algorithms 31
          • Panoramic and PTZ Vision Fusion Systems 31
      2.4 Object Tracking 31
        2.4.1 Review of Object Tracking 32
        2.4.2 Deep SORT: Real-Time Multi-Object Tracking with Appearance Information 33
          • Object Detection 33
          • State Estimation: Kalman Filter 34
          • Target Association 35
          • Track Identity Life Cycle 35
      2.5 PTZ Cameras in Large Events: Applications and Control Techniques 36
        2.5.1 PTZ Calibration and Motion Planning 37
        2.5.2 Algorithm-Based PTZ Control Techniques 37
      2.6 Feature Matching 37
        2.6.1 Historical Review of Feature Matching 38
        2.6.2 BRIEF Feature Matching Pipeline 39
          • Feature Detection: FAST 39
          • Keypoint Selection 40
          • Orientation Calculation 40
          • BRIEF Descriptor 41
          • Feature Matching 41
      2.7 Chapter Summary 42
    CHAPTER 3 Tracking System 44
      3.1 Introduction 44
      3.2 System Overview and Workflow 44
      3.3 Integrated Tracking System Design and Workflow Description 45
    CHAPTER 4 Taiwanese Dataset and Face Recognition 47
      4.1 Construction of the Taiwanese Dataset 47
        4.1.1 Preface 47
        4.1.2 Data Sources and the GDC (Gather-Detect-Classify) Collection Method 47
      4.2 Model and Method Design 50
        4.2.1 Training Pipeline 50
        4.2.2 Testing Pipeline 53
    CHAPTER 5 Experimental Design, Testing Methods, Results, and Discussion 55
      5.1 Experimental Objectives 55
      5.2 Face Recognition Subsystem Experiments 55
        5.2.1 Testing Methods, Environment, and Parameter Settings 55
        5.2.2 Test Datasets 56
        5.2.3 Evaluation Metrics (Accuracy, ROC, AUC, F1-score) 58
          • False Positive Rate (FPR) and True Positive Rate (TPR) 58
          • Accuracy 58
          • Area Under the ROC Curve (AUC) 59
          • F1-Score 59
        5.2.4 Experimental Results and Discussion 60
          • Accuracy Results 60
          • TWFD Tests 62
          • AFD Tests 63
          • WebFace Tests 65
          • LFW Tests 66
          • CASIA-V5 Tests 68
          • Dataset Interpretation 69
          • Overall Discussion 70
      5.3 Tracking Subsystem Experiments 72
        5.3.1 Test Environment Setup and Method Description 72
        5.3.2 Field Tracking Performance of the PTZ Camera (PT Control) 74
        5.3.3 Discussion of the Tracking System's Real-Time Performance and Stability 75
        5.3.4 Rough Estimation of Latency Sources and Bottleneck Identification 77
    CHAPTER 6 Recommendations and Future Outlook 79
      6.1 Research Contributions 79
      6.2 Research Limitations 79
      6.3 System Improvement Suggestions 80
      6.4 Future Outlook 80
    References 81
    Appendix A 90
    Appendix B 92

