
Graduate Student: 陳冠廷 (Chen, Kuan-Ting)
Thesis Title (Chinese): 從影片資料中進行高風險行為標記的配對選擇技術
Thesis Title (English): SPS: Strategic Pair Selection for High-Risk Behavior Labeling from Video Data
Advisor: 莊坤達 (Chuang, Kun-Ta)
Degree: Master
Department: Department of Computer Science and Information Engineering, College of Electrical Engineering and Computer Science
Publication Year: 2023
Graduation Academic Year: 111 (ROC calendar, 2022-2023)
Language: English
Number of Pages: 37
Keywords (Chinese): 資料標註, 成對採樣, 影像分析
Keywords (English): Data labeling, Pairwise sampling, Image analysis
Access Count: Views: 54, Downloads: 12
  • Daily life is full of potential hazards; typical examples include pedestrian traffic accidents and campus-safety concerns for children. In response to these risks, the field of dangerous-behavior detection has received considerable attention. Detection technology can identify dangerous behavior as early as possible, reducing response time and minimizing harm. However, when generating annotated data for such models, the diversity of human behavior and the subjectivity involved in defining dangerous behavior make manual annotation challenging and often lead to ambiguous cases. To overcome this challenge, we introduce a label-generation framework based on pairwise comparison, called Strategic Pair Selection (SPS). SPS uses pairwise comparison to help annotators judge ambiguous cases, thereby improving the accuracy of dangerous-behavior detection. In addition, SPS incorporates video-based action analysis to understand the characteristics of dangerous behavior and to optimize its pair-selection strategy. Experimental results on real-world data show that SPS clearly outperforms the compared models, demonstrating its strong practicality.

    Accidental risk can occur anywhere in daily life; typical examples include pedestrian accidents and concerns about child safety on school campuses. In response to these risks, the field of dangerous-behavior detection technology has gained considerable attention. Such technology aims to minimize response times and mitigate harm through early detection of potentially dangerous behavior. However, when generating labeled data for these models, the diversity of human behavior and the subjective nature of defining dangerous behavior make the labeling process challenging and often lead to ambiguous situations. To overcome this challenge, we introduce a label-generation framework based on pairwise comparison, called Strategic Pair Selection (SPS). SPS employs a comparative approach to assist annotators in resolving ambiguous cases, thus enhancing the accuracy of dangerous-behavior detection. Additionally, SPS incorporates video-based action analysis to learn distinctive features of dangerous behaviors, optimizing the selection of pairs for comparison. Experimental results on real data show that SPS outperforms other pairwise sampling baselines, demonstrating its practicality.
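    The record does not spell out how SPS turns annotators' pairwise judgments into per-clip risk scores (the "Pairwise to Score" step listed in the table of contents). As a rough illustration of the general idea only, the sketch below fits a Bradley-Terry model to a toy set of "clip A looks riskier than clip B" judgments; the function name, hyperparameters, and toy data are hypothetical and are not taken from the thesis.

        # Illustrative only: converts pairwise "clip w looks riskier than clip l"
        # judgments into per-clip risk scores with a Bradley-Terry model fitted by
        # gradient ascent. This is a generic aggregation technique, not the SPS
        # procedure itself; all names and the toy data below are hypothetical.
        import numpy as np

        def bradley_terry_scores(n_items, comparisons, lr=0.1, n_iters=500):
            """comparisons: list of (winner, loser) index pairs from annotators."""
            theta = np.zeros(n_items)                 # latent risk score per video clip
            for _ in range(n_iters):
                grad = np.zeros(n_items)
                for w, l in comparisons:
                    p_w = 1.0 / (1.0 + np.exp(theta[l] - theta[w]))  # P(w judged riskier than l)
                    grad[w] += 1.0 - p_w              # push the winner's score up
                    grad[l] -= 1.0 - p_w              # push the loser's score down
                theta += lr * grad / max(len(comparisons), 1)
                theta -= theta.mean()                 # fix the scale; scores are only relative
            return theta

        # Toy example: 4 clips; annotators judged clip 0 riskier than clips 1, 2, and 3, etc.
        judgments = [(0, 1), (0, 2), (2, 3), (1, 3), (0, 3)]
        print(bradley_terry_scores(4, judgments))

    In practice, the distinguishing part of SPS lies upstream of this step: deciding which pairs to show annotators so that few comparisons are needed; the aggregation above only shows how relative judgments can be reduced to a score per video.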

    Chinese Abstract
    Abstract
    Acknowledgment
    Contents
    List of Tables
    List of Figures
    1 Introduction
    2 Related Works
        2.1 Human Action Representation
        2.2 Action Representation Learning
        2.3 Pairwise Active Sampling
            2.3.1 Passive Approaches
            2.3.2 Sorting Approaches
            2.3.3 Information-gain Approaches
            2.3.4 Matchmaking Approaches
        2.4 Deep Reinforcement Learning
    3 Methodology
        3.1 System Architecture
        3.2 Action Video Analysis
            3.2.1 Skeleton-based Video
            3.2.2 Skeleton-based Action Analysis
        3.3 Candidate Selector
            3.3.1 Training Phase
            3.3.2 Working Phase
        3.4 Pair Selector
        3.5 Pairwise to Score
    4 Experimental Results
        4.1 Dataset Description
        4.2 Experimental Settings
        4.3 Evaluation Metrics
        4.4 Baseline Methods
        4.5 Experimental Results
        4.6 Ablation Study
    5 Conclusions
    6 Future Work
    Bibliography
