
Author: Guo, Jia-Ling (郭嘉玲)
Thesis title: A Unsupervised Learning Model for Abnormal Events Detection in Surveillance Videos (非監督式學習用於監視影像異常事件偵測)
Advisor: Wang, Tzone-I (王宗一)
Degree: Master
Department: College of Engineering, Department of Engineering Science
Year of publication: 2022
Academic year of graduation: 110 (ROC calendar, i.e. 2021-2022)
Language: Chinese
Number of pages: 40
Keywords: Abnormal Events Detection, Transfer Learning, Neural Network, Unsupervised Learning
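  • Abstract (translated from the Chinese): Surveillance cameras are widely used in many settings, such as traffic safety, vehicle identification, and security systems. As the number of cameras grows, the staff responsible for monitoring them face a considerable burden, and watching surveillance footage for long periods causes fatigue and impaired judgment. Surveillance footage contains many different objects and events, so object detection or behavior analysis methods alone can no longer meet the need. A supervisory system that helps watch the video and identify important information and events would be of great help to surveillance systems in detecting abnormal information flows in the environment.
    This study develops a method for detecting anomalies in video, using real-world surveillance camera footage as the dataset. One frame per second is extracted from the collected videos, converted to grayscale, and resized to 256 × 256. These unlabeled images are fed in chronological order into an autoencoder, which learns the features of the data, and a ConvLSTM model then learns the temporal information. By learning from a large amount of normal image data, the model treats outlier events as abnormal. To address the loss of detection accuracy that outdoor intersection cameras may suffer because of lighting and shadows, different times of day, or changes in pedestrian and vehicle flow, this study uses transfer learning: the weights from the previous training run are retained and the model is trained for 2 more epochs on recent data. Compared with the model without transfer learning, the detection of abnormal events improves markedly, which shows that the method does improve the model's judgment of abnormal events across different time periods.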
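The preprocessing described above (one frame per second, grayscale, 256 × 256) can be sketched as follows. This is a minimal, dependency-free illustration and not the thesis code: the function names, the BT.601 luma weights, and the nearest-neighbor resampling are all assumptions, since the abstract does not say which grayscale conversion or interpolation was used.

```python
from typing import List, Sequence, Tuple

Pixel = Tuple[int, int, int]   # (R, G, B), each channel in 0..255
GrayImage = List[List[int]]    # rows of intensities in 0..255


def sample_frame_indices(n_frames: int, fps: float) -> List[int]:
    """Return the indices of the frames kept when taking one frame per second."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    indices: List[int] = []
    second = 0
    while True:
        idx = int(round(second * fps))
        if idx >= n_frames:
            break
        indices.append(idx)
        second += 1
    return indices


def to_gray(frame: Sequence[Sequence[Pixel]]) -> GrayImage:
    """Convert an RGB frame to grayscale (BT.601 weights, an assumption)."""
    return [
        [int(round(0.299 * r + 0.587 * g + 0.114 * b)) for (r, g, b) in row]
        for row in frame
    ]


def resize_nearest(img: GrayImage, out_h: int = 256, out_w: int = 256) -> GrayImage:
    """Resize with nearest-neighbor sampling (an assumption; simple, not the best quality)."""
    in_h, in_w = len(img), len(img[0])
    return [
        [img[(y * in_h) // out_h][(x * in_w) // out_w] for x in range(out_w)]
        for y in range(out_h)
    ]


def preprocess(frame: Sequence[Sequence[Pixel]]) -> GrayImage:
    """Grayscale first, then resize to 256 x 256, following the order in the abstract."""
    return resize_nearest(to_gray(frame), 256, 256)
```

For a 30 fps clip of 90 frames, `sample_frame_indices(90, 30)` keeps frames 0, 30, and 60. In practice an image library would normally do the conversion and resizing; the pure-Python version is only meant to make the steps explicit.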

    Abstract (English): Surveillance cameras are now widely used, for example in traffic safety, vehicle identification, and security systems. As the number of surveillance cameras grows, the personnel responsible for monitoring irregular events face a considerable burden. Watching a monitor for a long time is tiring and degrades judgment. A surveillance video contains many objects and events, and traditional object detection or behavior analysis methods can no longer meet the need to judge abnormal events in the environment. An automatic monitoring system that can constantly watch a surveillance video and identify abnormal events would be of great help.
    This research develops a method for detecting anomalous events in surveillance videos by building a deep learning model for video monitoring and automatic warning of abnormal information flows in the videos. The training set is self-made: a collection of real-world surveillance videos. One image per second is extracted from the videos, and the images are pre-processed before being fed to the model in video time order. An autoencoder learns the features of the image data, and a spatiotemporal sequence prediction model (ConvLSTM), which combines convolution with the Long Short-Term Memory (LSTM) network architecture, learns the temporal information in the video. By learning from a large amount of normal image data, the model can flag abnormal events when outliers appear during real-time monitoring. To reduce the misjudgments that outdoor surveillance cameras suffer because of changes in light and shadow or in pedestrian and traffic flow at different times, and to improve the accuracy of abnormal event detection, this study uses transfer learning: it retains the previous training weights and trains the model for 2 further epochs on up-to-date video data. Compared with the model trained without transfer learning, the transferred model performs significantly better, which indicates that the method improves the accuracy of abnormal event detection across different time periods.
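The idea of learning only from normal footage and flagging outliers can be illustrated with a reconstruction-error score. The sketch below is hypothetical: the abstract does not state the scoring rule or threshold the thesis uses, so the mean squared error, the min-max "regularity score", the default threshold of 0.5, and every function name are assumptions chosen for illustration.

```python
from typing import List, Sequence


def reconstruction_error(frame: Sequence[float], recon: Sequence[float]) -> float:
    """Mean squared error between a flattened frame and its reconstruction."""
    if len(frame) != len(recon) or len(frame) == 0:
        raise ValueError("frame and reconstruction must be non-empty and equal in length")
    return sum((a - b) ** 2 for a, b in zip(frame, recon)) / len(frame)


def regularity_scores(errors: Sequence[float]) -> List[float]:
    """Min-max normalise errors to [0, 1]; 1.0 is most regular, 0.0 least regular."""
    if not errors:
        return []
    lo, hi = min(errors), max(errors)
    if hi == lo:
        # All frames reconstruct equally well: treat every frame as regular.
        return [1.0] * len(errors)
    return [1.0 - (e - lo) / (hi - lo) for e in errors]


def flag_anomalies(scores: Sequence[float], threshold: float = 0.5) -> List[bool]:
    """Mark a frame as abnormal when its regularity score falls below the threshold."""
    return [s < threshold for s in scores]
```

With per-frame errors `[1.0, 2.0, 9.0, 1.0]`, the scores are `[1.0, 0.875, 0.0, 1.0]`, so only the third frame is flagged at a threshold of 0.5. A model trained only on normal scenes tends to reconstruct unfamiliar events poorly, which is why a large error is read as a sign of abnormality.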

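    Table of contents:
    Abstract
    Extended Abstract
    Acknowledgments
    Table of Contents
    List of Tables
    List of Figures
    Chapter 1 Introduction
      1.1 Research Background and Motivation
      1.2 Research Objectives
      1.3 Research Methods
      1.4 Research Contributions
    Chapter 2 Literature Review
      2.1 Anomaly Detection
      2.2 Unsupervised Learning
      2.3 Convolutional Autoencoder
      2.4 Long Short-Term Memory Neural Network (LSTM)
    Chapter 3 System Design and Model Architecture
      3.1 System Design
      3.2 System Architecture
      3.3 System Flowchart
      3.2 Preprocessing
      3.3 Autoencoder
      3.4 Spatial Autoencoder
      3.5 Convolutional LSTM
      3.6 Model Architecture
      3.7 Transfer Learning
      3.8 Loss Calculation
      3.9 Practical Application of the System
    Chapter 4 Experimental Design and Results
      4.1 Dataset
      4.2 Training Parameters
      4.3 Evaluation Tools
        4.3.1 Thresholding
        4.3.2 Event Counting
      4.4 Experimental Results and Discussion
        4.4.1 Experimental Results
        4.4.2 Discussion
    Chapter 5 Conclusions and Future Work
      5.1 Conclusions
      5.2 Future Work
    References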


    Full-text availability: on campus, available immediately; off campus, available immediately.