| Graduate Student: | 石哲榮 Shih, Che-June |
|---|---|
| Thesis Title: | 深度學習於深度偽造影片檢測之應用 Application of Deep Learning for DeepFake Videos Detection |
| Advisor: | 王明習 Wang, Ming-Shi |
| Degree: | Master |
| Department: | 工學院 - 工程科學系 Department of Engineering Science, College of Engineering |
| Year of Publication: | 2020 |
| Graduation Academic Year: | 108 |
| Language: | Chinese |
| Number of Pages: | 54 |
| Chinese Keywords: | 深度偽造視訊檢測 (deepfake video detection), 深度學習 (deep learning), 深度偽造 (deepfake), 卷積神經網路 (convolutional neural network) |
| English Keywords: | deep learning, convolutional neural network, deepfake, deepfake video detection |
Today, the faces of people in a video can easily be replaced with digital forgery techniques, turning the video into a fake. How to use related techniques to verify whether a video has been forged is therefore an important research issue. This study investigates how to detect whether the faces in video frames have been forged. The approach is as follows: first, detect whether a face appears in a video frame; if a face is present, further evaluate whether that face has been manipulated. If no face appears in the frame, forgery detection can instead be performed on the entire frame. In this thesis, two different face detection models are used to determine whether a face exists in a video frame, and the detection accuracy achieved by the system with each model is compared. The effect of different frame sampling rates (frames sampled per second) on the system's accuracy and processing time is also examined. To allow the system to make correct forgery judgments even on overexposed or poorly lit frames, two convolutional neural networks are considered: one that detects forgery on the face region only, and one that detects face forgery over the entire frame, which improves the accuracy and generality of the system. With these methods, the final results show that the system achieves an accuracy of 92.6%.
Nowadays, a person's face in a video can easily be altered or swapped using digital forgery techniques, turning the video into a fake. How to detect whether a video has been forged is therefore an important research issue. This study explores how to detect whether the face in a video frame has been altered. The method is as follows: first, detect whether a face appears in the video frame; if a face is present, evaluate whether it has been altered. If no face appears in the frame, forgery detection can instead be performed on the entire frame. In this study, two different face detection models are applied to determine whether a face exists in a video frame, and the accuracy achieved by the system with each detection model is compared. In addition, the impact of different frame sampling rates (frames per second) on the system's accuracy and processing time is also considered. To handle frames that are overexposed or poorly lit while still making a correct forgery judgment, two convolutional neural networks are used: one that detects forgery on the face region only, and one that detects face forgery over the entire frame, which increases the accuracy and versatility of the system. Our experimental results show that the system achieves an accuracy of 92.6%.
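The pipeline described in the abstract can be summarized as a two-branch procedure: sample frames at a chosen rate, run a face detector on each sampled frame, then classify either the cropped face or the whole frame with the corresponding CNN, and aggregate the per-frame decisions. The Python sketch below illustrates that flow only; the dlib frontal face detector stands in for whichever detection models the thesis actually compares, and the model file names, the 224x224 input size, the 0.5 threshold, and the frame-voting rule are all illustrative assumptions, not the author's implementation.

```python
# A minimal sketch of the two-branch pipeline described in the abstract.
# Assumptions (not from the thesis): dlib's frontal face detector, the model
# file names, the 224x224 input size, the 0.5 threshold, and the frame-voting
# aggregation rule are all illustrative placeholders.
import cv2
import dlib
import tensorflow as tf

face_detector = dlib.get_frontal_face_detector()
face_cnn = tf.keras.models.load_model("face_forgery_cnn.h5")    # hypothetical file
frame_cnn = tf.keras.models.load_model("frame_forgery_cnn.h5")  # hypothetical file


def _fake_score(model, img_bgr, input_size=(224, 224)):
    """Resize, convert BGR->RGB, scale to [0, 1], and return a sigmoid 'fake' score."""
    img = cv2.resize(img_bgr, input_size)
    img = img[..., ::-1].astype("float32") / 255.0
    return float(model.predict(img[None, ...], verbose=0)[0][0])


def classify_video(path, sample_fps=1.0, threshold=0.5):
    """Return the fraction of sampled frames judged fake (assumed scoring rule)."""
    cap = cv2.VideoCapture(path)
    native_fps = cap.get(cv2.CAP_PROP_FPS) or 30.0
    step = max(int(round(native_fps / sample_fps)), 1)  # frames skipped between samples

    fake_votes, total, idx = 0, 0, 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % step == 0:
            gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
            faces = face_detector(gray, 1)
            if faces:
                # Branch 1: a face was found, so classify the cropped face region.
                r = faces[0]
                crop = frame[max(r.top(), 0):r.bottom(), max(r.left(), 0):r.right()]
                score = _fake_score(face_cnn, crop)
            else:
                # Branch 2: no face found, so classify the whole frame.
                score = _fake_score(frame_cnn, frame)
            fake_votes += int(score > threshold)
            total += 1
        idx += 1
    cap.release()
    return fake_votes / max(total, 1)
```

Raising `sample_fps` examines more frames per video, which trades longer processing time for a decision based on more evidence; this is the accuracy-versus-time trade-off that the abstract says was evaluated.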
On-campus full-text access: available from 2025-09-05