| 研究生 (Graduate Student): | 王藝帆 Wang, Yi-Fan |
|---|---|
| 論文名稱 (Thesis Title): | 結合社群媒體語義之圖誘導跨模態融合於 Deepfake 偵測方法 (Graph-Induced Cross-Modal Fusion with Social Media Semantics for Deepfake Detection) |
| 指導教授 (Advisors): | 許志仲 Hsu, Chih-Chung; 戴安順 Tai, An-Shun |
| 學位類別 (Degree): | 碩士 Master |
| 系所名稱 (Department): | 管理學院 College of Management - 數據科學研究所 Institute of Data Science |
| 論文出版年 (Publication Year): | 2025 |
| 畢業學年度 (Academic Year): | 113 |
| 語文別 (Language): | 中文 Chinese |
| 論文頁數 (Pages): | 60 |
| 中文關鍵詞 (Chinese Keywords): | 深度偽造偵測 (deepfake detection)、多模態學習 (multimodal learning)、異構圖神經網絡 (heterogeneous graph neural network)、電腦視覺 (computer vision) |
| 外文關鍵詞 (English Keywords): | Deepfake Detection, Multimodal Learning, Hetero-GNN, Computer Vision |
[1] D. Afchar, V. Nozick, J. Yamagishi, and I. Echizen, "MesoNet: a compact facial video forgery detection network," in 2018 IEEE International Workshop on Information Forensics and Security (WIFS). IEEE, Dec. 2018, pp. 1–7. [Online]. Available: http://dx.doi.org/10.1109/WIFS.2018.8630761
[2] J. Bai, S. Bai, S. Yang, S. Wang, S. Tan, P. Wang, J. Lin, C. Zhou, and J. Zhou, "Qwen-VL: A versatile vision-language model for understanding, localization, text reading, and beyond," 2023. [Online]. Available: https://arxiv.org/abs/2308.12966
[3] Z. Cai, S. Ghosh, A. Dhall, T. Gedeon, K. Stefanov, and M. Hayat, "Glitch in the matrix: A large scale benchmark for content driven audio–visual forgery detection and localization," Computer Vision and Image Understanding, vol. 236, p. 103818, 2023.
[4] H. Cheng, Y. Guo, T. Wang, Q. Li, X. Chang, and L. Nie, "Voice-face homogeneity tells deepfake," ACM Transactions on Multimedia Computing, Communications and Applications, vol. 20, no. 3, pp. 1–22, 2023.
[5] R. Chesney and D. Citron, "Deep fakes: A looming challenge for privacy, democracy, and national security," California Law Review, vol. 107, no. 1, pp. 175–199, 2019.
[6] Y. Choi et al., "FakeAVCeleb: A novel audio-video multimodal deepfake dataset," arXiv preprint, 2021.
[7] F. Chollet, "Xception: Deep learning with depthwise separable convolutions," in Proceedings of the IEEE conference on computer vision and pattern recognition, 2017, pp. 1251–1258.
[8] K. Chugh, P. Gupta, A. Dhall, and R. Subramanian, "Not made for each other-audio-visual dissonance-based deepfake detection and localization," in Proceedings of the 28th ACM international conference on multimedia, 2020, pp. 439–447.
[9] ——, "Not made for each other- audio-visual dissonance-based deepfake detection and localization," 2021. [Online]. Available: https://arxiv.org/abs/2005.14405
[10] U. A. Ciftci, I. Demir, and L. Yin, "FakeCatcher: Detection of synthetic portrait videos using biological signals," IEEE Transactions on Pattern Analysis and Machine Intelligence, pp. 1–1, 2024. [Online]. Available: http://dx.doi.org/10.1109/TPAMI.2020.3009287
[11] B. Dolhansky et al., "The deepfake detection challenge (dfdc) dataset," arXiv preprint, 2020.
[12] B. Dolhansky, J. Bitton, B. Pflaum, J. Lu, R. Howes, M. Wang, and C. C. Ferrer, "The deepfake detection challenge (dfdc) dataset," arXiv preprint arXiv:2006.07397, 2020.
[13] G. Zhou and F. Qiao, "Generating a deepfake frame: A text mining study based on reddit," KOME – An International Journal of Pure Communication Inquiry, vol. 13, no. 1, 2025. [Online]. Available: https://doi.org/10.17646/KOME.of.22
[14] N. Gondru et al., "Explaindif: Multimodal neurosymbolic approach for explainable deepfake detection," ACM TOMM, 2023.
[15] A. Gu and T. Dao, "Mamba: Linear-time sequence modeling with selective state spaces," 2024. [Online]. Available: https://arxiv.org/abs/2312.00752
[16] A. Haliassos, K. Vougioukas, S. Petridis, and M. Pantic, "Lips don't lie: A generalisable and robust approach to face forgery detection," in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2021, pp. 5039–5049.
[17] S. Hamed, M. Ab Aziz, and M. Yaakub, "Fake news detection model on social media by leveraging sentiment analysis of news content and emotion analysis of users' comments," Sensors, vol. 23, no. 4, p. 1748, 2023.
[18] C.-C. Hsu, S.-N. Chen, M.-H. Wu, Y.-F. Wang, C.-M. Lee, and Y.-S. Chou, "Grace: Graph-regularized attentive convolutional entanglement with laplacian smoothing for robust deepfake video detection," 2024. [Online]. Available: https://arxiv.org/abs/2406.19941
[19] Y.-K. Hung, Y.-C. Huang, T.-Y. Su, Y.-T. Lin, L.-P. Cheng, B. Wang, and S.-H. Sun, "Simtube: Generating simulated video comments through multimodal ai and user personas," 2024. [Online]. Available: https://arxiv.org/abs/2411.09577
[20] F. Khalid, A. Javed, H. Ilyas, A. Irtaza et al., "DFGNN: An interpretable and generalized graph neural network for deepfakes detection," Expert Systems with Applications, vol. 222, p. 119843, 2023.
[21] H. Khalid et al., "Av-deepfake1m: A large-scale llm-driven audio-visual deepfake dataset," arXiv preprint, 2023.
[22] H. Khalid, S. Tariq, M. Kim, and S. S. Woo, "FakeAVCeleb: A novel audio-video multimodal deepfake dataset," arXiv preprint arXiv:2108.05080, 2021.
[23] N. Khan, T. Nguyen, A. Bermak, and I. Khalil, "Camme: Adaptive deepfake image detection with multi-modal cross-attention," 2025. [Online]. Available: https://arxiv.org/abs/2505.18035
[24] J. Kietzmann, L. Lee, I. P. McCarthy, and K. Kietzmann, "Deepfakes: Trick or treat?" Business Horizons, vol. 63, no. 2, pp. 135–146, 2020.
[25] D. E. King, "Dlib-ml: A machine learning toolkit," The Journal of Machine Learning Research, vol. 10, pp. 1755–1758, 2009.
[26] A. Kumar, Q. Xie et al., "Truth be told: Fake news detection using user reactions on reddit," in Proceedings of the 29th ACM CIKM, 2020.
[27] Y. Li, S. Yang, W. Wang, Z. He, B. Peng, and J. Dong, "Counterfactual explanations for face forgery detection via adversarial removal of artifacts," in 2024 IEEE International Conference on Multimedia and Expo (ICME). IEEE, 2024, pp. 1–6.
[28] Y. Li, M.-C. Chang, and S. Lyu, "In ictu oculi: Exposing ai generated fake face videos by detecting eye blinking," 2018. [Online]. Available: https://arxiv.org/abs/1806.02877
[29] Y. Li and S. Lyu, "Exposing deepfake videos by detecting face warping artifacts," 2019. [Online]. Available: https://arxiv.org/abs/1811.00656
[30] H. Liu, Z. Tan, Q. Chen, Y. Wei, Y. Zhao, and J. Wang, "Unified frequency-assisted transformer framework for detecting and grounding multi-modal manipulation," International Journal of Computer Vision, pp. 1–18, 2024.
[31] T. Mittal, U. Bhattacharya, R. Chandra, A. Bera, and D. Manocha, "Emotions don't lie: An audio-visual deepfake detection method using affective cues," in Proceedings of the 28th ACM international conference on multimedia, 2020, pp. 2823–2832.
[32] S. Mukhopadhyay et al., "Do you really mean that? content driven audio-visual deepfake dataset and multimodal method for temporal forgery localization," arXiv preprint, 2022.
[33] H. Qi, Q. Guo, F. Juefei-Xu, X. Xie, L. Ma, W. Feng, Y. Liu, and J. Zhao, "DeepRhythm: Exposing deepfakes with attentional visual heartbeat rhythms," 2020. [Online]. Available: https://arxiv.org/abs/2006.07634
[34] M. Qiao, R. Tian, and Y. Wang, "Towards generalizable deepfake detection with spatial-frequency collaborative learning and hierarchical cross-modal fusion," 2025. [Online]. Available: https://arxiv.org/abs/2504.17223
[35] S. Saif, S. Tehseen, and S. S. Ali, "Fake news or real? detecting deepfake videos using geometric facial structure and graph neural network," Technological Forecasting and Social Change, vol. 205, p. 123471, 2024.
[36] R. Shao, T. Wu, and Z. Liu, "Detecting and grounding multi-modal media manipulation," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, pp. 6904–6913.
[37] K. Shiohara and T. Yamasaki, "Detecting deepfakes with self-blended images," 2022. [Online]. Available: https://arxiv.org/abs/2204.08376
[38] ——, "Detecting deepfakes with self-blended images," in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2022, pp. 18720–18729.
[39] S. Tobuta et al., "PolyGlotFake: A novel multilingual and multimodal deepfake dataset," arXiv preprint, 2024.
[40] J. Wang, B. Liu, C. Miao, Z. Zhao, W. Zhuang, Q. Chu, and N. Yu, "Exploiting modality-specific features for multi-modal manipulation detection and grounding," in ICASSP 2024-2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2024, pp. 4935–4939.
[41] J. Wang, Z. Wu, W. Ouyang, X. Han, J. Chen, Y.-G. Jiang, and S.-N. Li, "M2TR: Multi-modal multi-scale transformers for deepfake detection," in Proceedings of the 2022 international conference on multimedia retrieval, 2022, pp. 615–623.
[42] Y. Wang, K. Yu, C. Chen, X. Hu, and S. Peng, "Dynamic graph learning with content-guided spatial-frequency relation reasoning for deepfake detection," in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2023, pp. 7278–7287.
[43] Y. Wang and H. Huang, "Audio–visual deepfake detection using articulatory representation learning," Computer Vision and Image Understanding, vol. 248, p. 104133, 2024.
[44] Z. Yan, P. Sun, Y. Lang, S. Du, S. Zhang, and W. Wang, "Landmark enhanced multi-modal graph learning for deepfake video detection," CoRR, 2022.
[45] W. Yang, X. Zhou, Z. Chen, B. Guo, Z. Ba, Z. Xia, X. Cao, and K. Ren, "AVoiD-DF: Audio-visual joint learning for detecting deepfake," IEEE Transactions on Information Forensics and Security, vol. 18, pp. 2015–2029, 2023.
[46] Q. Yin, W. Lu, X. Cao, X. Luo, Y. Zhou, and J. Huang, "Fine-grained multimodal deepfake classification via heterogeneous graphs," International Journal of Computer Vision, vol. 132, no. 11, pp. 5255–5269, 2024.
[47] R. Zhang, H. Wang, H. Liu, Y. Zhou, and Q. Zeng, "Generalized face forgery detection with self-supervised face geometry information analysis network," Applied Soft Computing, vol. 166, p. 112143, 2024.
[48] Z. Zhang, Y. Wang, L. Cheng, Z. Zhong, D. Guo, and M. Wang, "Asap: Advancing semantic alignment promotes multi-modal manipulation detecting and grounding," in Proceedings of the Computer Vision and Pattern Recognition Conference, 2025, pp. 4005–4014.
[49] T. Zhao, X. Xu, M. Xu, H. Ding, Y. Xiong, and W. Xia, "Learning to recognize patch-wise consistency for deepfake detection," CoRR, 2020.
校內 (on-campus access): available 2026-08-01