
Graduate Student: Wang, Yi-Fan (王藝帆)
Thesis Title: Graph-Induced Cross-Modal Fusion with Social Media Semantics for Deepfake Detection (結合社群媒體語義之圖誘導跨模態融合於 Deepfake 偵測方法)
Advisors: Hsu, Chih-Chung (許志仲); Tai, An-Shun (戴安順)
Degree: Master
Department: Institute of Data Science, College of Management
Year of Publication: 2025
Academic Year of Graduation: 113 (ROC calendar; AY 2024–2025)
Language: Chinese
Number of Pages: 60
Keywords (Chinese): deepfake detection, multimodal learning, heterogeneous graph neural networks, computer vision
Keywords (English): Deepfake Detection, Multimodal Learning, Hetero-GNN, Computer Vision
Table of Contents
  • Chinese Abstract I
  • Abstract II
  • Acknowledgements VII
  • Table of Contents VIII
  • List of Tables XI
  • List of Figures XII
  • Chapter 1: Introduction 1
    • 1-1 Research Motivation 1
    • 1-2 Research Contributions 3
  • Chapter 2: Related Work 6
    • 2-1 Traditional Deepfake Detection Methods 6
    • 2-2 Multimodal Deepfake Detection Methods 7
    • 2-3 GNN-Based Deepfake Detection Methods 9
    • 2-4 Social Language Signals as Auxiliary Detection Cues 10
    • 2-5 Multimodal Datasets and Language-Modality Construction Strategies 10
  • Chapter 3: Simulated Data Generation 11
    • 3-1 Motivation for the Data Design 11
    • 3-2 Visual Anomaly Simulation Strategy 11
    • 3-3 Language Generation Strategy Design 12
      • 3-3.1 Rationale for the Cue and Non-cue Design 12
    • 3-4 Simulated Data Format and Production Pipeline 14
    • 3-5 Training Strategy for the Simulated Data 16
  • Chapter 4: Methodology 19
    • 4-1 Overview of the Method 19
    • 4-2 Multimodal Feature Encoding and Node Representation 21
    • 4-3 Cross-Modal Semantic Graph Induction Module 22
      • 4-3.1 Graph-Structure Semantic Induction Module 22
      • 4-3.2 Structural Stability Regularization Term 25
      • 4-3.3 Dynamic Soft Semantic Routing 26
    • 4-4 Heterogeneous Graph Neural Diffusion Module 26
    • 4-5 Node Attention Aggregation and Prediction 27
    • 4-6 Loss Function 28
  • Chapter 5: Experimental Results 29
    • 5-1 Experimental Setup 29
      • 5-1.1 Datasets 29
      • 5-1.2 Sample Difficulty Control and Dynamic Sampling Strategy 30
      • 5-1.3 Model Training Hyperparameters 31
    • 5-2 Modality Ablation Study 31
    • 5-3 Regularization-Term Ablation Study 32
    • 5-4 Analysis of the Semantic Generalization Validation Strategy for Simulated Data 33
    • 5-5 Ablation Analysis of Training-Data Strategies 34
  • Conclusion 36
  • Chapter 7: Future Work 37
    • 7-1 Refining Semantic Simulation Strategies and Increasing Contextual Diversity 37
    • 7-2 Real-World Adaptive Modeling and Noise Tolerance for the Visual Modality 37
    • 7-3 Deepening Semantic Diffusion Paths and Cross-Modal Structural Control Mechanisms 38
    • 7-4 Validation and Visualization of Semantic Structural Stability 38
  • References 39
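The Chapter 4 headings outline a pipeline of multimodal node encoding, cross-modal graph induction, heterogeneous graph diffusion, attention aggregation, and prediction. Since the full text is embargoed, the sketch below is only a hypothetical toy illustration of that pipeline shape; the similarity-threshold graph induction, the single averaging diffusion step, and all function names are assumptions for illustration, not the thesis's actual method.

```python
import math

def cosine(u, v):
    """Cosine similarity between two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def induce_graph(nodes, threshold=0.5):
    """Toy graph induction: connect nodes with similar features.
    (Assumption: the thesis likely learns this structure instead.)"""
    n = len(nodes)
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j and cosine(nodes[i], nodes[j]) >= threshold:
                adj[i][j] = 1.0
    return adj

def diffuse(nodes, adj):
    """One neighborhood-averaging step as a stand-in for
    heterogeneous graph neural diffusion."""
    out = []
    for i, x in enumerate(nodes):
        neigh = [nodes[j] for j in range(len(nodes)) if adj[i][j] > 0]
        if not neigh:
            out.append(list(x))
            continue
        agg = [sum(col) / len(neigh) for col in zip(*neigh)]
        out.append([(a + b) / 2 for a, b in zip(x, agg)])
    return out

def attention_pool(nodes, query):
    """Softmax attention over nodes, then weighted sum
    (node attention aggregation)."""
    scores = [cosine(x, query) for x in nodes]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    weights = [e / z for e in exps]
    dim = len(nodes[0])
    return [sum(w * x[d] for w, x in zip(weights, nodes)) for d in range(dim)]

def predict(pooled, w, b=0.0):
    """Logistic head: probability that the sample is a deepfake."""
    logit = sum(wi * xi for wi, xi in zip(w, pooled)) + b
    return 1.0 / (1.0 + math.exp(-logit))

# Toy usage: two visual nodes and one textual (comment) node per post.
nodes = [[0.9, 0.1, 0.3], [0.8, 0.2, 0.4], [0.1, 0.9, 0.5]]
adj = induce_graph(nodes, threshold=0.6)
h = diffuse(nodes, adj)
p = predict(attention_pool(h, [1.0, 1.0, 1.0]), w=[1.0, -1.0, 0.5])
```

In this toy setting the two similar visual nodes become connected while the dissimilar text node stays isolated, so diffusion only smooths features within the visual pair before pooling.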


Full-text access: on campus from 2026-08-01; off campus from 2026-08-01.
The electronic thesis has not yet been authorized for public release; for the print copy, please consult the library catalog.