Author: 許凱竣 (Hsu, Kai-Chun)
Thesis title: 設計與實作基於關聯特徵規則之可攜式文件格式分析系統 (Design and Implementation of a Malicious PDF Analysis System Based on Association Rules)
Advisor: 楊竹星 (Yang, Chu-Sing)
Degree: Master
Department: College of Electrical Engineering and Computer Science - Institute of Computer & Communication Engineering
Year of publication: 2016
Academic year of graduation: 104
Language: Chinese
Number of pages: 60
Keywords (Chinese): 文件型惡意程式, 可攜式文件格式, 靜態分析, 資料探勘
Keywords (English): Document Malware, PDF, Static Analysis, Data Mining
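  • Thanks to their convenience, cross-platform support, and easy availability, Portable Document Format (PDF) files have become the preferred format for exchanging documents online. However, the rich feature set of PDF also makes it an excellent tool for network attacks. By embedding malicious code or malicious files in a PDF document and pairing it with an enticing file name or title, attackers can easily catch Internet users off guard and turn them into the next victims. This thesis proposes a system that analyzes PDF documents using association feature rules. Data mining is used to find associations among the specific functions or parameters used in malicious code, and these associations serve as the rules against which documents are matched. The rules are combined with several features that appear only in malicious documents to raise the success rate of the analysis. In addition, this thesis implements a reverse operation that removes all suspicious parts of a document while preserving its original readable content, returning a clean PDF file that the user can use safely.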

    Thanks to cross-platform support, freely available viewer programs, and extensive API support, the Portable Document Format (PDF) is now a popular transmission medium on the Internet. The rich API support gives users a better experience. However, some of these APIs are still under development, so PDF viewers may contain vulnerabilities that give attackers openings to commit crimes online.

    Static analysis of PDF files has been studied for about ten years, and many effective methods have been proposed to detect malicious PDF files. Although these methods perform well, they still depend on a human factor somewhere in the workflow: people must build the rules or define the file types, so these systems cannot be fully automated.

    This research aims to find effective ways to detect malicious PDF documents. Unlike previous studies, which rely on developers' experience or observation to select features for their analysis systems, this research uses data mining to extract word associations from collected malicious code and uses these associations to build the rules for analyzing PDF documents. The entire rule-building process can be completed without human intervention, and the system using the proposed method reaches a detection rate of around 98%.
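
    The idea of mining associations from malicious code can be illustrated with a minimal sketch. It is not the thesis's implementation: the token names, the thresholds, and the helper names `mine_pair_rules` and `matches` are hypothetical, and the sketch only applies the Apriori pruning idea (see [27]) to pairs of tokens rather than to itemsets of arbitrary size.

```python
from collections import Counter
from itertools import combinations


def mine_pair_rules(samples, min_support=0.5, min_confidence=0.8):
    """Mine rules "x implies y" over token pairs.

    samples: list of token lists, one per malicious code sample (assumed input).
    Returns a sorted list of (x, y, support, confidence) tuples.
    """
    n = len(samples)
    transactions = [set(s) for s in samples]

    # Count how many samples contain each token.
    item_counts = Counter(tok for t in transactions for tok in t)

    # Apriori pruning: a pair can only be frequent if both tokens are frequent.
    frequent = {tok for tok, c in item_counts.items() if c / n >= min_support}

    # Count co-occurrences of frequent tokens within each sample.
    pair_counts = Counter()
    for t in transactions:
        for pair in combinations(sorted(t & frequent), 2):
            pair_counts[pair] += 1

    rules = []
    for (a, b), c in pair_counts.items():
        support = c / n
        if support < min_support:
            continue
        # Each frequent pair can yield a rule in either direction.
        for x, y in ((a, b), (b, a)):
            confidence = c / item_counts[x]
            if confidence >= min_confidence:
                rules.append((x, y, support, confidence))
    return sorted(rules)


def matches(rules, tokens):
    """Return the (x, y) rules whose two tokens both occur in a document."""
    toks = set(tokens)
    return [(x, y) for x, y, _, _ in rules if x in toks and y in toks]


# Hypothetical token sets extracted from four malicious samples.
samples = [
    ["unescape", "eval", "substring"],
    ["unescape", "eval", "replace"],
    ["unescape", "eval"],
    ["substring", "replace"],
]
rules = mine_pair_rules(samples)
print(rules)
print(matches(rules, ["eval", "unescape", "getAnnots"]))
```

    In this sketch a document is flagged when its tokens satisfy at least one mined rule. How the thesis actually scores matches and combines them with its other features is not stated in the abstract.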

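    Abstract (Chinese)
    Abstract (English)
    Acknowledgements
    Table of Contents
    List of Figures
    List of Tables
    1. Introduction
        1.1 Research Background
        1.2 Research Motivation
        1.3 Research Objectives
        1.4 Thesis Organization
    2. Related Work
        2.1 Basic PDF Structure
        2.2 Malicious PDF Techniques
            2.2.1 Seizing Control of the System - Heap Spray [5]
            2.2.2 Confusing Detection Systems - Obfuscation
        2.3 PDF Attack Vectors
            2.3.1 Attacks via JavaScript Code
            2.3.2 Attacks via Embedded Files
        2.4 PDF Analysis Systems
            2.4.1 Static Analysis
            2.4.2 Dynamic Analysis
            2.4.3 Hybrid Analysis
            2.4.4 Comparison of the Three Architectures
        2.5 Common PDF Attack Delivery Methods
            2.5.1 Spam Email
            2.5.2 Phishing Email
            2.5.3 APT Network Attacks
    3. System Architecture
        3.1 Overall Architecture
        3.2 PDF Document Analysis System
            3.2.1 Document Information Check
            3.2.2 End-of-File Marker Check
            3.2.3 Document Encoding Restoration
            3.2.4 JavaScript Code Inspection
            3.2.5 Embedded File Check
        3.3 Association Features
            3.3.1 Concept of Association Features
            3.3.2 Data Mining Algorithms
            3.3.3 Building the Rule Table
        3.4 PDF安心點 (PDF sanitization tool)
            3.4.1 Execution Flow
            3.4.2 Suspicious Tags
        3.5 User Interface
    4. Experimental Results and Analysis
        4.1 Determining the Threshold for Association Feature Rules
        4.2 Performance Comparison of Data Mining Algorithms
        4.3 Performance Evaluation of the PDF Document Analysis System
        4.4 Performance Evaluation of the PDF安心點 System
    5. Conclusion and Future Work
    References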

    [1] A. K. Sood and R. J. Enbody, "Targeted Cyberattacks: A Superset of Advanced Persistent Threats," IEEE Security & Privacy, vol.11, no.1, pp.54-61, 2013.
    [2] A. Beuhring and K. Salous, "Beyond Blacklisting: Cyber defense in the Era of Advanced Persistent Threats," IEEE Security & Privacy, vol.12, no.5, pp.90-93, 2014.
    [3] "APT Attacks: FAQ on the Large-Scale DarkSeoul APT Attack in South Korea" (in Chinese), http://blog.trendmicro.com.tw/?p=4652
    [4] PDF file structure – four parts, http://www.simpopdf.com/resource/pdf-file-structure.html
    [5] Ding, Yu, et al. "Heap taichi: exploiting memory allocation granularity in heap-spraying attacks." ACM Proceedings of the 26th Annual Computer Security Applications Conference, 2010.
    [6] I. Corona, D. Maiorca, D. Ariu and G. Giacinto, "Lux0R: Detection of Malicious PDF-embedded JavaScript code through Discriminant Analysis of API References," ACM Proceedings of the 2014 ACM Workshop on Artificial Intelligence and Security, pp. 47-57, 2014.
    [7] D. Maiorca, G. Giacinto and I. Corona, "A Pattern Recognition System for Malicious PDF Files Detection," Machine Learning and Data Mining in Pattern Recognition, Springer Berlin Heidelberg, pp. 510-524, 2012.
    [8] P. Laskov and N. Šrndić, "Static Detection of Malicious JavaScript-bearing PDF Documents," Proceedings of the 27th Annual Computer Security Applications Conference, ACM, pp.373-382, 2011.
    [9] SpiderMonkey, https://developer.mozilla.org/en-US/docs/Mozilla/Projects/SpiderMonkey.
    [10] Smutz, Charles, and Angelos Stavrou. "Malicious PDF detection using metadata and structural features." ACM Proceedings of the 28th Annual Computer Security Applications Conference, 2012.
    [11] Šrndić, Nedim, and Pavel Laskov. "Detection of malicious pdf files based on hierarchical document structure." Proceedings of the 20th Annual Network & Distributed System Security Symposium. 2013.
    [12] Maiorca, Davide, Igino Corona, and Giorgio Giacinto. "Looking at the bag is not enough to find the bomb: an evasion of structural methods for malicious pdf files detection." ACM Proceedings of the 8th ACM SIGSAC symposium on Information, computer and communications security, 2013.
    [13] Willems, Carsten, Thorsten Holz, and Felix Freiling. "Toward automated dynamic malware analysis using cwsandbox." IEEE Security & Privacy 2 (2007): 32-39.
    [14] Wepawet, https://wepawet.iseclab.org/about.php
    [15] F. Schmitt, J. Gassen and E. Gerhards-Padilla, "PDF Scrutinizer: Detecting JavaScript-based attacks in PDF documents," IEEE Privacy, Security and Trust (PST), 2012 Tenth Annual International Conference, pp.104-111, 2012.
    [16] H. Cheng, F. Yong, L. Liang and L. R. Wang, "A static detection model of malicious PDF documents based on naive Bayesian classifier technology," IEEE Wavelet Active Media Technology and Information Processing (ICWAMTIP), 2012 International Conference, pp.29-32, 2012.
    [17] T. E. Dube, R. A. Raines, M. R. Grimaila, K. W. Bauer and S. K. Rogers, "Malware Target Recognition of Unknown Threats," Systems Journal, IEEE, vol.7, no.3, pp.467-477, 2013.
    [18] C. Ulucenk, V. Varadharajan, V. Balakrishnan and U. Tupakula, "Techniques for Analysing PDF Malware," Software Engineering Conference (APSEC), 2011 18th Asia Pacific, pp.41-48, 2011.
    [19] Y. H. Choi, B. J. Han, B. C. Bae, H. G. Oh and K. W. Sohn, "Toward extracting malware features for classification using static and dynamic analysis," IEEE Computing and Networking Technology (ICCNT), 2012 8th International Conference, pp.126-129, 2012.
    [20] Z. Tzermias, G. Sykiotakis, M. Polychronakis and E. P. Markatos, "Combining Static and Dynamic Analysis for the Detection of Malicious Documents," ACM Proceedings of the Fourth European Workshop on System Security, 2011.
    [21] X. Lu, J. Zhuge, R. Wang, Y. Cao and Y. Chen, "De-obfuscation and Detection of Malicious PDF Files with High Accuracy," IEEE System Sciences (HICSS), 2013 46th Hawaii International Conference, pp.4890-4899, 2013.
    [22] Liu, Daiping, Haining Wang, and Angelos Stavrou. "Detecting malicious javascript in pdf through document instrumentation." Dependable Systems and Networks (DSN), 2014 44th Annual IEEE/IFIP International Conference, 2014.
    [23] Blackhat, http://www.imdb.com/title/tt2717822/
    [24] QPDF, http://qpdf.sourceforge.net/
    [25] PDF Tool kit, https://www.pdflabs.com/tools/pdftk-the-pdf-toolkit/
    [26] Setting up PDF Job Options File, http://www.bestprintingonline.com/job-options.htm
    [27] Apriori algorithm, http://en.wikipedia.org/wiki/Apriori_algorithm.
    [28] Han, Jiawei, Jian Pei, and Yiwen Yin. "Mining frequent patterns without candidate generation." ACM SIGMOD Record. Vol. 29. No. 2. ACM, 2000.
    [29] Contagio, http://contagiodump.blogspot.tw/.
    [30] virustotal, https://www.virustotal.com/.

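    On campus: publicly available from 2021-01-28
    Off campus: not publicly available
    The electronic thesis has not yet been authorized for public release; for the print copy, please check the library catalog.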