
Graduate student: 林殿智
Lin, Tien-Chih
Thesis title: 基於事件關聯與機器學習之事件日誌自動分析系統設計與實作
Design and Implementation of an Automated Event Log Analysis System based on Event Correlation and Machine Learning
Advisor: 楊竹星
Yang, Chu-Sing
Degree: Master
Department: College of Electrical Engineering and Computer Science - Institute of Computer & Communication Engineering
Year of publication: 2019
Academic year of graduation: 107 (ROC calendar, i.e. 2018-2019)
Language: Chinese
Number of pages: 75
Keywords (Chinese): 惡意程式分類, 機器學習, 事件關聯, 行為分析, 日誌分析, Sysmon
Keywords (English): Malware Classification, Machine Learning, Event Correlation, Behavior Analysis, Log Mining, Sysmon
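  • Malware used to concentrate all of its malicious functions in a single executable, so antivirus software could judge a program to be malicious according to whether that one program covered several sensitive functions. To lower the guard of antivirus software, today's attackers spread malicious behavior across different programs, dividing the work among components such as a Dropper, a Decryptor, and an Injector to spread the risk of detection. Defenses that take a file or a program as the unit of analysis can therefore no longer detect malware effectively. This thesis uses correlation rules to find related processes and takes the whole event as the unit of analysis, observing from a more global perspective whether malicious behavior is present inside the system. Combined with machine learning, the system analyzes event logs automatically: correlating and analyzing one day of logs from one endpoint takes only 5 minutes on average, the F1-score of benign/malicious classification reaches 99%, and the F1-score of malware-type classification reaches 82%.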

    In the past, malware typically packed all of its malicious functions into a single executable, so antivirus software could flag an executable as malware with high confidence when it contained many suspicious functions. To avoid drawing the attention of antivirus software, attackers now split malicious functions across different processes, for example by dividing the work among a Dropper, a Decryptor, and an Injector. When a single file or process is the unit of analysis, much malicious behavior goes undetected. The system proposed in this thesis uses event correlation and machine learning classification to view process behavior more comprehensively and to identify malicious behavior. Automated analysis of one endpoint's daily event log takes only 5 minutes on average. The F1-score is 99% for binary (benign vs. malicious) classification and 82% for multiclass classification by malware type.
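The abstract does not say which correlation rule the thesis uses, so the following is only a minimal sketch of one plausible reading: processes are treated as related when they are linked by a parent-child chain, and each connected group becomes one unit of analysis. The field names `ProcessGuid`, `ParentProcessGuid`, and `Image` are those of Sysmon process-creation events (Event ID 1); the function name `correlate_processes` and the sample records are hypothetical and not taken from the thesis.

```python
from collections import defaultdict


def correlate_processes(events):
    """Group process-creation events into sets of related processes.

    Assumption (not from the thesis): two processes are related when
    a parent-child chain connects them. Each event is a dict with the
    Sysmon Event ID 1 fields 'ProcessGuid', 'ParentProcessGuid', 'Image'.
    """
    parent = {}  # union-find forest keyed by process GUID

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    # Link every process to its parent.
    for ev in events:
        union(ev["ParentProcessGuid"], ev["ProcessGuid"])

    # Collect the image names that share a root.
    groups = defaultdict(set)
    for ev in events:
        groups[find(ev["ProcessGuid"])].add(ev["Image"])
    return sorted(sorted(group) for group in groups.values())


# Hypothetical sample: a three-stage chain plus one unrelated process.
sample = [
    {"ProcessGuid": "A", "ParentProcessGuid": "ROOT1", "Image": "dropper.exe"},
    {"ProcessGuid": "B", "ParentProcessGuid": "A", "Image": "decryptor.exe"},
    {"ProcessGuid": "C", "ParentProcessGuid": "B", "Image": "injector.exe"},
    {"ProcessGuid": "D", "ParentProcessGuid": "ROOT2", "Image": "notepad.exe"},
]
print(correlate_processes(sample))
```

In this sketch the dropper, decryptor, and injector end up in one group even though no single process performs every step, which is the situation the abstract describes. A real deployment would also need to stop the chain at common ancestors such as the desktop shell, since otherwise most user processes would merge into a single group.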

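    Abstract (Chinese) (i)
    Extended Abstract (English) (ii)
    Acknowledgements (viii)
    Table of Contents (x)
    List of Tables (xii)
    List of Figures (xiii)
    List of Listings (xiv)
    Chapter 1. Introduction (1)
        1.1. Research Background (1)
        1.2. Research Motivation (2)
        1.3. Thesis Organization (3)
    Chapter 2. Related Work (4)
        2.1. Static Analysis and Dynamic Analysis (4)
            2.1.1. Static Analysis (4)
            2.1.2. Dynamic Analysis (7)
        2.2. Sysmon (10)
        2.3. Machine Learning Algorithms (17)
            2.3.1. Machine Learning: Supervised Learning (17)
            2.3.2. Machine Learning: Unsupervised Learning (18)
            2.3.3. Feature Extraction and Feature Interpretation (19)
    Chapter 3. System Architecture and Proposed Method (21)
        3.1. System Overview (21)
        3.2. Endpoint and Database (Elasticsearch) (23)
        3.3. Back-end Classification Platform (24)
            3.3.1. Event Correlation (26)
            3.3.2. Features (28)
            3.3.3. Modules (33)
        3.4. Sample Labeling Method (41)
        3.5. Model Training (42)
            3.5.1. Model Training Overview (42)
            3.5.2. Filtering Invalid Samples with a Clustering Algorithm (43)
            3.5.3. Cross-Validation of Classification Algorithms (44)
    Chapter 4. Experimental Results and Analysis (45)
        4.1. Experimental Environment (45)
        4.2. Experimental Samples (46)
        4.3. Evaluation Method (48)
        4.4. Experimental Results (50)
            4.4.1. tf-idf Feature Extraction Results (51)
            4.4.2. Invalid Malware Filtering Results (52)
            4.4.3. Binary Classification Results (53)
            4.4.4. Multiclass Classification Results (54)
            4.4.5. Effect of Features on Accuracy (59)
            4.4.6. Effect of Event Correlation on Accuracy (61)
            4.4.7. Runtime Performance Evaluation (62)
        4.5. Case Studies (63)
            4.5.1. Validating the Model with Known Hypotheses (63)
            4.5.2. Unknown Features Discovered through Machine Learning (66)
    Chapter 5. Conclusion (68)
        5.1. Research Contributions (68)
        5.2. Future Work (69)
    References (71)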


    Full text on campus: open immediately
    Full text off campus: open immediately