| Graduate student: | 朱立宬 Chu, Li-Cheng |
|---|---|
| Thesis title: | 使用集成學習技術應用於惡意程式分類之研究 On the Study of Malware Classification Using Ensemble Learning Techniques |
| Advisor: | 楊竹星 Yang, Chu-Sing |
| Degree: | Master |
| Department: | College of Electrical Engineering and Computer Science - Institute of Computer & Communication Engineering |
| Year of publication: | 2021 |
| Academic year of graduation: | 109 |
| Language: | Chinese |
| Number of pages: | 39 |
| Chinese keywords: | 機器學習、集成學習、惡意程式分類 |
| English keywords: | Machine learning, Malware classification, Ensemble learning |
Malware is one of the main threats in cyber-attacks. In recent years the volume of malware has grown sharply, and the harm it causes to companies and individuals increases day by day, so defending against it is a topic worthy of study. In the past, malware detection relied mainly on signature matching, but as the number of malware samples has grown and concealment techniques have advanced, signature matching is no longer sufficient. Detection methods based on program behavior analysis can detect such attacks effectively.
This research uses the behavioral characteristics of API call sequences to design a malware classification method. It combines a convolutional neural network, N-gram features, and word-vector features through ensemble learning to obtain a malware family classifier. We run experiments on the Mal-API-2019 open dataset. Compared with the experiment in the original paper as a baseline, the accuracy of malware family classification improves by 26%, and our experiments also confirm that the ensemble classification method outperforms a single classification model.
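
The two ingredients named in the abstract, N-gram features over API call sequences and an ensemble of several base classifiers, can be illustrated with a minimal sketch. This is not the thesis implementation: the abstract does not say how the base models are combined, so soft voting (averaging class probabilities) is assumed here, and the API trace, family names, and probability values are all hypothetical.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def api_ngrams(calls: Sequence[str], n: int = 2) -> Counter:
    """Count contiguous n-grams of API names in one call sequence."""
    return Counter(tuple(calls[i:i + n]) for i in range(len(calls) - n + 1))


def soft_vote(prob_rows: List[Dict[str, float]]) -> Tuple[str, Dict[str, float]]:
    """Average per-family probabilities from several base models.

    Soft voting is an assumption; the abstract does not state the
    combination rule used in the thesis.
    """
    families = sorted({fam for row in prob_rows for fam in row})
    averaged = {
        fam: sum(row.get(fam, 0.0) for row in prob_rows) / len(prob_rows)
        for fam in families
    }
    best = max(averaged, key=averaged.get)
    return best, averaged


# Hypothetical API call trace (illustrative, not taken from Mal-API-2019).
trace = ["NtOpenFile", "NtReadFile", "NtClose", "NtOpenFile", "NtReadFile"]
bigrams = api_ngrams(trace, n=2)

# Hypothetical outputs of three base models: a CNN, an N-gram model,
# and a word-vector model. The values are made up for illustration.
cnn_probs = {"Trojan": 0.6, "Worm": 0.3, "Spyware": 0.1}
ngram_probs = {"Trojan": 0.2, "Worm": 0.5, "Spyware": 0.3}
w2v_probs = {"Trojan": 0.5, "Worm": 0.4, "Spyware": 0.1}
label, averaged = soft_vote([cnn_probs, ngram_probs, w2v_probs])

print(bigrams.most_common(1))
print(label, averaged)
```

In this toy input the N-gram model alone would pick a different family than the other two models, and averaging the probabilities lets the ensemble settle on the family that the combined evidence favors.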
On-campus access: publicly available from 2026-10-22