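Graduate student: Chu, Li-Cheng (朱立宬)
Thesis title: On the Study of Malware Classification Using Ensemble Learning Techniques (使用集成學習技術應用於惡意程式分類之研究)
Advisor: Yang, Chu-Sing (楊竹星)
Degree: Master
Department: College of Electrical Engineering and Computer Science - Institute of Computer & Communication Engineering
Year of publication: 2021
Academic year of graduation: 109 (ROC calendar)
Language: Chinese
Number of pages: 39
Chinese keywords: 機器學習 (machine learning), 集成學習 (ensemble learning), 惡意程式分類 (malware classification)
English keywords: Machine learning, Malware classification, Ensemble learning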
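  • Abstract (translated from the Chinese original): Malware is one of the main threats in cyber-attacks. Its volume has grown sharply in recent years, and the harm it causes to companies and individuals increases day by day, so how to defend against it is a topic worth studying. In the past, malware detection relied mainly on signature matching, but as the number of malware samples grows and concealment techniques advance, signature matching is no longer adequate, whereas detection methods based on program behavior analysis can detect attacks effectively.
    This study uses the behavioral features of API call sequences to design a malware classification method. Through ensemble learning, it combines a convolutional neural network, N-gram features, and word-vector features to obtain a malware family classifier. The experiments use the open Mal-API-2019 dataset. Taking the experiments of the original paper as the baseline, the accuracy of malware type classification improves by 26%, and the experiments also confirm that the ensemble classification method outperforms a single classification model.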

    Malware is one of the main threats in cyber-attacks. In recent years, the number of malicious programs has risen sharply, and the harm they cause to companies and individuals is increasing day by day. How to defend against malicious programs is therefore a topic worthy of study. In the past, malware detection relied mainly on signature comparison, but with the growth in the number of malware samples and the advancement of concealment techniques, signature comparison is no longer sufficient. Detection methods based on program behavior analysis, by contrast, can effectively detect attacks.
    This research uses the behavioral characteristics of API call sequences to design a malware classification method, which combines a convolutional neural network, N-gram features, and word vector features through ensemble learning to obtain a malware family classifier. We use the Mal-API-2019 open dataset to conduct experiments. Compared with the experiments of the original paper as a benchmark, the accuracy of malware classification increases by 26%, and our experiments also confirm that our ensemble classification model performs better than a single classification model.
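The two ideas named in the abstract, N-gram features over API call sequences and combining several models through ensemble learning, can be illustrated with a minimal sketch. This is not the thesis's implementation: the record does not state which ensemble scheme is used, so soft voting (averaging class probabilities) is assumed here, and the API call trace, class labels, and probability values are hypothetical.

```python
from collections import Counter
from typing import Dict, Sequence


def api_ngrams(calls: Sequence[str], n: int = 2) -> Counter:
    """Count contiguous n-grams in one sample's API call sequence."""
    return Counter(tuple(calls[i:i + n]) for i in range(len(calls) - n + 1))


def soft_vote(prob_vectors: Sequence[Dict[str, float]]) -> str:
    """Average per-class probabilities from several models; return the top class."""
    totals: Dict[str, float] = {}
    for probs in prob_vectors:
        for label, p in probs.items():
            totals[label] = totals.get(label, 0.0) + p
    # Dividing by the model count does not change the winner,
    # but keeps the scores readable as averaged probabilities.
    averaged = {label: total / len(prob_vectors) for label, total in totals.items()}
    return max(averaged, key=averaged.get)


# Hypothetical API call trace for one sample (illustrative names only).
trace = ["NtOpenFile", "NtReadFile", "NtClose", "NtOpenFile", "NtReadFile"]
bigrams = api_ngrams(trace, n=2)

# Hypothetical class probabilities from three base models
# (CNN, N-gram based, word-vector based); the values are made up.
cnn_probs = {"Trojan": 0.6, "Worm": 0.3, "Spyware": 0.1}
ngram_probs = {"Trojan": 0.2, "Worm": 0.5, "Spyware": 0.3}
w2v_probs = {"Trojan": 0.5, "Worm": 0.4, "Spyware": 0.1}
predicted = soft_vote([cnn_probs, ngram_probs, w2v_probs])
```

In this toy case the N-gram model alone would pick a different class than the averaged ensemble, which is the kind of disagreement an ensemble is meant to resolve. Other combination schemes such as hard voting or stacking would fit the same description in the abstract.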

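    Abstract
    Extended English Abstract
    Table of Contents
    List of Figures
    Chapter 1 Introduction
      1.1. Research Background
      1.2. Research Motivation and Objectives
      1.3. Contributions
      1.4. Thesis Organization
    Chapter 2 Related Work
      2.1. Malware Detection Methods
      2.2. Convolutional Neural Networks
      2.3. Word Vectors
      2.4. Ensemble Learning
    Chapter 3 System Architecture and Implementation
      3.1. System Architecture
      3.2. Data Source
      3.3. Feature Selection and Model Training
      3.4. Prediction Phase
    Chapter 4 Experimental Results and Analysis
      4.1. Experiment Description
      4.2. Evaluation Criteria
      4.3. Benign/Malicious Classification Experiment
      4.4. Multi-class Classification Experiment
    Chapter 5 Conclusion
      5.1. Research Conclusions
      5.2. Future Research Directions
    Chapter 6 References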


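    Full text not available for download. On campus: public from 2026-10-22
    Off campus: public from 2026-10-22
    The electronic thesis has not yet been authorized for public release; for the print copy, please consult the library catalog.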