National Cheng Kung University (NCKU) Theses and Dissertations System


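Student:	蔡宗憲 Tsai, Tzung-Shian
Thesis title:	利用資訊檢索方式於惡意程式分類之研究 Using Information Retrieval Approach for Malware Classification
Advisor:	楊竹星 Yang, Chu-Sing
Degree:	Master
Department:	College of Electrical Engineering and Computer Science - Institute of Computer & Communication Engineering
Year of publication:	2014
Academic year of graduation:	102 (ROC calendar, i.e., 2013-2014)
Language:	Chinese
Number of pages:	39
Keywords (Chinese):	information retrieval, malware classification, API information, TF-IDF
Keywords (English):	information retrieval, API, TF-IDF, malware classification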

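Today, the amount of malware is growing rapidly because attackers can use automated generation tools to produce new malware quickly. To improve malware detection, effectively determining whether a sample belongs to a known family or is a new variant has become a key problem in information security.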

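Most of these variants share similar source code and, when they execute, exhibit similar malicious behaviors and characteristics on the infected computer. This thesis proposes an effective method for classifying malware that is based on Information Retrieval (IR).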

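First, each training sample is executed in the dynamic analyzer Cuckoo Sandbox to capture the API (Application Programming Interface) calls it makes to the system at run time. Each call record consists of three parts: the function name, the parameter names, and the parameter values, and the API call information of each sample is saved to a file. The same procedure is then applied to each test sample. Next, the TF-IDF (Term Frequency-Inverse Document Frequency) algorithm converts the test sample and all training samples into vector representations based on their individual API call information. These vectors describe the behavioral characteristics of the malware and are used to measure how similar the test sample's behavior is to that of each training sample. Finally, the family most similar to the test sample is retrieved according to these similarity scores, which completes the classification.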
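The retrieval step described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the thesis's implementation. It assumes each sample's trace has already been extracted from the Cuckoo Sandbox report into a list of `(function name, {parameter name: parameter value})` pairs (report parsing is not shown). The term scheme (one term for the bare function name plus one `function|name=value` term per argument), the length-normalized TF, the natural-log IDF, the single-nearest-sample decision rule, and all helper names (`calls_to_terms`, `idf_table`, `tfidf`, `cosine`, `classify`) are hypothetical choices.

```python
import math
from collections import Counter

# Assumed (hypothetical) trace format: one (function name, {parameter name: value})
# pair per API call, already extracted from a Cuckoo Sandbox report.
Trace = list[tuple[str, dict[str, str]]]


def calls_to_terms(calls: Trace) -> list[str]:
    """Turn one sample's API calls into a bag of terms (assumed term scheme)."""
    terms = []
    for func, params in calls:
        terms.append(func)  # the function name alone
        for name, value in params.items():
            terms.append(f"{func}|{name}={value}")  # function + parameter name + value
    return terms


def idf_table(docs: list[list[str]]) -> dict[str, float]:
    """idf(t) = ln(N / df(t)), computed over the training corpus only."""
    df = Counter()
    for doc in docs:
        df.update(set(doc))  # count each term at most once per document
    return {t: math.log(len(docs) / c) for t, c in df.items()}


def tfidf(doc: list[str], idf: dict[str, float]) -> dict[str, float]:
    """Sparse TF-IDF vector; TF is the term count divided by document length."""
    if not doc:
        return {}
    counts = Counter(doc)
    # Terms never seen in training get idf 0, i.e. they are ignored.
    return {t: (c / len(doc)) * idf.get(t, 0.0) for t, c in counts.items()}


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity of two sparse vectors; 0.0 if either vector is all zeros."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def classify(test_calls: Trace, training: list[tuple[str, Trace]]) -> str:
    """Return the family label of the training sample most similar to the test sample."""
    train_docs = [calls_to_terms(calls) for _, calls in training]
    idf = idf_table(train_docs)
    train_vecs = [tfidf(doc, idf) for doc in train_docs]
    query = tfidf(calls_to_terms(test_calls), idf)
    best = max(range(len(training)), key=lambda i: cosine(query, train_vecs[i]))
    return training[best][0]
```

For example, with two made-up training samples, family `"A"` calling `CreateFileW` and `RegSetValueExA(ValueName="Run")` and family `"B"` calling `CreateFileW` and `InternetOpenA`, a test sample that also calls `RegSetValueExA(ValueName="Run")` is assigned to `"A"`. Because `CreateFileW` appears in every training sample, its IDF is ln(1) = 0, so it does not contribute to the similarity; this is the usual effect of IDF down-weighting calls that all samples make.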

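Finally, experiments on the implemented system are presented to examine the classification accuracy that the IR-based classification method can achieve.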
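The evaluation measures listed in the thesis outline (precision, recall, accuracy and F-measure) can be computed from the true and predicted family labels of the test samples. The sketch below is a minimal illustration that assumes per-family (one-vs-rest) precision and recall and the balanced F-measure (F1); the averaging scheme actually used in the thesis is not stated in this record, and the function name `evaluate` is hypothetical.

```python
from collections import Counter


def evaluate(y_true: list[str], y_pred: list[str]):
    """Return overall accuracy and a dict family -> (precision, recall, F1)."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("label lists must be non-empty and of equal length")
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # sample wrongly assigned to family p
            fn[t] += 1  # sample of family t that was missed

    per_family = {}
    for fam in set(y_true) | set(y_pred):
        prec = tp[fam] / (tp[fam] + fp[fam]) if tp[fam] + fp[fam] else 0.0
        rec = tp[fam] / (tp[fam] + fn[fam]) if tp[fam] + fn[fam] else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        per_family[fam] = (prec, rec, f1)
    return accuracy, per_family
```

For instance, `evaluate(["A", "A", "B", "B"], ["A", "B", "B", "B"])` gives an accuracy of 0.75; family A has precision 1.0, recall 0.5 and F1 of about 0.667, while family B has precision of about 0.667, recall 1.0 and F1 0.8.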

In this thesis, we propose an Information Retrieval (IR) approach to classify malware samples into known malware families.
First, each training sample is submitted to a dynamic analysis tool, Cuckoo Sandbox, to obtain information about the API (Application Programming Interface) calls made by the sample. Each API call record consists of three parts: function name, parameter name and parameter value. In the retrieval phase, the same procedure is performed on the testing sample. Then, the TF-IDF (Term Frequency-Inverse Document Frequency) algorithm is used to represent the testing sample and all training samples as vectors based on their API call information. These vectors describe the behavioral characteristics of the malware and are used to compare the similarity of the samples' behavior. Finally, the malware family is determined by retrieving the family most similar to the testing sample, which accomplishes malware classification.
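For reference, a common textbook form of the TF-IDF weighting, the cosine similarity and the retrieval decision used in this kind of vector-space classification is given below. The exact TF normalization, logarithm base and decision rule used in the thesis are not given in this record, so these forms are assumptions. Here $f_{t,d}$ is the number of occurrences of term $t$ (an API call feature) in sample $d$, $D$ is the set of $N$ training samples, $\mathbf{q}$ is the test sample's vector, and the predicted family is taken to be that of the single most similar training sample.

```latex
\mathrm{tf}(t,d) = \frac{f_{t,d}}{\sum_{t'} f_{t',d}}, \qquad
\mathrm{idf}(t) = \ln \frac{N}{\lvert \{ d \in D : t \in d \} \rvert}, \qquad
w_{t,d} = \mathrm{tf}(t,d)\,\mathrm{idf}(t)

\cos(\mathbf{q}, \mathbf{d}) =
  \frac{\sum_{t} w_{t,q}\, w_{t,d}}
       {\sqrt{\sum_{t} w_{t,q}^{2}}\;\sqrt{\sum_{t} w_{t,d}^{2}}}

\hat{y}(\mathbf{q}) = \mathrm{family}\Big( \operatorname*{arg\,max}_{d \in D} \cos(\mathbf{q}, \mathbf{d}) \Big)
```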

Chinese Abstract
English Abstract
Acknowledgements
List of Figures
List of Tables
Chapter 1	Introduction
1	Research Background
2	Research Motivation and Objectives
3	Thesis Organization
Chapter 2	Related Work
1	Malware
1.1	Malware Behavior
1.2	Types of Malware
1.3	Common Evasion Techniques Used by Malware
2	Related Work on Malware Classification
3	Approaches to Malware Behavior Analysis
3.1	Static Analysis
3.2	Dynamic Analysis
3.3	Comparison of Static and Dynamic Analysis
4	API (Application Programming Interface)
4.1	Common Categories of Windows APIs
4.2	Contents of the Windows API
5	Information Retrieval
5.1	Common Models in Information Retrieval
5.2	Cosine Similarity
6	Performance Evaluation of Classification
6.1	Precision
6.2	Recall
6.3	Accuracy
6.4	F-Measure
7	TF-IDF
7.1	Term Frequency
7.2	Inverse Document Frequency
Chapter 3	Implementation Architecture
1	Classification Workflow
2	Classification Architecture
2.1	Malware Corpus Construction
2.2	Behavior Analysis Generation
2.3	Behavioral Feature Profiling
2.4	Irrelevance Reduction
2.5	Family Classification
Chapter 4	Experimental Results
1.1	Results With and Without Irrelevance Reduction
1.2	Comparison with Other Classification Methods
Chapter 5	Conclusion and Future Work
References

                                    


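Off-campus access: Not public; neither the electronic nor the print version of this thesis has yet been authorized for public release.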
