Graduate student: 楊曉蕙 (Yang, Hsiao-Hui)
Thesis title: Naive Bayesian Classifiers with Multinomial Model and Feature Extraction for Filtering Spams
Chinese title: 用具多項式模型與特徵萃取之簡易貝氏分類器過濾垃圾郵件
Advisor: 翁慈宗 (Wong, Tzu-Tsung)
Degree: Master
Department: College of Management - Department of Industrial and Information Management (on the job class)
Publication year: 2022
Graduation academic year: 110 (ROC calendar; academic year 2021-2022)
Language: Chinese
Pages: 37
Keywords (Chinese): spam, feature extraction, document classification, naive Bayesian classifier
Keywords (English): document classification, feature extraction, multinomial model, naive Bayesian classifier, spam
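    Email is convenient, fast, and inexpensive, which makes it a common communication channel in business today. It is also the leading source of threats. Unsolicited advertising has made the spam problem increasingly serious, hackers use emails to deliver various malicious attacks, and email can even serve as a springboard for attackers. A common filtering technique is keyword matching: the more keyword samples are collected, the higher the probability of successfully intercepting spam. In the case company examined in this study, the spam filter runs in the system's default mode and its keyword list is still limited, so emails are easily misclassified. This study therefore adopts a naive Bayesian classifier combined with a multinomial model. The number of occurrences of each word serves as a feature for training the classifier, and these feature values are used to identify spam. Through continual learning and training, the classifier can effectively handle unclassified emails. To evaluate the accuracy of email classification, three different test scenarios are used. The experimental results show that selecting new words to replace the original words yields the highest classification accuracy and F-measure. Because spam keeps changing, the original words are not necessarily effective for recognizing new emails and may interfere with the classification results. The feature values of the latest spam should therefore be retained for identification. Regularly updating the keyword dictionary to maintain filtering quality can effectively improve the accuracy of email filtering and reduce the risk posed by email threats.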
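    The abstract's core idea can be illustrated with the minimal Python sketch below. It uses a naive Bayesian classifier over a multinomial model, in which the occurrence count of each word is a feature. The sketch rests on several assumptions that are not taken from the thesis:

- Tokenization is by whitespace; Chinese email would need a word segmenter.
- Parameters use add-one (Laplace) smoothing.
- The class name `MultinomialSpamFilter`, the helper `tokenize`, and the labels "spam"/"ham" are hypothetical illustrations.
- It is not scikit-learn's API and not the thesis's own code.

```python
import math
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Hypothetical tokenizer: lowercase and split on whitespace.
    # Chinese email would need a proper word segmenter instead.
    return text.lower().split()


class MultinomialSpamFilter:
    """Multinomial naive Bayes over word counts, with add-one (Laplace) smoothing."""

    def __init__(self, alpha: float = 1.0) -> None:
        self.alpha = alpha                          # smoothing constant (assumed, not from the thesis)
        self.class_docs: Counter = Counter()        # number of training emails per class
        self.word_counts: dict[str, Counter] = {}   # word occurrence counts per class
        self.vocab: set[str] = set()

    def fit(self, emails: list[str], labels: list[str]) -> "MultinomialSpamFilter":
        for text, label in zip(emails, labels):
            tokens = tokenize(text)
            self.class_docs[label] += 1
            # Every occurrence is counted, not just whether the word appears.
            self.word_counts.setdefault(label, Counter()).update(tokens)
            self.vocab.update(tokens)
        return self

    def log_posterior(self, text: str, label: str) -> float:
        """Unnormalized log P(label | text) = log P(label) + sum_w n_w * log P(w | label)."""
        total_docs = sum(self.class_docs.values())
        score = math.log(self.class_docs[label] / total_docs)
        counts = self.word_counts[label]
        denom = sum(counts.values()) + self.alpha * len(self.vocab)
        for word, n in Counter(tokenize(text)).items():
            if word in self.vocab:  # words never seen in training are ignored
                score += n * math.log((counts[word] + self.alpha) / denom)
        return score

    def predict(self, text: str) -> str:
        return max(self.class_docs, key=lambda c: self.log_posterior(text, c))


if __name__ == "__main__":
    train = ["free prize click now", "win free money now",
             "meeting agenda attached", "project report attached"]
    labels = ["spam", "spam", "ham", "ham"]
    clf = MultinomialSpamFilter().fit(train, labels)
    print(clf.predict("free money"))               # spam
    print(clf.predict("project report attached"))  # ham
```

    Working in log space avoids numeric underflow when an email contains many words. A word that appears n times contributes n times its log-probability, which is what distinguishes the multinomial model from a Bernoulli (presence/absence) model.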

    Email is widely used in commercial communication nowadays because of its convenience, speed, and low cost. These characteristics also make email one of the main threats to computer security, and the problem of emails carrying advertisements or computer viruses is becoming increasingly serious. A common way to identify spam is to collect its keywords: the more keywords are collected, the higher the probability of successfully intercepting spam. The default keywords provided by an email system can be limited in filtering spam. This study therefore investigates whether applying a feature extraction technique to new emails to increase the number of keywords is beneficial. The classification tool of this study is the naïve Bayesian classifier with a multinomial model, in which every occurrence of a keyword is counted when calculating classification probabilities. Three scenarios are tested on emails collected from a company. The experimental results suggest that new keywords selected by the feature extraction technique are necessary for improving performance. The latest spam should thus be retained to extract keywords for updating the keyword dictionary, which can effectively improve the accuracy of email filtering and reduce the risk of email threats.
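    The abstracts describe extracting keywords from the latest emails and then either replacing or extending the original keyword dictionary. The thesis reports classification accuracy and the F-measure for these scenarios. The Python sketch below illustrates those steps under explicit assumptions:

- The ranking rule in `extract_keywords` (a smoothed ratio of spam to ham word counts) is a hypothetical stand-in; the abstracts do not state the thesis's extraction criterion.
- All function names are illustrative.
- Spam is treated as the positive class for the F-measure.

```python
from collections import Counter


def extract_keywords(spam: list[str], ham: list[str], k: int) -> list[str]:
    """Return the k words most characteristic of spam.

    Hypothetical criterion: (spam count + 1) / (ham count + 1), a smoothed
    ratio of occurrences; the thesis abstract does not specify its own rule.
    """
    spam_counts = Counter(w for text in spam for w in text.lower().split())
    ham_counts = Counter(w for text in ham for w in text.lower().split())
    score = {w: (spam_counts[w] + 1) / (ham_counts[w] + 1) for w in spam_counts}
    # Highest score first; ties broken alphabetically so the result is reproducible.
    return sorted(score, key=lambda w: (-score[w], w))[:k]


def update_dictionary(old: set[str], new: list[str], replace: bool) -> set[str]:
    """replace=True keeps only the new keywords; otherwise they are added to the old ones."""
    return set(new) if replace else old | set(new)


def accuracy_and_f1(y_true: list[str], y_pred: list[str],
                    positive: str = "spam") -> tuple[float, float]:
    """Classification accuracy and F-measure (F1), with `positive` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, f1


if __name__ == "__main__":
    latest_spam = ["free prize now", "free crypto now"]
    latest_ham = ["free lunch meeting", "meeting notes"]
    new_words = extract_keywords(latest_spam, latest_ham, k=2)
    print(new_words)  # ['now', 'crypto']
    print(update_dictionary({"lottery"}, new_words, replace=True))
    print(accuracy_and_f1(["spam", "spam", "ham", "ham"],
                          ["spam", "ham", "spam", "ham"]))  # (0.5, 0.5)
```

    The two settings of `update_dictionary` map onto the scenarios differently:

- `replace=True` mirrors the scenario the Chinese abstract reports as best, in which new words replace the original ones.
- `replace=False` keeps the old keywords alongside the new ones.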

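    Abstract
    Table of Contents
    List of Tables
    List of Figures
    Chapter 1 Introduction
        1.1 Research Background and Motivation
        1.2 Research Objectives
        1.3 Thesis Structure
    Chapter 2 Literature Review
        2.1 Spam
            2.1.1 Spam Filtering Mechanisms
            2.1.2 Risks of Spam Filtering
        2.2 Email Classification Methods
            2.2.1 Bayesian Classifiers
            2.2.2 Support Vector Machines
            2.2.3 Decision Trees
        2.3 Summary
    Chapter 3 Research Methodology
        3.1 Problem Definition and Workflow
        3.2 Feature Extraction
        3.3 Classification Techniques
        3.4 Performance Evaluation
    Chapter 4 Empirical Analysis
        4.1 Dataset Description
        4.2 Classification Tests
        4.3 Summary
    Chapter 5 Conclusions and Suggestions
        5.1 Conclusions
        5.2 Future Work
    References
    Appendix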

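    Full-text availability: on campus from 2027-08-18; off campus from 2027-08-18.
    The electronic thesis is not yet authorized for public release; for the print copy, please check the library catalog.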