
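Graduate student: Shih, Chia-Ling (石家玲)
Thesis title: The Effect of Data Characteristics on the Performance of the Naive Bayesian Classifier (資料特性對簡易貝氏分類器執行績效的影響)
Advisor: Wong, Tzu-Tsung (翁慈宗)
Degree: Master
Department: College of Management - Department of Industrial and Information Management
Year of publication: 2004
Academic year of graduation: 92
Language: Chinese
Number of pages: 55
Keywords (translated from Chinese): naive Bayesian classifier, expected accuracy, degree of dependence among attributes, number of possible attribute values, number of attributes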
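  •   This study examines how the data characteristics of a domain problem affect the performance of the naive Bayesian classifier. By analyzing domain problems with different data characteristics, it measures how strongly those characteristics influence the classifier's performance. Users facing a domain problem can then draw on the operating characteristics reported here to evaluate, more clearly and conveniently, how the classifier will perform on their problem, and hence whether it is suitable. The study assumes that four factors affect the classifier's performance: "expected accuracy", "degree of dependence among attributes", "number of attributes", and "number of possible attribute values". For expected accuracy, the simulation controls the conditional probabilities of each attribute to generate data files with different classification accuracies; in the empirical study, the expected accuracy is taken to be the highest classification accuracy achieved by a naive Bayesian classifier built on a single attribute. In addition, the relationship between any pair of attributes is treated as a function, and the degree of dependence between the pair is the probability that the value of one attribute can be obtained from the other attribute through this function.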
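The two measures defined in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the function names, the toy data, the use of resubstitution (training-set) accuracy, and the choice of "most frequent value of B for each value of A" as the function linking a pair of attributes are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def single_attribute_accuracy(values, labels):
    """Resubstitution accuracy of a naive Bayes classifier built on one attribute.

    With a single attribute and no smoothing, argmax_c P(c) * P(v | c)
    equals argmax_c count(v, c), so each value predicts its majority class.
    """
    joint = defaultdict(Counter)
    for v, c in zip(values, labels):
        joint[v][c] += 1
    correct = sum(max(counter.values()) for counter in joint.values())
    return correct / len(labels)


def expected_accuracy(rows, labels):
    """Highest single-attribute accuracy over all attributes (empirical definition)."""
    n_attrs = len(rows[0])
    return max(
        single_attribute_accuracy([row[j] for row in rows], labels)
        for j in range(n_attrs)
    )


def dependence_degree(a_values, b_values):
    """Estimated probability that B can be obtained from A through a function f.

    ASSUMPTION: f maps each value of A to the most frequent co-occurring
    value of B; the thesis may define the function differently.
    """
    joint = defaultdict(Counter)
    for a, b in zip(a_values, b_values):
        joint[a][b] += 1
    matched = sum(max(counter.values()) for counter in joint.values())
    return matched / len(a_values)


# Hypothetical toy data: two categorical attributes and a class label.
rows = [("s", "h"), ("s", "h"), ("r", "h"), ("r", "l"), ("s", "l"), ("r", "l")]
labels = ["y", "y", "n", "n", "y", "n"]

print(expected_accuracy(rows, labels))
print(dependence_degree([r[0] for r in rows], [r[1] for r in rows]))
```

Under this estimator, a dependence degree of 1 means one attribute is fully determined by the other. For independent attributes the value falls toward the frequency of B's most common value rather than toward 0.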
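      The study finds that the lower the expected accuracy, the more markedly the performance of the naive Bayesian classifier varies with the other three data characteristics; conversely, the higher the expected accuracy, the less its performance is affected by them. The empirical study also shows that when the expected accuracy is low, learning with all attributes in the data file together works better than learning with a single attribute. When the expected accuracy is very high, the classifier's actual learning performance is largely determined by a single attribute, usually the one with the highest attribute accuracy. As for dependence among attributes, the overall trend is that as the degree of dependence increases, the classification accuracy of the naive Bayesian classifier gradually approaches the expected accuracy. In most real-world data, however, the dependence among attributes is moderate, between 0.5 and 0.8, and cases close to complete independence or complete dependence are quite rare, so the relationship between dependence and changes in classifier performance is less evident.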
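The comparison between learning with all attributes and learning with a single attribute can be sketched as follows. This is only a toy illustration, not the thesis's experiment: the data set (class equals the majority vote of three binary attributes), the Laplace smoothing, and all function names are assumptions, and accuracy is measured on the training data for brevity.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(rows, labels):
    """Count class frequencies and per-class attribute value frequencies."""
    class_counts = Counter(labels)
    n_attrs = len(rows[0])
    # value_counts[j][c][v] = number of class-c instances with attribute j equal to v
    value_counts = [defaultdict(Counter) for _ in range(n_attrs)]
    domains = [set() for _ in range(n_attrs)]
    for row, c in zip(rows, labels):
        for j, v in enumerate(row):
            value_counts[j][c][v] += 1
            domains[j].add(v)
    return class_counts, value_counts, domains


def predict(model, row):
    """Return the class maximizing log P(c) + sum_j log P(x_j | c)."""
    class_counts, value_counts, domains = model
    total = sum(class_counts.values())
    best_class, best_score = None, -math.inf
    for c, n_c in class_counts.items():
        score = math.log(n_c / total)
        for j, v in enumerate(row):
            # Laplace (add-one) smoothing; an assumption, not taken from the thesis.
            score += math.log((value_counts[j][c][v] + 1) / (n_c + len(domains[j])))
        if score > best_score:
            best_class, best_score = c, score
    return best_class


def accuracy(rows, labels):
    """Resubstitution accuracy of a naive Bayes model trained on the given rows."""
    model = train_naive_bayes(rows, labels)
    hits = sum(predict(model, row) == c for row, c in zip(rows, labels))
    return hits / len(labels)


# Hypothetical toy data: the class is the majority vote of three binary attributes.
rows = [(a, b, c) for a in (0, 1) for b in (0, 1) for c in (0, 1)]
labels = [int(sum(row) >= 2) for row in rows]

all_attrs = accuracy(rows, labels)
best_single = max(accuracy([(row[j],) for row in rows], labels) for j in range(3))
print(all_attrs, best_single)
```

In this toy case no single attribute classifies every instance correctly, while the three attributes together do, which mirrors the low-expected-accuracy situation described above.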


    Table of Contents
    Abstract
    Acknowledgements
    Table of Contents
    List of Tables
    List of Figures
    List of Symbols
    Chapter 1  Introduction
      Section 1  Research Background and Motivation
      Section 2  Research Objectives
      Section 3  Thesis Organization
    Chapter 2  Literature Review
      Section 1  Bayesian Networks
      Section 2  The Naive Bayesian Classifier
      Section 3  Related Research on Data Characteristics and the Naive Bayesian Classifier
      Section 4  Related Research on Improving NBC Performance
      Section 5  Performance Measurement
      Section 6  Chapter Summary
    Chapter 3  Research Method
      Section 1  Research Framework
      Section 2  Measuring the Degree of Dependence between Paired Attributes
      Section 3  Simulated Samples
      Section 4  Chapter Summary
    Chapter 4  Simulation Analysis and Empirical Study
      Section 1  Simulation Analysis
      Section 2  Empirical Study
      Section 3  Chapter Summary
    Chapter 5  Conclusions and Suggestions
    References


    Full-text availability: on campus, open from 2019-06-25
    off campus, open from 2024-06-25