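Graduate student: Ko, Ying-Chu (柯映竹)
Thesis title: The Effect of Positive and Negative Correlation among Variables on the Learning Accuracy of the Naive Bayesian Classifier (變數間正負相關對簡易貝氏分類器學習正確率之影響)
Advisor: Wong, Tzu-Tsung (翁慈宗)
Degree: Master
Department: College of Management - Department of Industrial Management Science
Year of publication: 2003
Academic year of graduation: 91
Language: Chinese
Number of pages: 52
Chinese keywords: 簡易貝氏分類器, 狄氏分配, 廣義狄氏分配
English keywords: naive Bayesian classifier, generalized Dirichlet distribution, Dirichlet distribution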
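  • The naive Bayesian classifier performs classification under the assumption that the prior distribution is a Dirichlet distribution, and it is a computationally simple and effective classification tool. However, under a Dirichlet distribution any two variables are negatively correlated, whereas in practice variables may be positively correlated. In existing studies, the naive Bayesian classifier is trained under the assumption that variables are negatively correlated, and whether positive correlation among variables affects the classifier's accuracy has not been examined. This study uses the Dirichlet distribution and the generalized Dirichlet distribution to simulate data with negative and positive correlation, investigates whether the correlation among variables affects the accuracy of the naive Bayesian classifier, and then carries out the same analysis on real data.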
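
    The correlation property described in the abstract can be illustrated with a small simulation. The sketch below is not the thesis's simulation procedure, which this record does not give. It assumes the standard construction of a generalized Dirichlet vector from independent beta variables, and all parameter values, function names, and sample sizes are hypothetical choices made for illustration. It draws samples from a Dirichlet distribution and from a generalized Dirichlet distribution and estimates the correlation between pairs of components.

```python
import random
from math import sqrt

def pearson(xs, ys):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)

def sample_dirichlet(rng, alphas):
    """Draw one Dirichlet vector by normalizing independent gamma draws."""
    g = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(g)
    return [v / total for v in g]

def sample_generalized_dirichlet(rng, beta_params):
    """Draw one generalized Dirichlet vector:
    X_1 = Z_1 and X_j = Z_j * prod_{i<j} (1 - Z_i),
    with independent Z_i ~ Beta(a_i, b_i)."""
    remaining = 1.0
    xs = []
    for a, b in beta_params:
        z = rng.betavariate(a, b)
        xs.append(z * remaining)
        remaining *= 1.0 - z
    return xs

def column(samples, j):
    return [s[j] for s in samples]

rng = random.Random(0)   # fixed seed so the run is repeatable
n = 20000                # illustrative sample size

# Illustrative parameters (assumptions, not taken from the thesis).
dir_samples = [sample_dirichlet(rng, [2.0, 3.0, 4.0, 5.0]) for _ in range(n)]
gd_samples = [
    sample_generalized_dirichlet(rng, [(0.5, 0.5), (50.0, 50.0), (50.0, 50.0)])
    for _ in range(n)
]

dir_corr = pearson(column(dir_samples, 1), column(dir_samples, 2))
gd_corr_first = pearson(column(gd_samples, 0), column(gd_samples, 1))
gd_corr_later = pearson(column(gd_samples, 1), column(gd_samples, 2))

print("Dirichlet, components 2 and 3:            ", round(dir_corr, 3))
print("Generalized Dirichlet, components 1 and 2:", round(gd_corr_first, 3))
print("Generalized Dirichlet, components 2 and 3:", round(gd_corr_later, 3))
```

    With these settings the Dirichlet pair comes out negatively correlated, as the abstract states for any Dirichlet pair. In the generalized Dirichlet sample the first component is still negatively correlated with the second, but the second and third components are positively correlated, because both are scaled by the same highly variable factor 1 - Z_1.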

    Abstract
    Acknowledgments
    Table of Contents
    List of Figures
    List of Tables
    Chapter 1  Introduction
      Section 1. Research Background and Motivation
      Section 2. Research Objectives
      Section 3. Thesis Organization
    Chapter 2  Literature Review
      Section 1. Naive Bayesian Classifier
        1. Assumptions
        2. Learning Mechanism
      Section 2. Generalized Dirichlet Distribution
    Chapter 3  Research Method
      Section 1. Research Framework
      Section 2. Operation of the Naive Bayesian Classifier
      Section 3. Simulated Samples
      Section 4. Simulated Data with a Dirichlet Prior
        1. Expected Values of the Data and the Prior Are Consistent
        2. Expected Values of the Data and the Prior Are Inconsistent
      Section 5. Simulated Data with a Generalized Dirichlet Prior
        1. Expected Values of the Data and the Prior Are Consistent
        2. Expected Values of the Data and the Prior Are Inconsistent
    Chapter 4  Empirical Study
      Section 1. Pima Indians Diabetes Database
        1. Expected Values of the Data and the Prior Are Consistent
        2. Expected Values of the Data and the Prior Are Inconsistent
      Section 2. Vehicle Silhouettes Dataset
        1. Expected Values of the Data and the Prior Are Consistent
        2. Expected Values of the Data and the Prior Are Inconsistent
      Section 3. Glass Identification Database
        1. Expected Values of the Data and the Prior Are Consistent
        2. Expected Values of the Data and the Prior Are Inconsistent
      Section 4. Discussion
    Chapter 5  Conclusions and Suggestions
    References
    Appendix

    Full text availability: on campus, public from 2004-06-23; off campus, public from 2005-06-23