| Graduate student: | 張良豪 Chang, Liang-Hao |
|---|---|
| Thesis title: | 利用貝氏屬性挑選法與先驗分配提升簡易貝氏分類器之效能 Improving the performance of Naive Bayes Classifier by using Selective Naive Bayesian Algorithm and Prior Distributions |
| Advisor: | 翁慈宗 Wong, Tzu-Tsung |
| Degree: | Master |
| Department: | College of Management - Department of Industrial and Information Management |
| Year of publication: | 2009 |
| Academic year of graduation: | 97 |
| Language: | Chinese |
| Number of pages: | 34 |
| Keywords (Chinese): | 狄氏分配 (Dirichlet distribution), 廣義狄氏分配 (generalized Dirichlet distribution), 簡易貝氏分類器 (naive Bayes classifier), 先驗分配 (prior distribution), 貝氏屬性挑選法 (selective naive Bayesian algorithm) |
| Keywords (English): | selective naive Bayesian algorithm, prior distribution, Dirichlet distribution, generalized Dirichlet distribution, naive Bayes classifier |
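In the classification area of data mining, the naive Bayes classifier has been widely applied because it is computationally fast and achieves reasonable classification accuracy. The naive Bayes classifier uses all attributes when classifying, so it is easily disturbed by attributes in the data set that contribute nothing to classification; to obtain better classification results, a mechanism for selecting attributes is therefore considered. Among the many attribute selection methods, the selective naive Bayesian algorithm is often used with the naive Bayes classifier because it effectively removes attributes that are redundant or that harm the classification result. On the other hand, to raise the classification accuracy of the naive Bayes classifier, the possible values of an attribute are often assumed to follow a prior distribution, typically a Dirichlet distribution or a generalized Dirichlet distribution. Many researchers have proposed methods for setting the parameters of the prior distribution, but in previous studies prior distributions have rarely been added after attribute selection. This study therefore incorporates the prior distribution of each attribute into the attribute selection process to connect these two areas, and proposes two models. Model I finds the prior distributions of all attributes after the selective naive Bayesian algorithm has finished; Model II finds the most suitable prior distribution for each attribute as soon as it is selected, so that the parameters are fully set when selection ends. In the empirical part, 17 data sets from the UCI data repository are analyzed. The experimental results show that Model I with the generalized Dirichlet distribution is, overall, more accurate and more stable than both Model II and the naive Bayes classifier that uses prior distributions without attribute selection.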
Naive Bayes classifiers have been widely used for data classification because of their computational efficiency and competitive accuracy. When all attributes are employed for classification, the accuracy of the naive Bayes classifier is generally degraded by noisy attributes, so a mechanism for attribute selection should be considered to improve its prediction accuracy. The selective naive Bayesian method is a very successful approach for removing noisy and/or redundant attributes. In addition, attributes are generally assumed to have prior distributions, such as Dirichlet or generalized Dirichlet distributions, to achieve higher prediction accuracy. Many studies have proposed methods for finding the best priors for attributes, but none of them takes attribute selection into account. This thesis therefore proposes two models that combine prior distributions with attribute selection to increase the accuracy of the naive Bayes classifier. Model I finds the best prior for each attribute after all attributes have been chosen by the selective naive Bayesian algorithm. Model II finds the best prior for each newly selected attribute at the moment it is chosen, when all previously selected attributes already have their best priors. Experimental results on 17 data sets from the UCI data repository show that Model I with the generalized Dirichlet prior generally and consistently achieves higher classification accuracy than Model II and than the naive Bayes classifier that uses priors without attribute selection.
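The difference between the two models can be illustrated with a minimal Python sketch. It is not the thesis's implementation and rests on several assumptions that do not come from the abstract: each attribute gets a symmetric Dirichlet prior with a single parameter `alpha` (the generalized Dirichlet case is omitted), the "best" prior is found by a small grid search over the hypothetical values in `GRID`, accuracy is measured by leave-one-out on the training data, and forward selection stops as soon as adding an attribute no longer strictly improves accuracy. The function names are hypothetical as well.

```python
import math
from collections import Counter

GRID = (0.1, 0.5, 1.0, 2.0, 5.0)  # hypothetical candidate prior parameters


def nb_predict(train_X, train_y, x, attrs, alpha):
    """Classify x with naive Bayes restricted to the attribute indices in attrs.

    alpha[a] > 0 is the parameter of a symmetric Dirichlet prior on attribute a,
    giving the estimate P(value | class) = (count + alpha) / (n_class + alpha * k).
    """
    best_label, best_score = None, -math.inf
    for c, n_c in Counter(train_y).items():
        score = math.log(n_c / len(train_y))
        for a in attrs:
            k = len({row[a] for row in train_X})  # distinct values seen for a
            count = sum(1 for row, label in zip(train_X, train_y)
                        if label == c and row[a] == x[a])
            score += math.log((count + alpha[a]) / (n_c + alpha[a] * k))
        if score > best_score:
            best_label, best_score = c, score
    return best_label


def loo_accuracy(X, y, attrs, alpha):
    """Leave-one-out accuracy of the classifier that uses only attrs."""
    hits = 0
    for i in range(len(X)):
        rest_X, rest_y = X[:i] + X[i + 1:], y[:i] + y[i + 1:]
        hits += nb_predict(rest_X, rest_y, X[i], attrs, alpha) == y[i]
    return hits / len(X)


def tune_prior(X, y, attrs, alpha, a, grid=GRID):
    """Return a copy of alpha in which alpha[a] is the best value from grid."""
    best = max(grid, key=lambda v: loo_accuracy(X, y, attrs, {**alpha, a: v}))
    return {**alpha, a: best}


def select(X, y, tune_each_step, default=1.0):
    """Greedy forward selection with priors tuned at the end or at each step.

    tune_each_step=False mimics Model I; tune_each_step=True mimics Model II.
    """
    alpha = {a: default for a in range(len(X[0]))}
    selected, remaining = [], list(range(len(X[0])))
    best = loo_accuracy(X, y, selected, alpha)
    while remaining:
        acc, a = max(((loo_accuracy(X, y, selected + [a], alpha), a)
                      for a in remaining), key=lambda s: s[0])
        if acc <= best:  # assumed stopping rule: no strict improvement
            break
        selected.append(a)
        remaining.remove(a)
        best = acc
        if tune_each_step:  # Model II: fix this attribute's prior right away
            alpha = tune_prior(X, y, selected, alpha, a)
            best = loo_accuracy(X, y, selected, alpha)
    if not tune_each_step:  # Model I: tune priors after selection has finished
        for a in selected:
            alpha = tune_prior(X, y, selected, alpha, a)
    return selected, {a: alpha[a] for a in selected}


# Toy data: attribute 0 determines the class, attribute 1 is noise.
X = [("a", "x"), ("a", "y"), ("b", "x"), ("b", "y"),
     ("a", "x"), ("b", "y"), ("a", "y"), ("b", "x")]
y = ["p", "p", "n", "n", "p", "n", "p", "n"]
print(select(X, y, tune_each_step=False))  # Model I style
print(select(X, y, tune_each_step=True))   # Model II style
```

In this sketch the only difference between the two modes is when `tune_prior` is called. Under Model II the tuned priors of earlier attributes influence which attribute is picked next, whereas under Model I selection runs entirely with the default prior and tuning happens afterwards.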