| Author: | 林琦芳 Lin, Chi-Fang |
|---|---|
| Thesis Title: | 設定簡易貝氏分類器中各屬性先驗分配之方法 (Individual Attribute Prior Settings for Improving the Performance of Naive Bayesian Classifiers) |
| Advisor: | 翁慈宗 Wong, Tzu-Tsung |
| Degree: | Master |
| Department: | College of Management - Department of Industrial and Information Management |
| Year of Publication: | 2007 |
| Academic Year of Graduation: | 95 |
| Language: | Chinese |
| Number of Pages: | 51 |
| Keywords (Chinese): | 先驗分配 (prior distribution), 廣義狄氏分配 (generalized Dirichlet distribution), 羅氏分配 (Liouville distribution), 簡易貝氏分類器 (naive Bayesian classifier), 屬性相關性 (attribute dependency) |
| Keywords (English): | naïve Bayesian classifier, prior setting, attribute dependency, Liouville distribution, generalized Dirichlet distribution |
Among Bayesian classification methods, the naive Bayesian classifier has been widely used because of its fast computation. When a naive Bayesian classifier is applied, the Dirichlet distribution is generally adopted as the prior distribution over the possible values of an attribute. A previous study examined the reasonableness of assuming Dirichlet priors, but it adjusted the prior parameters of all attributes in a uniform manner. Since different attributes have different characteristics, it is not reasonable to adjust their priors identically merely because they have the same number of possible values. The focus of this study is therefore to treat the parameter adjustments of different attributes as independent and to search for parameter values specific to each attribute. Because the generalized Dirichlet and Liouville distributions are more general than the Dirichlet distribution and can also serve as priors for the naive Bayesian classifier, all three families of priors are investigated. Eighteen suitable data sets from the UCI data repository were analyzed. Overall, when the prior is a Dirichlet or a Liouville distribution, individual adjustment of the attribute priors yields slightly higher classification accuracy than uniform adjustment or uniform adjustment followed by individual adjustment; when the prior is a generalized Dirichlet distribution, uniform adjustment followed by individual adjustment achieves higher classification accuracy. Overall, the generalized Dirichlet distribution is recommended as the prior. The attribute orderings produced by the four ranking measures, , ADC, SU, and DML, do not differ much; since the computation of the first measure is more complicated, any one of ADC, SU, and DML can be chosen.
Naive Bayesian classifiers are a widely used classification tool because their computational complexity is low. In a naive Bayesian classifier, the prior distribution of an attribute is explicitly or implicitly assumed to be a Dirichlet distribution. A previous study proposed two alternative types of priors, generalized Dirichlet and Liouville distributions, and systematically and concurrently changed the parameters of the priors of all attributes to study the performance of the naïve Bayesian classifier. Since every attribute is unique, it is unreasonable to adjust the parameters of all priors concurrently. In this study, we consider the parameter settings of the attribute priors to be independent. Three methods, named concurrent prior setting, individual prior setting, and concurrent followed by individual prior setting, are then proposed to study their impacts on the prediction accuracy of the naïve Bayesian classifier when a prior is either a Dirichlet, a generalized Dirichlet, or a Liouville distribution. The experimental results on 18 data sets from the UCI data repository demonstrate that when a prior is either a Dirichlet or a Liouville distribution, individual prior setting generally achieves higher classification accuracy than the other two methods, whereas when a prior is a generalized Dirichlet distribution, concurrent followed by individual prior setting performs best. The generalized Dirichlet distribution is overall the best choice among the three distribution families. The impacts of the four measures , ADC, SU, and DML for ranking attributes on the performance of the naïve Bayesian classifier are insignificant. Since the computational cost of the first measure is higher, any one of the other three measures can be used to rank attributes.
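The way a prior enters the classifier is the usual posterior-mean estimate: with a Dirichlet prior, the estimated conditional probability of attribute value v in class c becomes (n_{cv} + alpha) / (n_c + k * alpha), so the alpha chosen for each attribute directly shifts the probabilities the classifier multiplies together. The sketch below is only a minimal illustration of that idea under a symmetric Dirichlet prior with one smoothing parameter per attribute; the class DirichletNaiveBayes, all identifiers, and the toy data are assumptions of this sketch, not the thesis's actual algorithm or code.

```python
from collections import defaultdict

class DirichletNaiveBayes:
    """Discrete naive Bayes whose conditional probabilities are posterior means
    under a symmetric Dirichlet prior, with a separate parameter per attribute."""

    def __init__(self, alphas):
        # alphas[j] is the Dirichlet smoothing parameter used for attribute j;
        # one shared value for every j would mimic concurrent prior setting.
        self.alphas = alphas

    def fit(self, X, y):
        self.classes = sorted(set(y))
        self.class_counts = defaultdict(int)   # n_c
        self.counts = defaultdict(int)         # n_{j,c,v}
        self.values = defaultdict(set)         # observed values of attribute j
        for xi, c in zip(X, y):
            self.class_counts[c] += 1
            for j, v in enumerate(xi):
                self.counts[(j, c, v)] += 1
                self.values[j].add(v)
        return self

    def _cond_prob(self, j, c, v):
        # Posterior mean of P(X_j = v | class c) under Dirichlet(alpha_j, ..., alpha_j):
        # (n_{j,c,v} + alpha_j) / (n_c + k_j * alpha_j), k_j = number of values of attribute j.
        a = self.alphas[j]
        k = len(self.values[j])
        return (self.counts[(j, c, v)] + a) / (self.class_counts[c] + k * a)

    def predict(self, xi):
        n = sum(self.class_counts.values())
        best_class, best_score = None, float("-inf")
        for c in self.classes:
            score = self.class_counts[c] / n   # class prior
            for j, v in enumerate(xi):
                score *= self._cond_prob(j, c, v)
            if score > best_score:
                best_class, best_score = c, score
        return best_class

# Toy example: two attributes, each with its own prior parameter.
X = [("sunny", "hot"), ("rainy", "cool"), ("sunny", "cool"), ("rainy", "hot")]
y = ["no", "yes", "yes", "no"]
clf = DirichletNaiveBayes(alphas=[0.5, 2.0]).fit(X, y)
print(clf.predict(("sunny", "cool")))  # -> "yes" with these counts and priors
```

A single shared value in alphas corresponds to adjusting all attribute priors concurrently, whereas tuning each alphas[j] separately corresponds to the individual prior setting studied in the thesis; the generalized Dirichlet and Liouville priors introduce additional parameters per attribute and are not shown in this sketch.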
黃偉碩(2005),「利用貝氏分類與因子分析法於半導體製程錯誤偵測與診斷」,中華大學/科技管理研究所碩士論文。
鄭宇麟(2006),「樹狀貝氏分類器狄氏先驗分配之合理性」,國立成功大學/工業與資訊管理學系碩士論文。
Aitchison, J. (1985). A general class of distributions on the simplex, Journal of the Royal Statistical Society Series B, 47, 136-146.
Biesiada, J., Duch, W., Kachel, A., Maczka, K., and Palucha, S. (2005). Feature ranking methods based on information entropy with Parzen windows, International Conference on Research in Electrotechnology and Applied Informatics, 109-118, Katowice, Poland.
Bier, V. M. and Yi, W. (1995). A Bayesian method for analyzing dependencies in precursor data, International Journal of Forecasting, 11, 25-41.
Blake, C. and Merz, C. (1998). UCI machine learning repository: http://www.ics.uci.edu/~mlearn/MLRepository.html.
Clark, P. and Niblett, T. (1989). The CN2 induction algorithm, Machine Learning, 3, 261-283.
Cestnik, B. (1990). Estimating probabilities: A crucial task in machine learning, Proceedings of the 9th European Conference on Artificial Intelligence, 147-150, Stockholm, Sweden: Pitman.
Cestnik, B. and Bratko, I. (1991). On estimating probabilities in tree pruning, Proceedings of the 5th European Working Session on Learning, 138-150, Porto, Portugal.
Connor, R. J. and Mosimann, J. E. (1969). Concepts of independence for proportions with a generalization of the Dirichlet distribution, Journal of the American Statistical Association, 64, 194-206.
Domingos, P. and Pazzani, M. (1997). On the optimality of the simple Bayesian classifier under zero-one loss, Machine Learning, 29, 103-130.
Dougherty, J., Kohavi, R., and Sahami, M. (1995). Supervised and unsupervised discretization of continuous features, Proceedings of the 12th International Conference on Machine Learning, 194-202, San Francisco: Morgan Kaufmann.
Fang, K. T., Kotz, S., and Ng, K. W. (1990). Symmetric Multivariate and Related Distributions, New York: Chapman and Hall.
Hsu, C. N., Huang, H. J., and Wong, T. T. (2003). Implications of the Dirichlet assumption for discretization of continuous attributes in naïve Bayesian classifiers, Machine Learning, 53, 235-263.
Kononenko, I. (1991). Semi-naïve Bayesian classifier, Proceedings of the 6th European Working Session on Learning, 206-219, Porto, Portugal.
Kohavi, R. and Sahami, M. (1996). Error-based and entropy-based discretization of continuous features, Proceedings of the 2nd International Conference on Knowledge Discovery and Data Mining, 114-119, Portland, OR.
Langley, P., Iba, W., and Thompson, K. (1992). An analysis of Bayesian classifiers, Proceedings of the 10th National Conference on Artificial Intelligence, 223-228, San Jose, CA: AAAI Press.
Lochner, R. H. (1975). On decomposition and aggregation error in estimation: some basic principles and examples, Risk Analysis, 22, 203-214.
Lopez de Mantaras, R. (1991). A distance-based attribute selection measure for decision tree induction, Machine Learning, 6, 81-92.
Press, W. H., Flannery, B. P., Teukolsky, S. A., and Vetterling, W. T. (1988). Numerical Recipes in C, Cambridge: Cambridge University Press.
Quinlan, J. R. (1986). Induction of decision trees, Machine Learning, 1, 81-106.
Rish, I., Hellerstein, J., and Thathachar, J. (2001). An analysis of data characteristics that affect naïve Bayes performance, IBM Technical Report RC21993.
Sahami, M., Dumais, S., Heckerman, D., and Horvitz, E. (1998). A Bayesian approach to filtering junk e-mail, Learning for Text Categorization: Papers from the AAAI Workshop, 55-62.
Schneider, K. M. (2003). A comparison of event models for naïve Bayes anti-spam e-mail filtering, Proceedings of the 10th Conference of the European Chapter of the Association for Computational Linguistics, 307-314, Budapest, Hungary.
Schneider, K. M. (2005). Techniques for improving the performance of naïve Bayes for text classification, Lecture Notes in Computer Science, 3406, 682-693.
Shannon, C. E. and Weaver, W. (1949). The Mathematical Theory of Communication, Urbana, IL: University of Illinois Press.
Sridhar, D. V., Bartlett, E. B., and Seagrave, R. C. (1998). Information theoretic subset selection, Computers and Chemical Engineering, 22, 613-626.
Wang, S. J. and Wong, S. K. M. (1989). A measure for concept dissimilarity and its applications in machine learning, Proceedings of the International Conference on Computing and Information, 267-273, Toronto, Ontario: North-Holland.
Wong, T. T. (1998). Generalized Dirichlet distribution in Bayesian analysis, Applied Mathematics and Computation, 97, 165-181.
Wong, T. T. (2005). The feasibility of the Dirichlet assumption in naïve Bayesian classifiers, working paper.
Wong, T. T. (2006). Perfect aggregation of Bayesian analysis on compositional data, Statistical Papers, 48, 265-282.