| Graduate student: | Tseng, Li-Wen (曾麗文) |
|---|---|
| Thesis title: | A comparative study of Bayesian classifiers in medical data mining (貝氏分類器於醫學數據資料探勘之研究) |
| Advisor: | Li, Sheng-Tun (李昇暾) |
| Degree: | Master |
| Department: | College of Management - Department of Industrial and Information Management (on the job class) |
| Year of publication: | 2011 |
| Academic year of graduation: | 99 (ROC calendar) |
| Language: | Chinese |
| Pages: | 52 |
| Keywords: | Data Mining, Bayesian classification, Medical data sets, One-way ANOVA |
The main purpose of data mining is to discover useful and interesting patterns in large volumes of data. In recent years the technique has been widely applied in commerce, industry, medicine, biology, and other fields. This study applies several Bayesian classification methods from data mining to predict outcomes on medical data sets, and compares the prediction results across multiple medical data sets through experiments with each classifier.

Bayesian classification (the Naïve Bayes classifier) trains quickly and can reach good accuracy on complex medical problems. Because it classifies on the basis of probability, it also rests on a solid foundation in statistical theory. For each data set, this study evaluates the predictions of the different Bayesian classifiers using the confusion matrix, ROC curve, accuracy, precision, and recall, and applies one-way ANOVA to compare the classifiers' experimental results across data sets.

Based on the experimental results, the study groups the medical data sets and selects, for each performance measure, the best-performing combination for classification. Finally, through comparative analysis of the medical data sets, it aims to identify the better Bayesian classification methods and, by validating and evaluating the results on different medical data sets, to show which Bayesian classifier predicts best on which kind of medical data set.
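The evaluation measures named in the abstract can be illustrated with a minimal Python sketch. It computes accuracy, precision, and recall from the confusion matrix of a binary classifier, and the one-way ANOVA F statistic for comparing the scores of several classifiers. The function names and all sample values are hypothetical and chosen only for illustration; they are not the thesis's code or results.

```python
from statistics import mean


def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) for a binary problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    return tp, fp, fn, tn


def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, and recall derived from the confusion matrix."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total,
        # Guard against division by zero when no positives are predicted/present.
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


def one_way_anova_f(groups):
    """F statistic: between-group mean square over within-group mean square."""
    k = len(groups)                      # number of groups (e.g. classifiers)
    n = sum(len(g) for g in groups)      # total number of observations
    grand_mean = sum(sum(g) for g in groups) / n
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


if __name__ == "__main__":
    # Made-up labels and predictions, for illustration only.
    y_true = [1, 1, 1, 0, 0, 0, 1, 0]
    y_pred = [1, 0, 1, 0, 1, 0, 1, 0]
    print(classification_metrics(y_true, y_pred))

    # Made-up scores for three hypothetical classifiers.
    print(one_way_anova_f([[1, 2, 3], [2, 3, 4], [5, 6, 7]]))
```

A large F value suggests that at least one classifier's mean score differs from the others; deciding significance additionally requires comparing F against the F distribution with (k - 1, n - k) degrees of freedom, which this sketch leaves out.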