Graduate Student: 徐芳玲 (Hsu, Feng-lin)
Thesis Title: Applying Principal Component Analysis to the Binary Partitioning of Nominal Attributes in Decision Trees (以主成分分析應用在決策樹名目屬性之二元分割上)
Advisor: 葉榮懋 (Yeh, Jong-Mau)
Degree: Master
Department: Institute of Information Management, College of Management
Year of Publication: 2002
Academic Year of Graduation: 90 (ROC calendar)
Language: Chinese
Pages: 59
Chinese Keywords: binary partition, principal component analysis, decision tree, impurity
Foreign-Language Keywords: factorial analysis, binary partition, decision tree, entropy
With the rapid advance of information technology, the scale at which data are stored and processed far exceeds that of the past. How to extract useful information from such massive volumes of data for decision makers has long been a central concern of the data mining field.
Decision trees are among the most widely used classification techniques in data mining because they are computationally simple and yield clear, interpretable rules. However, when the data volume is large and an attribute takes on many distinct values, the tree develops numerous branches and becomes overly complex, and processing efficiency suffers accordingly. Methods for simplifying decision trees have therefore been proposed repeatedly to improve their performance on large databases.
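As a rough illustration, not part of the original abstract, of why many attribute values inflate the tree, the sketch below computes the entropy impurity of a multi-way split on a nominal attribute; the attribute values, class labels, and the choice of entropy as the impurity measure are assumptions made only to show that a multi-way split creates one branch per distinct value, whereas a binary partition must choose among 2^(k-1) - 1 groupings of k values.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy of a sequence of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def multiway_split_impurity(values, labels):
    """Weighted entropy after a multi-way split on a nominal attribute:
    one branch per distinct attribute value."""
    n = len(values)
    branches = {}
    for v, y in zip(values, labels):
        branches.setdefault(v, []).append(y)
    weighted = sum(len(ys) / n * entropy(ys) for ys in branches.values())
    return weighted, len(branches)

# Hypothetical records: a nominal attribute with k = 5 distinct values.
values = ["a", "b", "c", "d", "a", "b", "c", "d", "e", "e"]
labels = [1, 0, 1, 0, 1, 1, 0, 0, 1, 0]
imp, n_branches = multiway_split_impurity(values, labels)
print(f"branches: {n_branches}, weighted entropy: {imp:.3f}")
# A binary partition would instead have to select one of 2**(k-1) - 1
# possible groupings of the k values into two subsets.
```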
The simplification approach adopted in this study is to binary-partition the values of each nominal attribute, using as the partition criterion the component scores that correspond to the attribute values in a principal component analysis, in order to eliminate excessive and unnecessary branches. Principal component analysis is adopted mainly because a small number of latent variables or components can effectively represent the structure of many correlated variables; in addition, it forms linear combinations of variables such that the variance of the resulting components is maximized, so the data show the greatest differences along those components. In other words, this study takes the principal component that explains most of the variance and uses its standardized, variance-maximizing component scores as the criterion for binary-partitioning the attribute values, with the aim of reducing the number of distinct attribute values and thereby simplifying and improving the decision tree. Follow-up work focuses on verifying whether this binary-partitioning scheme performs satisfactorily, in execution efficiency, impurity, and accuracy, when compared with similar binary-partitioning schemes, and on identifying suitable cases for practical application.
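The abstract does not spell out how a component score is obtained for each attribute value, so the following is only a minimal sketch of the general idea under stated assumptions: each value of the nominal attribute is represented by a hypothetical class-frequency profile, the profiles are projected onto the first principal component, and the values are binary-partitioned by the sign of their standardized component scores. The value names, profiles, and zero threshold are all illustrative assumptions, not the thesis's actual procedure.

```python
import numpy as np

def first_component_scores(X):
    """Standardized scores of the rows of X on the first principal component."""
    Xc = X - X.mean(axis=0)                      # center each column
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    scores = Xc @ vt[0]                          # project onto first component
    return (scores - scores.mean()) / scores.std()

# Hypothetical profiles: one row per attribute value, one column per class.
value_names = ["red", "green", "blue", "yellow", "purple"]
profiles = np.array([
    [0.9, 0.1],
    [0.8, 0.2],
    [0.3, 0.7],
    [0.2, 0.8],
    [0.5, 0.5],
])

scores = first_component_scores(profiles)
left  = [v for v, s in zip(value_names, scores) if s <= 0.0]
right = [v for v, s in zip(value_names, scores) if s > 0.0]
print("binary partition:", left, "|", right)
```

Cutting along a single component score reduces the search over all possible binary groupings of the attribute values to one ordered threshold, which is consistent with the abstract's aim of avoiding excessive and unnecessary branches.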