
Student: Chen, Tzu-Li (陳子立)
Thesis Title: Combining Feature Selection with Coefficient of Determination to Grow Model Trees (結合特徵選取與判定係數以建構模式樹之方法)
Advisor: Wong, Tzu-Tsung (翁慈宗)
Degree: Master
Department: College of Management - Department of Industrial Management Science
Year of Publication: 2003
Academic Year of Graduation: 91
Language: English
Pages: 51
Keywords (Chinese): 模式樹, 特徵選取, 判定係數
Keywords (English): feature selection, coefficient of determination, model tree

    Model trees are similar to decision trees, except that each leaf node holds a linear regression model for prediction instead of a majority class for classification. They are a useful method for realistic numeric prediction problems. The growing procedure of a model tree is based on a measure called standard deviation reduction (SDR). By its nature, the SDR gathers instances with relatively close class values into the same node to derive linear regression models. Growing model trees in this way does not consider the linear relations between the attributes and the class, and hence may distort the meaning of the data. We therefore define a new measure, called the FAR, which borrows the concepts of feature selection and the coefficient of determination to take the linear relations between attribute values and class values into account when growing model trees. This new scheme can hopefully mine more valuable information for the problems of interest. Our experimental results show that model trees grown by the FAR achieve almost the same prediction accuracy as those grown by the SDR, and generally have a smaller size, which makes the learning results easier to interpret.
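    The two standard measures contrasted in the abstract can be made concrete. Below is a minimal sketch of the SDR splitting criterion (as used in M5 model trees) and of the coefficient of determination for a simple linear regression; the FAR measure itself combines feature selection with R-square in a way defined in the thesis body and is not reproduced here.

    ```python
    import math

    def sd(values):
        """Population standard deviation of a list of class values."""
        n = len(values)
        mean = sum(values) / n
        return math.sqrt(sum((v - mean) ** 2 for v in values) / n)

    def sdr(parent, subsets):
        """Standard deviation reduction for a candidate split:
        SDR = sd(T) - sum(|T_i|/|T| * sd(T_i)).
        A split that groups close class values together scores high."""
        n = len(parent)
        return sd(parent) - sum(len(s) / n * sd(s) for s in subsets)

    def r_squared(x, y):
        """Coefficient of determination of a simple linear regression
        of y on x: R^2 = 1 - SSE/SST. Close to 1 means a strong
        linear relation between the attribute and the class."""
        n = len(x)
        mx, my = sum(x) / n, sum(y) / n
        sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
        sxx = sum((a - mx) ** 2 for a in x)
        sst = sum((b - my) ** 2 for b in y)
        sse = sst - sxy ** 2 / sxx
        return 1 - sse / sst
    ```

    For example, splitting the class values [1, 2, 3, 10, 11, 12] into [1, 2, 3] and [10, 11, 12] yields a large SDR because each child is tight around its own mean, yet the SDR never asks whether any attribute relates linearly to the class, which is exactly the gap the R-square-based measure addresses.
    
    
    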

    ACKNOWLEDGEMENTS
    TABLE OF CONTENTS
    LIST OF TABLES
    LIST OF FIGURES
    CHAPTER 1 INTRODUCTION
      1.1 Motivation
      1.2 Objectives
      1.3 Organization
    CHAPTER 2 LITERATURE REVIEW
      2.1 Linear Regression
      2.2 Tree-Structured Prediction
        2.2.1 Growing Methods
        2.2.2 Leaf Models
        2.2.3 Pruning and Smoothing
      2.3 Applications
    CHAPTER 3 METHODOLOGY
      3.1 Linear Regression
        3.1.1 Simple Linear Regression
        3.1.2 Multiple Linear Regression
      3.2 Model Tree
        3.2.1 Growing
        3.2.2 Pruning
        3.2.3 Smoothing Process
        3.2.4 Nominal Attributes
        3.2.5 Missing Values
      3.3 Discussion
      3.4 The R-square Measure
      3.5 Synthetic Example
        3.5.1 Overlapped Case
          3.5.1.1 Performance of the SDR
          3.5.1.2 Performance of the R-square Measure
        3.5.2 Nonoverlapped Case
          3.5.2.1 Performance of the SDR
          3.5.2.2 Performance of the R-square Measure
      3.6 FAR Measure
        3.6.1 Feature Selection
        3.6.2 Evaluation Index
        3.6.3 Performance of the FAR
    CHAPTER 4 EXPERIMENTAL STUDY
      4.1 Model Construction
      4.2 Two Examples
        4.2.1 First Simulated Example
        4.2.2 Second Simulated Example
      4.3 CPU Performance
        4.3.1 Performance of the SDR
        4.3.2 Performance of the FAR
      4.4 Baskball
        4.4.1 Performance of the SDR
        4.4.2 Performance of the FAR
      4.5 Auto-mpg
        4.5.1 Performance of the SDR
        4.5.2 Performance of the FAR
      4.6 Smoothed Results
      4.7 Nominal Attributes and Missing Values
    CHAPTER 5 CONCLUSION AND FUTURE DIRECTIONS
    REFERENCES

    Alexander, W. P. and Grimshaw, S. D. (1996). Treed regression, Journal of Computational and Graphical Statistics, 5, 156-175.

    Ari, B. and Guvenir, H. A. (2002). Clustered linear regression, Knowledge-Based Systems, 15, 169-175.

    Berikov, V. B. and Rogozin, I. B. (1999). Regression trees for analysis of mutational spectra in nucleotide sequences, Bioinformatics, 15, 553-562.

    Blum, A. L. and Langley, P. (1997). Selection of relevant features and examples in machine learning, Artificial Intelligence, 97, 245-271.

    Breiman, L. (1996). Bagging predictors, Machine Learning, 24, 123-140.

    Breiman, L., Friedman, J. H., Olshen, R. A., and Stone, C. J. (1984). Classification and Regression Trees. Belmont, CA: Wadsworth International Group.

    Chipman, H. A., George, E. I., and McCulloch, R. E. (1998). Bayesian CART model search, Journal of the American Statistical Association, 93, 935-960.

    Chipman, H. A., George, E. I., and McCulloch, R. E. (2002). Bayesian treed models, Machine Learning, 48:1-3, 299-320.

    Ein-Dor, P. and Feldmesser, J. (1987). UCI Repository of machine learning databases [http://www.ics.uci.edu/~mlearn/MLRepository.html]. Irvine, CA: University of California, Department of Information and Computer Science.

    Hall, M. A. (1999). Feature selection for discrete and numeric class machine learning, Working Paper. University of Waikato, Department of Computer Science.

    Kampichler, C., Dzeroski, S., and Wieland, R. (2000). Application of machine learning techniques to the analysis of soil ecological data bases: relationships between habitat features and Collembolan community characteristics, Soil Biology & Biochemistry, 32:2, 197-209.

    LeBlanc, M. and Tibshirani, R. (1998). Monotone shrinkage of trees, Journal of Computational and Graphical Statistics, 7, 417-433.

    Neter, J., Kutner, M. H., Nachtsheim, C. J., and Wasserman, W. (1996). Applied Linear Regression Models. Burr Ridge, IL: Irwin.

    Peters, G., Morrissey, M. T., Sylvia, G., and Bolte, J. (1996). Linear regression, neural network and induction analysis to determine harvesting and processing effects on surimi quality, Journal of Food Science, 61:5, 876-880.

    Quinlan, J. R. (1992). Learning with continuous classes, In Proceedings of the Australian Joint Conference on Artificial Intelligence, 343-348. Singapore: World Scientific.

    Quinlan, J. R. (1993). Combining instance-based and model-based learning, In Proceedings of the Tenth International Conference on Machine Learning, 236-243. San Mateo, CA: Morgan Kaufmann.

    Tibshirani, R. (1996). Regression shrinkage and selection via the Lasso, Journal of the Royal Statistical Society, Series B, 58, 267-288.

    Torgo, L. (1997). Functional models for regression tree leaves, In Proceedings of the International Machine Learning Conference, 385-393. San Mateo, CA: Morgan Kaufmann.

    Wang, Y. and Witten, I. H. (1997). Inducing model trees for continuous classes, In Proceedings of the poster papers of the European Conference on Machine Learning. Prague: University of Economics, Faculty of Informatics and Statistics.

    Full-text Availability: On campus: immediately available; Off campus: available from 2003-06-24