
Student: Hung, Meafen (洪秀芳)
Thesis Title: A Meta-Learning Method to Learn from Small Datasets
Advisor: Li, De-Jiang (利德江)
Degree: Master
Department: College of Management, Institute of Information Management
Year of Publication: 2005
Academic Year of Graduation: 93 (ROC calendar)
Language: English
Number of Pages: 35
Keywords (Chinese): characterization of datasets, machine learning, naïve Bayes classifier
Keywords (English): meta-learning, small dataset learning, naïve Bayes, characterization of datasets, machine learning

    The nature of survival suggests that learning from few examples is often important, yet machine learning methods still do not learn well from small datasets. In contrast, human beings often learn well from very few examples, even when the number of potential features is large. To do so, they successfully use previously learned concepts to improve performance on the current task.
    This thesis presents an approach to building a classifier for a small dataset by using other datasets and their learning results. Meta-learning studies how learning systems can increase performance through experience, but research on the characterization of datasets is still lacking. We propose a measurement for selecting a support dataset and a process for acquiring prior knowledge and applying it to a new learning problem to improve performance. A proper characterization of datasets that matches the naïve Bayes algorithm is the key to this research.
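    The abstract outlines three steps: characterize candidate datasets, select a support dataset, and turn its learning result into prior knowledge for a naïve Bayes classifier trained on the small target dataset. Below is a minimal Python sketch of that idea, not the thesis's actual procedure: the characterization here is simply class entropy, the prior is the support dataset's class-conditional counts used as Dirichlet pseudo-counts, and the names (select_support, fit_naive_bayes, strength) and the similarity measure are illustrative assumptions.

```python
# Minimal sketch of the idea in the abstract (not the thesis's exact procedure):
# pick the "closest" support dataset with a simple information measure, then use
# its class-conditional counts as Dirichlet pseudo-counts for naive Bayes trained
# on the small target dataset. The similarity measure and scaling are assumptions.
import numpy as np

def entropy(labels):
    """Shannon entropy (base 2) of a discrete label vector."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

def select_support(target_y, candidates):
    """Pick the candidate dataset whose class entropy is closest to the target's.
    (Hypothetical characterization; the thesis proposes its own measurement.)"""
    t = entropy(target_y)
    return min(candidates, key=lambda c: abs(entropy(c["y"]) - t))

def fit_naive_bayes(X, y, n_values, n_classes, prior_counts=None, strength=1.0):
    """Discrete naive Bayes with Dirichlet smoothing.
    prior_counts[j] is an (n_classes, n_values[j]) array of pseudo-counts,
    e.g. class-conditional counts from the support dataset; if omitted,
    plain Laplace smoothing is used."""
    class_counts = np.zeros(n_classes)
    cond_counts = [np.ones((n_classes, v)) for v in n_values]  # Laplace base
    if prior_counts is not None:
        for j in range(len(n_values)):
            cond_counts[j] += strength * prior_counts[j]
    for xi, yi in zip(X, y):
        class_counts[yi] += 1
        for j, v in enumerate(xi):
            cond_counts[j][yi, v] += 1
    class_prob = (class_counts + 1.0) / (class_counts.sum() + n_classes)
    cond_prob = [c / c.sum(axis=1, keepdims=True) for c in cond_counts]
    return class_prob, cond_prob

def predict(x, class_prob, cond_prob):
    """Most probable class for one discrete instance x."""
    log_p = np.log(class_prob).copy()
    for j, v in enumerate(x):
        log_p = log_p + np.log(cond_prob[j][:, v])
    return int(np.argmax(log_p))

# Toy usage: two binary attributes, two classes, one candidate support dataset.
support = {"X": np.array([[0, 1], [1, 1], [1, 0], [0, 0]]),
           "y": np.array([0, 0, 1, 1])}
target_X = np.array([[0, 1], [1, 0]])
target_y = np.array([0, 1])
best = select_support(target_y, [support])
prior = [np.zeros((2, 2)) for _ in range(2)]  # pseudo-counts from the support set
for xi, yi in zip(best["X"], best["y"]):
    for j, v in enumerate(xi):
        prior[j][yi, v] += 1
cp, cc = fit_naive_bayes(target_X, target_y, n_values=[2, 2], n_classes=2,
                         prior_counts=prior, strength=0.5)
print(predict([0, 1], cp, cc))
```

    In this sketch the strength parameter controls how heavily the support dataset's counts weigh against the few target examples; the thesis's own dataset measurement and prior-assessment procedure (Sections 3.3 and 3.4) would presumably replace the entropy heuristic and the fixed scaling used here.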

    ABSTRACT I
    ACKNOWLEDGEMENTS II
    TABLE OF CONTENTS III
    LIST OF FIGURES V
    LIST OF TABLES V
    CHAPTER 1 INTRODUCTION 1
    1.1 META-LEARNING 1
    1.2 LEARNING FROM SMALL DATASETS 2
    1.3 CONTRIBUTIONS 3
    1.4 OUTLINE 3
    CHAPTER 2 LITERATURE REVIEW 4
    2.1 DIRICHLET DISTRIBUTION 4
    2.2 BAYESIAN INFERENCE 5
    2.2.1 Bayes’ Theorem 5
    2.2.2 Learning Dirichlet Models 6
    2.3 BAYESIAN CLASSIFICATION 7
    2.4 META-LEARNING 10
    2.5 DATA CHARACTERIZATION 14
    2.6 INFORMATION MEASUREMENT 16
    CHAPTER 3 METHODOLOGY 20
    3.1 DATA DESCRIPTION 21
    3.2 FRAMEWORK 21
    3.3 RELATED TASKS 22
    3.3.1 How to Find Related Tasks 23
    3.3.2 Attribute Selection 24
    3.4 PRIOR ASSESSMENT 24
    3.5 PROCEDURE 25
    CHAPTER 4 EXPERIMENTS 27
    4.1 EXPERIMENT RESULT 30
    4.2 CONCLUSION 31
    CHAPTER 5 CONCLUSIONS AND DISCUSSIONS 32
    5.1 CONCLUSIONS 32
    5.2 DISCUSSIONS 32
    REFERENCES 33


    Full text available on campus: 2010-06-06
    Full text available off campus: 2010-06-06