| Author: | 楊璿衞 Yang, Hsuan-Wei |
|---|---|
| Thesis Title: | 整合監督式與非監督式分析之特徵選取框架 Integrative Framework for Supervised and Unsupervised Feature Selection |
| Advisor: | 張天豪 Chang, Tien-Hao |
| Degree: | Master |
| Department: | College of Electrical Engineering and Computer Science - Department of Electrical Engineering |
| Year of Publication: | 2017 |
| Academic Year of Graduation: | 105 (2016-2017) |
| Language: | Chinese |
| Number of Pages: | 35 |
| Keywords (Chinese): | 機器學習、特徵選取、監督式、非監督式 |
| Keywords (English): | Machine Learning, Feature Selection, Supervised, Unsupervised |
In recent years, machine learning has advanced steadily and worked its way into everyday life, and the technique is now applied across many research fields. Machine learning rests on two essential components: learning algorithms and large collections of data. Thanks to modern technology, datasets of every kind are being produced constantly; as their scale grows rapidly, the technique of feature selection has emerged to extract the key information from them. Feature selection covers several forms of application and can be broadly divided into filters and wrappers; filters, in turn, split into two styles of analysis, supervised and unsupervised. Past studies have mostly paired a filter with a wrapper for robust performance, but the filter usually relies on a single style of analysis, so performance can easily be biased by the characteristics of the dataset or the algorithm.

This study proposes a novel integrative feature selection framework that applies supervised and unsupervised methods at the same time to analyze data from different angles, and merges the two sets of results with Sequential Floating Forward Selection (SFFS). The framework is combined with five different types of predictors and applied to twelve datasets of varying scale; the results show that integrating the two styles of analysis effectively extracts key information and improves the performance of mainstream predictors.
Today, Machine Learning (ML) is widely used and has made progress in numerous research domains. The field is built on two main components: learning algorithms and datasets. Datasets keep growing in size and, as a result, may contain redundant or irrelevant data that can be discarded. Feature Selection (FS) is an effective way to select informative data; its algorithms fall into two main categories, wrapper and filter. Depending on the target concept, filter methods can be further divided into two types: supervised and unsupervised. In this thesis, we propose a novel feature selection framework, the Integrative Framework of Supervised and Unsupervised Feature Selection (IFSU), to simultaneously analyze datasets from different perspectives.
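The pipeline the abstract describes can be made concrete with a short sketch. The following is a minimal illustration of the integration idea, not the thesis's actual IFSU implementation: it assumes scikit-learn's `f_classif` as the supervised filter, plain feature variance as the unsupervised filter, a k-NN classifier as the wrapped predictor, and the built-in breast-cancer data as a stand-in dataset (the thesis itself evaluates five predictor types on twelve datasets).

```python
# Minimal sketch of the integration idea, not the thesis's actual IFSU code.
# Assumed stand-ins (not from the source): f_classif as the supervised filter,
# feature variance as the unsupervised filter, k-NN as the wrapped predictor.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import f_classif
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier


def candidate_pool(X, y, k=10):
    """Union of the top-k features from a supervised and an unsupervised filter."""
    sup_score, _ = f_classif(X, y)    # supervised: uses the class labels
    unsup_score = X.var(axis=0)       # unsupervised: ignores the class labels
    top_sup = np.argsort(sup_score)[::-1][:k]
    top_unsup = np.argsort(unsup_score)[::-1][:k]
    return sorted(set(top_sup) | set(top_unsup))


def evaluate(X, y, subset):
    """Wrapper criterion: cross-validated accuracy of the predictor on a subset."""
    clf = KNeighborsClassifier(n_neighbors=5)
    return cross_val_score(clf, X[:, subset], y, cv=5).mean()


def sffs(X, y, pool, target_size=8):
    """Sequential Floating Forward Selection over the merged candidate pool."""
    selected, best_by_size = [], {}
    while len(selected) < target_size:
        # Forward step: add the unselected candidate that helps the most.
        best_acc, best_f = max((evaluate(X, y, selected + [f]), f)
                               for f in pool if f not in selected)
        selected.append(best_f)
        best_by_size[len(selected)] = max(best_acc,
                                          best_by_size.get(len(selected), -1.0))
        # Floating step: drop a feature (never the one just appended) while
        # doing so beats the best subset already recorded at the smaller size.
        while len(selected) > 2:
            acc, f = max((evaluate(X, y, [g for g in selected if g != f]), f)
                         for f in selected[:-1])
            if acc <= best_by_size.get(len(selected) - 1, -1.0):
                break
            selected.remove(f)
            best_by_size[len(selected)] = acc
    return selected


X, y = load_breast_cancer(return_X_y=True)
pool = candidate_pool(X, y)
print("selected features:", sffs(X, y, pool))
```

Swapping in other filters or predictors (for example, a Laplacian score on the unsupervised side, or an SVM or random-forest wrapper) only changes `candidate_pool` and `evaluate`; the floating search itself stays the same, which is what makes this kind of integration straightforward.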