| Graduate student: | 陳慶全 Chen, ChingChuan |
|---|---|
| Thesis title: | 子空間投影之密度函數比估計在二元分類問題之應用 A Classification Approach Based on Density Ratio Estimation with Subspace Projection |
| Advisor: | 陳瑞彬 Chen, Ray-Bing |
| Degree: | 碩士 Master |
| Department: | 管理學院 - 統計學系 Department of Statistics, College of Management |
| Year of publication: | 2014 |
| Academic year of graduation: | 102 |
| Language: | English |
| Number of pages: | 94 |
| Keywords (Chinese): | 密度函數比、維度詛咒、維度縮減、AUC、partial AUC |
| Keywords (English): | density ratio function, curse of dimensionality, dimension reduction, AUC, partial AUC |
In this thesis, we propose a classification method based on the density ratio function. Kanamori et al. (2009) proposed a least-squares approach for directly estimating the density ratio and used it to solve classification problems. However, the curse of dimensionality causes computational problems. To overcome this, we propose projecting the data onto a suitable subspace before estimating the density ratio. Using AUC as the classification criterion, both simulations and real data show that our method is comparable with logistic regression, SVM, and other methods. We also evaluate classification with the partial AUC, and the results again show that our method performs well.
In this work, we consider a classification method based on density ratio estimation. Kanamori et al. (2009) proposed a direct least-squares approach to density ratio estimation and showed how to use it for classification problems. However, the curse of dimensionality causes computational problems. To overcome this, we suggest projecting the data onto a proper subspace and then performing the density ratio estimation on this subspace instead of on the whole data. We can choose to rotate either the data or the basis; the latter is more efficient than the former. Simulation studies under different scenarios and several real examples are used to illustrate the performance of the proposed method. Using the area under the receiver operating characteristic (ROC) curve (AUC) as the classification score, the results show the improvements of the proposed method and demonstrate that it is comparable with other approaches, such as logistic regression. We also consider another classification score, the partial AUC, and the results show that the proposed method again performs well.
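A minimal numerical sketch of the idea, not the thesis's exact algorithm: a least-squares density-ratio fit in the style of Kanamori et al. (2009) on a low-dimensional projection of the data, scored by AUC and a Dodd & Pepe (2003)-style empirical partial AUC. The PCA projection (standing in for the thesis's subspace search), the Gaussian kernel width `sigma`, the regularizer `lam`, and the toy data are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def gauss_kernel(X, C, sigma):
    # Gaussian kernel values between the rows of X and the centers C
    d2 = ((X[:, None, :] - C[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def lsq_ratio(X_num, X_den, sigma=1.0, lam=1e-3, n_centers=50):
    # Least-squares fit of r(x) = p_num(x)/p_den(x) ~ alpha . phi(x):
    # minimizing E_den[(r - alpha.phi)^2] gives (H + lam I) alpha = h
    # with H = E_den[phi phi^T] and h = E_num[phi].
    idx = rng.choice(len(X_num), size=min(n_centers, len(X_num)), replace=False)
    C = X_num[idx]
    Phi_den = gauss_kernel(X_den, C, sigma)
    H = Phi_den.T @ Phi_den / len(X_den)
    h = gauss_kernel(X_num, C, sigma).mean(axis=0)
    alpha = np.linalg.solve(H + lam * np.eye(len(C)), h)
    return lambda X: gauss_kernel(X, C, sigma) @ alpha

def auc(s_pos, s_neg):
    # Mann-Whitney form of the AUC: P(score_pos > score_neg), ties count 1/2
    diff = s_pos[:, None] - s_neg[None, :]
    return (diff > 0).mean() + 0.5 * (diff == 0).mean()

def partial_auc(s_pos, s_neg, fpr_max=0.2):
    # Empirical pAUC over FPR in [0, fpr_max]: only negatives above the
    # (1 - fpr_max) quantile contribute; maximum value is fpr_max.
    k = max(1, int(np.ceil(fpr_max * len(s_neg))))
    top_neg = np.sort(s_neg)[-k:]
    diff = s_pos[:, None] - top_neg[None, :]
    return ((diff > 0).sum() + 0.5 * (diff == 0).sum()) / (len(s_pos) * len(s_neg))

# toy 10-d two-class data: only the first two coordinates separate the classes
n, d = 200, 10
X1 = rng.normal(0.0, 1.0, (n, d)); X1[:, :2] += 1.5   # positive class
X0 = rng.normal(0.0, 1.0, (n, d))                     # negative class

# project onto a 2-d subspace before estimating the ratio;
# PCA of the pooled sample stands in for the thesis's subspace search
pooled = np.vstack([X1, X0])
_, _, Vt = np.linalg.svd(pooled - pooled.mean(axis=0), full_matrices=False)
P = Vt[:2].T                                          # d x 2 projection matrix

r = lsq_ratio(X1 @ P, X0 @ P)        # estimated ratio p1/p0 on the subspace
a = auc(r(X1 @ P), r(X0 @ P))
pa = partial_auc(r(X1 @ P), r(X0 @ P), fpr_max=0.2)
print(f"AUC = {a:.3f}, pAUC(0.2) = {pa:.3f}")
```

The estimated ratio itself serves as the classification score, so ranking quality (AUC) can be read off directly; working in the 2-d subspace keeps the kernel matrices well conditioned, which is the computational point the abstract makes about the curse of dimensionality.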
References
D. Bamber. The area above the ordinal dominance graph and the area below the receiver operating characteristic graph. Journal of Mathematical Psychology, 12(4):387–415, 1975.
L. E. Dodd and M. S. Pepe. Partial AUC estimation and regression. Biometrics, 59(3):614–623, September 2003.
G. W. Flake and S. Lawrence. Efficient SVM regression training with SMO. Machine Learning, 46:271–290, 2002.
T. Hastie, R. Tibshirani, and J. Friedman. The elements of statistical learning. Springer, 2009.
T. K. Ho and E. M. Kleinberg. Building projectable classifiers of arbitrary complexity. In Proceedings of the 13th International Conference on Pattern Recognition, pages 880–885, August 1996.
R. Hooke and T. A. Jeeves. "Direct search" solution of numerical and statistical problems. Journal of the Association for Computing Machinery, 8(2):212–229, 1961.
G. James, D. Witten, T. Hastie, and R. Tibshirani. An introduction to statistical learning. Springer, 2013.
T. Kanamori, S. Hido, and M. Sugiyama. A least-squares approach to direct importance estimation. Journal of Machine Learning Research, 10:1391–1445, 2009.
T. Kanamori, T. Suzuki, and M. Sugiyama. Statistical analysis of kernel-based least-squares density-ratio estimation. Machine Learning, 86(3):335–367, 2012.
K.-C. Li. Sliced inverse regression for dimension reduction. Journal of the American Statistical Association, 86(414):316–327, June 1991.
B. V. Ramana, M. S. P. Babu, and N. B. Venkateswarlu. A critical study of selected classification algorithms for liver disease diagnosis. International Journal of Database Management Systems, 3(2):506–516, May 2011.
B. V. Ramana, M. S. P. Babu, and N. B. Venkateswarlu. A critical comparative study of liver patients from USA and India: An exploratory analysis. International Journal of Computer Science Issues, 9(2):506–516, May 2012.
G. Rätsch, T. Onoda, and K.-R. Müller. Soft margins for AdaBoost. Machine Learning, 42:287–320, 2001.
J. Q. Su and J. S. Liu. Linear combinations of multiple diagnostic markers. Journal of the American Statistical Association, 88(424):1350–1355, December 1993.
M. Sugiyama, M. Yamada, P. von Bünau, T. Suzuki, T. Kanamori, and M. Kawanabe. Direct density-ratio estimation with dimensionality reduction via least-squares hetero-distributional subspace search. Neural Networks, 24(2):183–198, 2011.
M. Sugiyama, T. Suzuki, and T. Kanamori. Density Ratio Estimation in Machine Learning. Cambridge University Press, 2012.
Z. Wang and Y.-C. I. Chang. Marker selection via maximizing the partial area under the ROC curve of linear risk scores. Biostatistics, pages 1–17, August 2010.
I-C. Yeh, K.-J. Yang, and T.-M. Ting. Knowledge discovery on RFM model using Bernoulli sequence. Expert Systems with Applications, 36:5866–5871, April 2009.