
Graduate Student: Chen, ChingChuan (陳慶全)
Thesis Title: A Classification Approach Based on Density Ratio Estimation with Subspace Projection (子空間投影之密度函數比估計在二元分類問題之應用)
Advisor: Chen, Ray-Bing (陳瑞彬)
Degree: Master
Department: Department of Statistics, College of Management
Year of Publication: 2014
Graduating Academic Year: 102 (2013-2014)
Language: English
Number of Pages: 94
Keywords: density ratio function, curse of dimensionality, dimension reduction, AUC, partial AUC
Access Count: 107 views, 2 downloads
    Chinese abstract: In this thesis, we propose a classification method based on the density ratio. Kanamori et al. (2009) proposed directly estimating the density ratio by a least-squares approach and applied it to classification problems. However, the curse of dimensionality leads to computational problems. To overcome this, we propose projecting the data onto a suitable lower-dimensional subspace before estimating the density ratio. Using the AUC as the classification criterion, both simulations and real data show that our method is comparable with logistic regression, SVM, and other methods. We also evaluate classification with the partial AUC, and the results again show that our method performs reasonably well.
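For reference, the least-squares criterion behind the estimator both abstracts refer to (uLSIF; Kanamori et al., 2009) can be sketched as follows. The notation is assumed here rather than taken from the thesis: a Gaussian kernel K, b kernel centers c_l drawn from the numerator sample, and a ridge parameter lambda.

```latex
% Sketch of the uLSIF criterion (Kanamori et al., 2009); notation assumed, not the thesis's own.
% The density ratio r(x) = p_nu(x) / p_de(x) is modeled by a kernel expansion and fitted by
% regularized least squares, which admits a closed-form minimizer.
\[
  \hat{r}(x) = \sum_{\ell=1}^{b} \alpha_\ell\, K(x, c_\ell), \qquad
  \hat{\alpha}
    = \arg\min_{\alpha}\ \tfrac{1}{2}\,\alpha^{\top}\widehat{H}\alpha
      - \widehat{h}^{\top}\alpha + \tfrac{\lambda}{2}\,\alpha^{\top}\alpha
    = \bigl(\widehat{H} + \lambda I_b\bigr)^{-1}\widehat{h},
\]
\[
  \widehat{H}_{\ell\ell'}
    = \frac{1}{n_{\mathrm{de}}}\sum_{j=1}^{n_{\mathrm{de}}}
      K\!\bigl(x^{\mathrm{de}}_j, c_\ell\bigr)\, K\!\bigl(x^{\mathrm{de}}_j, c_{\ell'}\bigr),
  \qquad
  \widehat{h}_{\ell}
    = \frac{1}{n_{\mathrm{nu}}}\sum_{i=1}^{n_{\mathrm{nu}}}
      K\!\bigl(x^{\mathrm{nu}}_i, c_\ell\bigr).
\]
```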

    English abstract: In this work, we consider a classification method based on density ratio estimation. Kanamori et al. (2009) proposed a direct least-squares approach to density ratio estimation and showed how to use it for classification problems. However, the curse of dimensionality causes computational problems. To overcome this, we suggest projecting the data onto a proper subspace and then performing the density ratio estimation on this subspace instead of on the full data. We can choose to rotate either the data or the basis; the latter is more efficient than the former. Simulation studies with different scenarios and several real examples are used to illustrate the performance of the proposed method. Using the area under the receiver operating characteristic (ROC) curve (AUC) as the classification score, the results show the improvements of the proposed method and demonstrate that it is comparable with other approaches, for example, the logistic model. We also consider another classification score, the partial AUC, and the results show that the proposed method performs fairly well.
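To make the pipeline described in the abstract concrete, below is a minimal Python/NumPy sketch, with stand-ins for details the abstract does not give: the subspace is a plain PCA-style orthonormal basis (standing in for the thesis's PuLSIF_RD/PuLSIF_RB rotation schemes), the kernel width and regularization are fixed rather than tuned, and the AUC/partial-AUC scores are generic rank-based versions. It is not the thesis's actual code.

```python
import numpy as np

def gaussian_kernel(X, C, sigma):
    """Gaussian kernel matrix between the rows of X and the centers C."""
    d2 = ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def ulsif_fit(X_nu, X_de, sigma=1.0, lam=0.1, n_centers=100, seed=0):
    """Least-squares density-ratio fit: r(x) ~ sum_l alpha_l * K(x, c_l)."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(X_nu), size=min(n_centers, len(X_nu)), replace=False)
    centers = X_nu[idx]
    K_de = gaussian_kernel(X_de, centers, sigma)   # kernel values at denominator samples
    K_nu = gaussian_kernel(X_nu, centers, sigma)   # kernel values at numerator samples
    H = K_de.T @ K_de / len(X_de)                  # H-hat in the least-squares criterion
    h = K_nu.mean(axis=0)                          # h-hat
    alpha = np.linalg.solve(H + lam * np.eye(len(centers)), h)
    return centers, np.maximum(alpha, 0.0)         # clip negative weights as uLSIF does

def ratio_score(X, centers, alpha, sigma=1.0):
    """Estimated density ratio, used here as the classification score."""
    return gaussian_kernel(X, centers, sigma) @ alpha

def auc(scores, labels):
    """Rank-based AUC: probability that a positive outranks a negative (ties count 1/2)."""
    pos, neg = scores[labels == 1], scores[labels == 0]
    diff = pos[:, None] - neg[None, :]
    return (diff > 0).mean() + 0.5 * (diff == 0).mean()

def partial_auc(scores, labels, fpr_max=0.1):
    """Area under the ROC curve restricted to false-positive rates in [0, fpr_max]."""
    order = np.argsort(-scores)
    y = labels[order]
    tpr = np.cumsum(y == 1) / (labels == 1).sum()
    fpr = np.cumsum(y == 0) / (labels == 0).sum()
    keep = fpr <= fpr_max
    return np.trapz(tpr[keep], fpr[keep])

# Toy usage: two Gaussian classes in 20 dimensions, projected onto 3 principal directions.
rng = np.random.default_rng(0)
X_pos = rng.normal(1.0, 1.0, size=(200, 20))   # "numerator" class (y = 1)
X_neg = rng.normal(0.0, 1.0, size=(200, 20))   # "denominator" class (y = 0)
X_all = np.vstack([X_pos, X_neg])
labels = np.array([1] * 200 + [0] * 200)
_, _, Vt = np.linalg.svd(X_all - X_all.mean(axis=0), full_matrices=False)
W = Vt[:3].T                                   # d x k projection matrix (PCA stand-in)
centers, alpha = ulsif_fit(X_pos @ W, X_neg @ W)
scores = ratio_score(X_all @ W, centers, alpha)
print("AUC:", auc(scores, labels))
print("partial AUC (FPR <= 0.1):", partial_auc(scores, labels))
```

The closed-form solve is what keeps the least-squares estimator cheap, and projecting onto a low-dimensional subspace first shrinks the kernel computations, which is the computational motivation the abstract gives for the subspace step.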

    Table of Contents
    摘要 (Chinese Abstract) I
    Abstract II
    Acknowledgements III
    Contents IV
    List of Tables VI
    1 Introduction 1
    2 Literature Review 2
    2.1 Framework of uLSIF 2
    3 Methodology 5
    3.1 Framework of PuLSIF_RD 5
    3.1.1 Projection Matrix 5
    3.1.2 Rotation Matrix 5
    3.1.3 Projection uLSIF 6
    3.1.4 Summary of PuLSIF_RD 7
    3.2 Framework of PuLSIF_RB 7
    4 Results of PuLSIF Rotation Data 9
    4.1 Simulation Results 9
    4.2 Results of Real Dataset 15
    5 Results of PuLSIF Rotation Basis 19
    5.1 Simulation Results 19
    5.2 Results of Real Dataset 23
    5.3 Comparison with PuLSIF_RD 26
    6 Results of PuLSIF_RB for Redundant Variables 31
    7 Results of PuLSIF_RB Partial AUC 36
    7.1 Methodology 36
    7.2 Simulation Results and Results of Real Dataset 36
    8 Conclusions and Future Work 41
    8.1 Conclusions 41
    8.2 Future Work 41
    A Tables: The Results of PuLSIF_RD 42
    B Tables: The Results of PuLSIF_RB 59
    C Tables: The Results of PuLSIF_RB for Redundant Variables 73
    D Tables: The Results of PuLSIF_RB Partial AUC 82
    References 93

    References
    D. Bamber. The area above the ordinal dominance graph and the area below the receiver operating characteristic graph. Journal of Mathematical Psychology, 12(4):387–415, 1975.
    L. E. Dodd and M. S. Pepe. Partial AUC estimation and regression. Biometrics, 59(3):614–623, September 2003.
    G. W. Flake and S. Lawrence. Efficient SVM regression training with SMO. Machine Learning, 46:271–290, 2002.
    T. Hastie, R. Tibshirani, and J. Friedman. The Elements of Statistical Learning. Springer, 2009.
    T. K. Ho and E. M. Kleinberg. Building projectable classifiers of arbitrary complexity. In Proceedings of the 13th International Conference on Pattern Recognition, pages 880–885, August 1996.
    R. Hooke and T. A. Jeeves. "Direct search" solution of numerical and statistical problems. Journal of the Association for Computing Machinery, 8(2):212–229, 1961.
    G. James, D. Witten, T. Hastie, and R. Tibshirani. An Introduction to Statistical Learning. Springer, 2013.
    T. Kanamori, S. Hido, and M. Sugiyama. A least-squares approach to direct importance estimation. Journal of Machine Learning Research, 10:1391–1445, 2009.
    T. Kanamori, T. Suzuki, and M. Sugiyama. Statistical analysis of kernel-based least-squares density-ratio estimation. Machine Learning, 86(3):335–367, 2012.
    K.-C. Li. Sliced inverse regression for dimension reduction. Journal of the American Statistical Association, 86(414):316–327, June 1991.
    B. V. Ramana, M. S. P. Babu, and N. B. Venkateswarlu. A critical study of selected classification algorithms for liver disease diagnosis. International Journal of Database Management Systems, 3(2):506–516, May 2011.
    B. V. Ramana, M. S. P. Babu, and N. B. Venkateswarlu. A critical comparative study of liver patients from USA and India: An exploratory analysis. International Journal of Computer Science Issues, 9(2):506–516, May 2012.
    G. Rätsch, T. Onoda, and K.-R. Müller. Soft margins for AdaBoost. Machine Learning, 42:287–320, 2001.
    J. Q. Su and J. S. Liu. Linear combinations of multiple diagnostic markers. Journal of the American Statistical Association, 88(424):1350–1355, December 1993.
    M. Sugiyama, M. Yamada, P. von Bünau, T. Suzuki, T. Kanamori, and M. Kawanabe. Direct density-ratio estimation with dimensionality reduction via least-squares hetero-distributional subspace search. Neural Networks, 24(2):183–198, 2011.
    M. Sugiyama, T. Suzuki, and T. Kanamori. Density Ratio Estimation in Machine Learning. Cambridge University Press, 2012.
    Z. Wang and Y.-C. I. Chang. Marker selection via maximizing the partial area under the roc curve of linear risk scores. Biostatistics, pages 1–17, August 2010.
    I-C. Yeh, K.-J. Yang, and T.-M. Ting. Knowledge discovery on rfm model using bernoulli sequence. Expert Systems with Applications, 36:5866-5871, April 2009.

    Full-text availability: on campus from 2019-09-02; off campus from 2019-09-02.