
Author: Lin, Ding-Huang (林鼎晃)
Thesis Title: A preliminary study of variable selection in penalized logistic regression with rare events data (在稀少事件下邏輯式迴歸於三種懲罰項的變數篩選能力之初步探討)
Advisor: Chi, Yun-Chan (嵇允嬋)
Degree: Master
Department: Department of Statistics, College of Management
Year of Publication: 2019
Graduation Academic Year: 107
Language: Chinese
Number of Pages: 33
Keywords (Chinese): penalized logistic regression (懲罰邏輯式迴歸), Least Absolute Shrinkage and Selection Operator (最小絕對緊縮與選擇算子), Smoothly Clipped Absolute Deviation (平滑修剪絕對離差), Adaptive LASSO (適應性最小絕對緊縮與選擇算子), penalized maximum likelihood method (懲罰最大概似估計法)
Keywords (English): LASSO, SCAD, Adaptive LASSO
In studies of gene expression data, the number of genes used as explanatory variables is usually far larger than the sample size, so the regression coefficients of the model cannot be estimated. Variable selection is therefore a necessary step before model building. Researchers have proposed penalized linear regression methods for variable selection; common choices include the Least Absolute Shrinkage and Selection Operator (LASSO) of Tibshirani (1996), the Smoothly Clipped Absolute Deviation (SCAD) penalty of Fan and Li (2001), the elastic net of Zou and Hastie (2005), and the Adaptive LASSO of Zou (2006). These methods have since been extended from continuous to binary response variables, yielding penalized logistic regression.
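For reference, the objective behind these methods can be sketched as follows (the notation is assumed here rather than taken from the thesis): for binary responses $y_i \in \{0,1\}$ and predictors $x_i \in \mathbb{R}^p$, penalized logistic regression estimates $(\beta_0, \beta)$ by minimizing

\[
-\frac{1}{n}\sum_{i=1}^{n}\Big[\, y_i(\beta_0 + x_i^\top \beta) - \log\big(1 + e^{\beta_0 + x_i^\top \beta}\big) \Big] + \sum_{j=1}^{p} p_\lambda(|\beta_j|),
\]

where the penalty is $p_\lambda(t) = \lambda t$ for the LASSO, $p_\lambda(t) = \lambda\big(\alpha t + \tfrac{1-\alpha}{2} t^2\big)$ for the elastic net, and $p_\lambda(t) = \lambda w_j t$ with data-driven weights $w_j = 1/|\tilde{\beta}_j|^{\gamma}$ (from an initial estimate $\tilde{\beta}$) for the Adaptive LASSO; SCAD uses a nonconvex penalty defined through its derivative $p_\lambda'(t) = \lambda\big\{ I(t \le \lambda) + \tfrac{(a\lambda - t)_+}{(a-1)\lambda}\, I(t > \lambda) \big\}$ for $t > 0$, with $a > 2$ (commonly $a = 3.7$).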
Under rare events, King and Zeng (2001) pointed out that the bias of the maximum likelihood estimator (MLE) of the model parameters becomes large, and they provided a method to correct this bias. In recent years, most researchers have instead adopted the penalized maximum likelihood method proposed by Firth (1993) to reduce the bias of the MLE. A simulation by Leitgöb (2013) compared the two approaches and recommended the latter, Firth's penalized maximum likelihood method, for reducing the bias of the MLE.
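For completeness, Firth's correction takes a simple form (a standard result from Firth, 1993; the notation here is assumed): rather than maximizing the log-likelihood $\ell(\beta)$, one maximizes the penalized log-likelihood

\[
\ell^{*}(\beta) = \ell(\beta) + \tfrac{1}{2}\log\big|I(\beta)\big|,
\]

where $I(\beta)$ is the Fisher information matrix. The penalty corresponds to a Jeffreys prior; it removes the $O(n^{-1})$ bias of the MLE and, in logistic regression, keeps the estimates finite even under complete separation.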
This thesis uses simulation to investigate how logistic regression models built with the LASSO, SCAD, and Adaptive LASSO penalties perform in selecting explanatory variables when the data are high dimensional and the event probability is very small. The simulations show that the predictive performance of LASSO is worse than that of the other two penalties. Researchers applying penalized logistic regression for variable selection and predictive modeling with rare events data are therefore advised to use the Adaptive LASSO.

It is well known that the accuracy of the maximum likelihood estimator (MLE) of the regression coefficients in the logistic regression model is seriously affected by rare events. Less attention has been given to the performance of variable selection in logistic regression with rare events. Therefore, this thesis studies the performance of three variable selection methods, LASSO (Least Absolute Shrinkage and Selection Operator), SCAD (Smoothly Clipped Absolute Deviation), and Adaptive LASSO, when the event rate is low and the number of explanatory variables is much larger than the sample size.
    A simulation study is conducted to compare the accuracy of these methods in selecting the important explanatory variables of the logistic regression model. Based on the limited simulation scenarios, when the event rate is as low as 0.05, the results favor the Adaptive LASSO for selecting important explanatory variables. Consequently, the Adaptive LASSO is recommended for variable selection and prediction with rare events data.
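The kind of selection workflow being compared can be illustrated with a minimal sketch (the thesis provides no code; scikit-learn is an assumed tool here, the data are simulated for illustration only, and SCAD is omitted because scikit-learn does not implement that penalty):

    # Illustrative sketch only: LASSO and Adaptive LASSO variable selection
    # in a logistic regression with many predictors and an uncommon event.
    import numpy as np
    from sklearn.linear_model import LogisticRegressionCV

    rng = np.random.default_rng(1)
    n, p = 200, 1000                          # n << p, as with gene-expression data
    X = rng.standard_normal((n, p))
    beta = np.zeros(p)
    beta[:5] = 1.5                            # only the first 5 predictors matter
    eta = X @ beta - 4.0                      # negative intercept keeps events uncommon
    y = rng.binomial(1, 1.0 / (1.0 + np.exp(-eta)))

    # LASSO-penalized logistic regression; the penalty weight is chosen by CV.
    lasso = LogisticRegressionCV(penalty="l1", solver="liblinear", Cs=20, cv=5).fit(X, y)
    b_lasso = lasso.coef_.ravel()

    # Adaptive LASSO via rescaling: weight each column by its initial |coefficient|,
    # refit the L1 model, then map the coefficients back to the original scale.
    w = np.abs(b_lasso) + 1e-6                # avoid exact zeros in the weights
    ada = LogisticRegressionCV(penalty="l1", solver="liblinear", Cs=20, cv=5).fit(X * w, y)
    b_ada = ada.coef_.ravel() * w

    print("LASSO selected:", np.flatnonzero(b_lasso))
    print("Adaptive LASSO selected:", np.flatnonzero(b_ada))

In practice such models are more commonly fit in R (e.g., glmnet for the LASSO and Adaptive LASSO, ncvreg for SCAD); the column-rescaling trick above is one standard way to obtain Adaptive LASSO estimates from a plain L1 solver.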

    Chapter 1  Introduction
    Chapter 2  Literature Review
        Section 1  The Logistic Regression Model
        Section 2  Introduction to the Penalty Terms
        Section 3  The Firth Logistic Regression Model
    Chapter 3  Simulation Study
        Section 1  Simulation Design
        Section 2  Simulation Results
    Chapter 4  Real Data Analysis
        Section 1  Description of the Data Set
        Section 2  Predictive Ability of the Genes Selected from the 1020 Genes
        Section 3  Predictive Ability of the Genes Selected from the Top 100 Genes
        Section 4  The Firth Logistic Regression Model
    Chapter 5  Conclusions and Recommendations
    References

    1. Austin, E., Pan, W. and Shen, X. (2013). "Penalized regression and risk prediction in genome-wide association studies." Statistical Analysis and Data Mining, Vol. 6, pp. 315-328.
    2. Fan, J. and Li, R. (2001). "Variable selection via nonconcave penalized likelihood and its oracle properties." Journal of the American Statistical Association, Vol. 96, No. 456, pp. 1348-1360.
    3. Firth, D. (1993). "Bias reduction of maximum likelihood estimates." Biometrika, Vol. 80, No. 1, pp. 27-38.
    4. Geeleher, P., Cox, N. and Huang, R. (2014). "Clinical drug response can be predicted using baseline gene expression levels and in vitro drug sensitivity in cell lines." Genome Biology, pp. 1-12.
    5. Heinze, G., Wallisch, C. and Dunkler, D. (2018). "Variable selection – a review and recommendations for the practicing statistician." Biometrical Journal, Vol. 60, pp. 431-449.
    6. Holland, P. and Welsch, R. (1977). "Robust regression using iteratively reweighted least-squares." Communications in Statistics - Theory and Methods, Vol. 6, pp. 813-827.
    7. Kim, S. and Halabi, S. (2016). "High dimensional variable selection with error control." BioMed Research International, Vol. 2016, pp. 1-11.
    8. King, G. and Zeng, L. (2001). "Logistic regression in rare events data." Political Analysis, Vol. 9, No. 2, pp. 137-163.
    9. Kyung, M., Gill, J., Ghosh, M. and Casella, G. (2010). "Penalized regression, standard errors, and Bayesian lassos." Bayesian Analysis, Vol. 5, No. 2, pp. 369-411.
    10. Leitgöb, H. (2013). "The problem of modeling rare events in ML-based logistic regression." European Survey Research, pp. 1-19.
    11. Pavlou, M., Ambler, G., Seaman, S., Guttmann, O., Elliott, P., King, M. and Omar, R. (2015). "How to develop a more accurate risk prediction model when there are few events." Research Methods & Reporting, pp. 1-5.
    12. Shieh, G., Lok, M. and Chang, J. (2018). "Prediction of cancer drug response." The 27th South Taiwan Statistics Conference and 2018 Chinese Institute of Probability and Statistics Annual Meeting and Chung-hwa Data Mining Society Annual Meeting, pp. 1-36.
    13. Tibshirani, R. (1996). "Regression shrinkage and selection via the lasso." Journal of the Royal Statistical Society, Series B, Vol. 58, No. 1, pp. 267-288.
    14. Zou, H. and Hastie, T. (2005). "Regularization and variable selection via the elastic net." Journal of the Royal Statistical Society, Series B, Vol. 67, Part 2, pp. 301-320.
    15. Zou, H. (2006). "The adaptive lasso and its oracle properties." Journal of the American Statistical Association, Vol. 101, pp. 1418-1429.
