Student: 洪啓豪 (Hong, Chi-Hao)
Thesis title: Linear Consistent Estimator for Continuous Trait Genome-wide Association Studies (使用線性一致估計於連續性狀基因組關聯研究)
Advisor: 張升懋 (Chang, Sheng-Mao)
Degree: Master
Department: Department of Statistics, College of Management
Year of publication: 2011
Graduation academic year: 99 (ROC calendar, i.e., 2010-2011)
Language: Chinese
Pages: 45
Keywords: Linear Consistent Estimator, Adaptive Lasso, Local False Discovery Rate, Generalized cross validation
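    This study develops a procedure for identifying genes that may affect a disease. Linear regression is commonly used to describe the relationship between a response variable and explanatory variables. The literature shows that simple linear regression easily produces false positives because an explanatory variable is correlated with other variables, whereas multiple regression is much less prone to this problem. However, when there are many explanatory variables, the limited sample size may make it impossible to obtain a best linear unbiased estimator (BLUE). We therefore estimate the coefficients of the explanatory variables through an approximate relationship between simple and multiple linear regression. One component of this approximation is the inverse of the correlation matrix of the explanatory variables. When the sample size is larger than the number of explanatory variables, this correlation matrix is invertible and positive definite; when the sample size is smaller than the number of explanatory variables, it is singular, which limits the transformation. Besides sample size, another common problem with multiple regression is variable selection: although many criteria have been developed to judge whether a selection is reasonable, their conclusions can change substantially as the data vary.
    The procedure has two parts. The first part proposes a linear consistent estimator that resolves the non-invertibility of the correlation matrix: Adaptive Lasso is used to estimate a sparse correlation matrix among genes that is invertible. The second part identifies, among the estimated multiple-regression coefficients, the genes that may affect the disease at a specified local false discovery rate. The two unknown parameters, the Adaptive Lasso tuning parameter λ and the local false discovery rate threshold q, are chosen by generalized cross validation (GCV). The whole procedure is evaluated by simulation, examining the effects of the sample size, the R-squared of the multiple regression, and the locations of the truly significant variables.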
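The abstract does not say exactly how Adaptive Lasso is applied to the gene correlation matrix, so the following is only a minimal pure-Python sketch of the generic adaptive Lasso for a linear regression, with λ picked from a grid by a GCV score. Several choices here are assumptions rather than the thesis's method: the initial estimator (an ordinary Lasso with a small penalty), γ = 1, the λ grid, counting nonzero coefficients as the degrees of freedom in GCV, the simulated data, and the hypothetical helper names `weighted_lasso`, `adaptive_weights` and `gcv_score`.

```python
import math
import random


def soft_threshold(z, t):
    """Soft-thresholding operator S(z, t) = sign(z) * max(|z| - t, 0)."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def weighted_lasso(X, y, lam, weights, n_iter=200):
    """Cyclic coordinate descent for
    (1 / (2n)) * ||y - X b||^2 + lam * sum_j weights[j] * |b_j|."""
    n, p = len(X), len(X[0])
    b = [0.0] * p
    resid = list(y)  # residual y - X b, starting from b = 0
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            # (1/n) x_j' r, where r is the residual with coordinate j added back
            z = sum(X[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * b[j]
            new = soft_threshold(z, lam * weights[j]) / col_sq[j]
            if new != b[j]:
                for i in range(n):
                    resid[i] -= X[i][j] * (new - b[j])
                b[j] = new
    return b


def adaptive_weights(X, y, gamma=1.0, lam_init=0.01):
    """Hypothetical weights w_j = 1 / |b_init_j|^gamma from a lightly penalized
    initial Lasso fit; a zero initial coefficient gets weight inf, which keeps
    that coefficient at exactly zero in the adaptive fit."""
    init = weighted_lasso(X, y, lam_init, [1.0] * len(X[0]))
    return [abs(c) ** -gamma if c != 0.0 else math.inf for c in init]


def gcv_score(X, y, b):
    """GCV = (RSS / n) / (1 - df / n)^2, with df taken as the number of
    nonzero coefficients (an assumption often used for Lasso-type fits)."""
    n = len(y)
    rss = sum((yi - sum(xij * bj for xij, bj in zip(row, b))) ** 2
              for row, yi in zip(X, y))
    df = sum(1 for c in b if c != 0.0)
    return (rss / n) / (1.0 - df / n) ** 2


# Hypothetical data: 5 standard normal predictors; only predictors 0 and 3 matter.
rng = random.Random(0)
n, p = 50, 5
true_beta = [2.0, 0.0, 0.0, -1.5, 0.0]
X = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
y = [sum(x * t for x, t in zip(row, true_beta)) + rng.gauss(0.0, 0.5) for row in X]

w = adaptive_weights(X, y)
grid = [0.01, 0.05, 0.1, 0.2, 0.5]
scores = {lam: gcv_score(X, y, weighted_lasso(X, y, lam, w)) for lam in grid}
best_lam = min(scores, key=scores.get)  # lambda with the smallest GCV score
fit = weighted_lasso(X, y, best_lam, w)
```

Coefficients whose initial estimates are near zero receive large weights and are shrunk to exactly zero, while large effects are penalized only lightly; this data-driven weighting is what distinguishes the adaptive Lasso from the ordinary Lasso.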

    In this thesis, a novel procedure is proposed to identify disease-causing genes. Simple linear regression is commonly used to study the relationship between a dependent variable and an independent variable. It reveals correlation but not causality when the underlying (linear) model contains several independent variables: a gene that is merely correlated with a causal gene can appear significant. Multiple regression avoids this problem. We use the population-level relationship between the coefficients of the simple linear regressions and those of the corresponding multiple regression to estimate the parameters by matching moments. The inverse of the independent variables' sample correlation matrix plays the key role in this moment estimator. A problem arises when the sample size is smaller than the number of independent variables, because the sample correlation matrix is then no longer invertible. A second technical problem is variable selection. Although many variable selection schemes have been developed from various points of view, this work treats variable selection as a multiple testing problem.
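The moment relation and its breakdown can be checked numerically. This sketch assumes standardized predictors, so that the simple-regression slope of y on gene j equals Cov(x_j, y) = (Rβ)_j, i.e. b = Rβ and hence β = R^{-1} b. The correlation values, coefficients, random seed and the hypothetical helpers `solve`, `matvec` and `sample_correlation` are illustrative assumptions. The second half draws n = 4 samples of p = 6 genes; the sample correlation matrix then has rank at most n - 1 = 3, so Gaussian elimination hits a zero pivot.

```python
import math
import random


def solve(a, rhs, tol=1e-10):
    """Solve a x = rhs by Gaussian elimination with partial pivoting.
    Raises ValueError when a pivot falls below tol (numerically singular)."""
    n = len(a)
    m = [[float(v) for v in row] + [float(r)] for row, r in zip(a, rhs)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[piv][col]) < tol:
            raise ValueError("matrix is numerically singular")
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def matvec(a, v):
    return [sum(aij * vj for aij, vj in zip(row, v)) for row in a]


def sample_correlation(data):
    """Pearson correlation matrix of the columns of data (n rows, p columns)."""
    n, p = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(p)]
    cen = [[row[j] - means[j] for j in range(p)] for row in data]
    norms = [math.sqrt(sum(r[j] ** 2 for r in cen)) for j in range(p)]
    return [[sum(r[j] * r[k] for r in cen) / (norms[j] * norms[k]) for k in range(p)]
            for j in range(p)]


# Population level (hypothetical numbers): genes 1 and 2 are correlated (0.8),
# gene 3 is independent, and gene 2 has no effect on the trait.
R = [[1.0, 0.8, 0.0],
     [0.8, 1.0, 0.0],
     [0.0, 0.0, 1.0]]
beta = [1.0, 0.0, 0.5]
b = matvec(R, beta)      # [1.0, 0.8, 0.5]: the null gene 2 gets a nonzero marginal slope
beta_hat = solve(R, b)   # moment matching beta = R^{-1} b recovers [1.0, 0.0, 0.5]

# Sample level with fewer samples than genes: the same inversion fails.
rng = random.Random(2011)
n, p = 4, 6
data = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
R_hat = sample_correlation(data)
try:
    solve(R_hat, [1.0] * p)
    singular = False
except ValueError:
    singular = True
```

The population example shows the false positive that simple regression produces for gene 2 and how the inverse correlation matrix removes it; the sample example shows why that inverse must be replaced by a regularized, nonsingular estimate when genes outnumber samples.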
    The proposed procedure consists of two parts. First, a linear consistent estimator of the regression coefficients is provided: the singular sample correlation matrix of thousands of genes is replaced by an adaptive Lasso correlation estimate, which is sparse and nonsingular. Second, disease-causing genes are identified from the estimated multiple-regression coefficients under a pre-specified local false discovery rate. Generalized cross validation is used to choose two unknown quantities: the tuning parameter λ of the adaptive Lasso and the threshold q of the local false discovery rate. Finally, the procedure is examined by simulation. The factors considered are the sample size, the noise level of the regression as measured by the coefficient of determination, and the locations of the affecting genes.
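The gene-selection step can be sketched with the two-groups model, in which the local false discovery rate is fdr(z) = π0 f0(z) / f(z) and a gene is declared affecting when fdr(z) ≤ q. This is a simplified illustration, not necessarily the estimator used in the thesis: it assumes a z-value is already available for each gene, uses the theoretical N(0, 1) null for f0, sets π0 = 1 (which can only overstate fdr), and estimates the mixture density f with a Gaussian kernel density estimate using Silverman's rule-of-thumb bandwidth, swapped in for simplicity. The simulated z-values, q = 0.2 and the function names are hypothetical.

```python
import math
import random


def normal_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def local_fdr(zs, pi0=1.0):
    """fdr(z) = pi0 * f0(z) / f(z) for every z in zs, capped at 1.
    f0: theoretical N(0, 1) null; f: Gaussian kernel density estimate."""
    n = len(zs)
    mean = sum(zs) / n
    sd = math.sqrt(sum((z - mean) ** 2 for z in zs) / (n - 1))
    h = 1.06 * sd * n ** (-0.2)  # Silverman's rule-of-thumb bandwidth

    def f(t):
        return sum(normal_pdf((t - z) / h) for z in zs) / (n * h)

    return [min(1.0, pi0 * normal_pdf(z) / f(z)) for z in zs]


def select_genes(zs, q):
    """Indices of genes whose estimated local fdr is at most the threshold q."""
    return [i for i, v in enumerate(local_fdr(zs)) if v <= q]


# Hypothetical z-values: genes 0-949 are null ~ N(0, 1),
# genes 950-999 are affecting genes ~ N(4, 1).
rng = random.Random(7)
zs = [rng.gauss(0.0, 1.0) for _ in range(950)] + [rng.gauss(4.0, 1.0) for _ in range(50)]
selected = select_genes(zs, q=0.2)
true_hits = sum(1 for i in selected if i >= 950)
```

Raising q admits more genes at the cost of more false discoveries; in the procedure described above, q is tuned by GCV rather than fixed in advance.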

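    Chapter 1 Introduction
    Chapter 2 Literature Review
      Section 1 Least Squares
      Section 2 Bridge Regression
      Section 3 Special Cases of Bridge Regression: Least Squares, Ridge Regression, and Lasso
      Section 4 Adaptive Lasso
    Chapter 3 Methodology
      Section 1 Linear Consistent Estimation
      Section 2 Sparse Correlation Matrix Estimation
      Section 3 Selection of Tuning Parameters
      Section 4 Variable Selection
        3.4.1 False Discovery Rate
        3.4.2 Local False Discovery Rate
    Chapter 4 Simulation Study
      Section 1 Simulation Exploration
      Section 2 Simulation Design
      Section 3 Expected Results
      Section 4 Simulation Results
    Chapter 5 Conclusions and Suggestions
    References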

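    Availability: on campus, publicly accessible from 2014-07-18; off campus, not public.
    The electronic thesis has not been authorized for public release; for the print copy, consult the library catalog.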