| Graduate student: | 吳方渝 Wu, Fang-Yu |
|---|---|
| Thesis title: | 利用核密度和廣義估計方程式估計致病基因的個數 Using Kernel Density Estimation and Generalized Estimating Equation to Estimate the Number of Diseased Genes |
| Advisor: | 馬瀰嘉 Ma, Mi-Chia |
| Degree: | Master |
| Department: | College of Management, Department of Statistics |
| Year of publication: | 2017 |
| Academic year of graduation: | 105 (ROC calendar) |
| Language: | Chinese |
| Number of pages: | 38 |
| Keywords: | EM algorithm, kernel density estimation, generalized estimating equation |
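Identifying disease-causing genes is a very important topic in medical research. Biologists sequence the genes of tumor cells and normal cells taken from the same patient; after the gene read counts are normalized by RPKM (reads per kilobase of exon model per million mapped reads), the differences can be tested with a paired-sample t test to find disease-causing genes. However, when tens of thousands of genes are tested simultaneously, the overall type I error inflates unless the significance level of each individual test is adjusted. The main current solutions are to control the FDR (false discovery rate) or the FWER (familywise error rate). When the null hypotheses are not true, however, FWER methods have lower power and tend to be conservative. Whether the FDR or the FWER is controlled, the number of true null hypotheses must first be estimated accurately.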
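This study extends the EM algorithm for estimating the number of true null hypotheses proposed by 鄭暘諭 (Zheng, 2016) from one dimension to multiple dimensions. The gene data are assumed to follow a mixture of multivariate normal distributions. The estimation has two parts: the first proposes two estimation methods, one based on the EM algorithm and one on kernel density estimation; the second uses the generalized estimating equation (GEE) to estimate the proportion of true null hypotheses and the individual significance level α. Finally, simulations are carried out with low, moderate, and high correlation among gene expression values, and with data that do or do not follow a multivariate normal distribution, to compare and discuss the strengths and weaknesses of the three proposed methods.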
Finding diseased genes is an important problem in medical research. After gene sequencing, read counts are corrected by the RPKM transformation (reads per kilobase of exon model per million mapped reads). The differences in corrected read counts between tumor cells and normal cells from the same patient are then used in a paired-sample t test to find diseased genes. When a large number of genes are tested, keeping each individual test at significance level α inflates the overall type I error rate. The main remedies are to control the FDR (false discovery rate) or the FWER (familywise error rate). When some null hypotheses are not true, FWER-controlling methods have lower power and tend to be conservative. Whether the FDR or the FWER is controlled, it is important to estimate the number of true null hypotheses accurately.
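The inflation of the overall type I error described above can be illustrated numerically. The following is a minimal sketch and not code from the thesis. The helper names `fwer_independent` and `benjamini_hochberg` are hypothetical, and the first function assumes independent tests, which is a simplification since the thesis studies correlated genes. The second function is a plain version of the step-up rule of Benjamini and Hochberg (1995); its optional argument `m0` stands for an estimated number of true null hypotheses and, when supplied, replaces the total number of tests in the thresholds, in the spirit of the adaptive procedure of Benjamini and Hochberg (2000).

```python
import numpy as np


def fwer_independent(alpha, m):
    """Probability of at least one false rejection when m true null
    hypotheses are tested independently, each at level alpha."""
    return 1.0 - (1.0 - alpha) ** m


def benjamini_hochberg(pvalues, q=0.05, m0=None):
    """Boolean rejection mask from the BH step-up rule at FDR level q.

    If m0 (an estimate of the number of true nulls) is given, it replaces
    the total number of tests m in the thresholds q * i / m.
    """
    p = np.asarray(pvalues, dtype=float)
    m = p.size
    denom = float(m) if m0 is None else max(float(m0), 1.0)
    order = np.argsort(p)                       # indices of sorted p-values
    thresholds = q * np.arange(1, m + 1) / denom
    below = p[order] <= thresholds
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.nonzero(below)[0].max()          # largest i with p_(i) <= threshold
        reject[order[: k + 1]] = True           # reject the k+1 smallest p-values
    return reject


# Illustrative p-values (made up for this sketch, not data from the thesis).
example_p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
plain = benjamini_hochberg(example_p, q=0.05)
adaptive = benjamini_hochberg(example_p, q=0.05, m0=3)
print(fwer_independent(0.05, 100), plain.sum(), adaptive.sum())
```

With a smaller `m0` the thresholds are looser, so the adaptive rule rejects at least as many hypotheses as the plain rule; this is why the accuracy of the estimated number of true nulls matters.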
This thesis extends the EM algorithm method proposed by Zheng (2016) for estimating the number of true null hypotheses from one dimension to multiple dimensions. We assume that the gene data follow a mixture of multivariate normal distributions. The estimation is divided into two parts: the first extends the EM algorithm method and proposes a kernel density estimation method; the second estimates the proportion of true null hypotheses and the FDR by the generalized estimating equation (GEE) method.
Finally, a simulation study compares the three proposed methods under different distributions, with simulated corrected read counts of genes at low, moderate, and high correlation.
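To make the idea of estimating the proportion of true null hypotheses with an EM algorithm concrete, here is a minimal one-dimensional sketch. It is not the method of the thesis, which works with multivariate normal mixtures, and not the method of Zheng (2016). It assumes, purely for illustration, that each gene yields a z-score, that null genes follow N(0, 1), and that non-null genes follow a single normal component with unknown mean and standard deviation. The function name `em_null_proportion` and the simulated data are hypothetical.

```python
import numpy as np


def normal_pdf(x, mean, sd):
    """Density of N(mean, sd^2) evaluated at x."""
    return np.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * np.sqrt(2.0 * np.pi))


def em_null_proportion(z, n_iter=500, tol=1e-8):
    """EM for the mixture pi0 * N(0, 1) + (1 - pi0) * N(mu1, sd1^2).

    Returns (pi0, mu1, sd1); pi0 estimates the proportion of true nulls.
    The null component is held fixed at N(0, 1) (an assumption of this sketch).
    """
    z = np.asarray(z, dtype=float)
    pi0 = 0.5
    mu1 = z.mean() + z.std()        # crude starting value for the non-null mean
    sd1 = max(z.std(), 1e-3)
    for _ in range(n_iter):
        # E-step: posterior probability that each observation is null.
        f0 = pi0 * normal_pdf(z, 0.0, 1.0)
        f1 = (1.0 - pi0) * normal_pdf(z, mu1, sd1)
        w0 = f0 / np.maximum(f0 + f1, 1e-300)
        w1 = 1.0 - w0
        # M-step: update the mixing proportion and the non-null parameters.
        new_pi0 = w0.mean()
        total_w1 = max(w1.sum(), 1e-12)
        mu1 = np.sum(w1 * z) / total_w1
        sd1 = max(np.sqrt(np.sum(w1 * (z - mu1) ** 2) / total_w1), 1e-3)
        converged = abs(new_pi0 - pi0) < tol
        pi0 = new_pi0
        if converged:
            break
    return pi0, mu1, sd1


# Simulated z-scores: 80% null N(0, 1), 20% non-null N(3, 1) (made-up settings).
rng = np.random.default_rng(0)
z = np.concatenate([rng.normal(0.0, 1.0, 8000), rng.normal(3.0, 1.0, 2000)])
pi0_hat, mu1_hat, sd1_hat = em_null_proportion(z)
print(f"estimated proportion of true nulls: {pi0_hat:.3f}")
```

Multiplying the estimated proportion by the number of tests gives an estimate of the number of true null hypotheses, which could then be passed to an adaptive multiple-testing procedure.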
1. Benjamini, Y., Hochberg, Y. (1995). “Controlling the False Discovery Rate: A Practical and Powerful Approach to Multiple Testing”, Journal of the Royal Statistical Society, Series B, 57, pp. 289-300.
2. Benjamini, Y., Hochberg, Y. (2000). “On the Adaptive Control of the False Discovery Rate in Multiple Testing with Independent Statistics”, Journal of Educational and Behavioral Statistics, 25, pp. 60-83.
3. Højsgaard, S., Halekoh, U., Yan, J. (2006). “The R Package geepack for Generalized Estimating Equations”, Journal of Statistical Software, 15, pp. 1-11.
4. Liang, K. Y., Zeger, S. L. (1986). “Longitudinal Data Analysis Using Generalized Linear Models”, Biometrika, 73, pp. 13-22.
5. Ma, M. C., Chao, W. C. (2011). “A Nonparametric Approach of Estimating the Number of True Null Hypotheses in Multiple Testing”, International Statistical Institute, August, Ireland, pp. 4669-4674.
6. Ma, M. C., Tsai, C. Y. (2011). “A Nonparametric Approach to Estimate the Number of True Null Hypotheses in Multiple Testing under Dependency”, Master's thesis, Department of Statistics, NCKU.
7. Mortazavi, A., Williams, B. A., McCue, K., Schaeffer, L., Wold, B. (2008). “Mapping and Quantifying Mammalian Transcriptomes by RNA-Seq”, Nature Methods, 5, pp. 621-628.
8. Wedderburn, R. W. M. (1974). “Quasi-Likelihood Functions, Generalized Linear Models, and the Gauss-Newton Method”, Biometrika, 61, pp. 439-447.
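9. 許乾柚 (2008). “Estimating the Number of True Null Hypotheses in Multiple Comparisons Using a Mixture Model” (in Chinese), Master's thesis, Department of Statistics, National Taipei University.
10. 鄭暘諭 [Zheng] (2016). “Estimating the False Discovery Rate Using an Empirical Bayes Method” (in Chinese), Master's thesis, Department of Statistics, National Cheng Kung University.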