| Author: | 張挺軒 Chang, Ting-Xuan |
|---|---|
| Thesis title: | 應用廣義可加模型對全基因體定序資料進行多位點關聯分析 Multilocus association analysis using generalized additive model for whole genome sequencing data |
| Advisor: | 李俊毅 Li, Chung-I |
| Degree: | Master |
| Department: | College of Management, Department of Statistics |
| Year of publication: | 2021 |
| Academic year of graduation: | 109 |
| Language: | Chinese |
| Number of pages: | 31 |
| Keywords (Chinese): | 生物資訊、次世代定序資料、廣義可加模型、單核苷酸多態性 |
| Keywords (English): | bioinformatics, next-generation sequencing data, generalized additive model, single-nucleotide polymorphism |
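Whole-genome sequencing gives us the sequence of the 3 billion base pairs in the human genome, and breakthroughs in sequencing technology have also sharply reduced its cost. This has driven the rapid development of association analysis methods, and studies of the relationship between genetic variant sites and disease risk are now widely conducted. This study works from raw sequencing data: for each single-nucleotide polymorphism (SNP) site, we extract the sequencing depth and the minor allele count and compute the variant allele frequency (VAF) as a new risk indicator. Building on set-based association tests, we use a generalized additive model (GAM), adjusting for the patients' age and body mass index, to examine the association between the occurrence of neuropathy and the set of risk indicators. We also compare the approach with an existing method, VEGAS (versatile gene-based association study), through statistical simulation and real-data analysis. The results indicate that when the multilocus association is nonlinear, the proposed method has higher power.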
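As a minimal sketch of the VAF step described above, the snippet below divides the minor-allele read count by the sequencing depth at each SNP site. The function name, the input layout, and the choice to return `None` for sites with zero coverage are assumptions made for illustration; the abstract does not state how the thesis handles such sites.

```python
from typing import List, Optional, Tuple


def variant_allele_frequency(minor_allele_reads: int, depth: int) -> Optional[float]:
    """Return minor-allele reads divided by total depth, or None if depth is zero."""
    if minor_allele_reads < 0 or depth < 0:
        raise ValueError("read counts must be non-negative")
    if minor_allele_reads > depth:
        raise ValueError("minor-allele reads cannot exceed sequencing depth")
    if depth == 0:
        # Assumption: a site with no coverage has an undefined VAF.
        return None
    return minor_allele_reads / depth


# Hypothetical (minor-allele reads, depth) pairs for three SNP sites in one sample.
sites: List[Tuple[int, int]] = [(12, 40), (0, 35), (0, 0)]
vafs = [variant_allele_frequency(m, d) for m, d in sites]
print(vafs)
```

Each sample then contributes one VAF per SNP in the set, and these values serve as the continuous predictors in the set-based model.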
Whole-genome sequencing (WGS) technology can help clinical cancer researchers obtain the sequence of the 3 billion base pairs in the human genome, which has led to the rapid development of association analysis methods. Researchers have begun to study the association between disease risk and single-nucleotide polymorphism (SNP) sites. Our research extracts information from raw WGS data to calculate the variant allele frequency (VAF) as a new risk score. To conduct set-based association analysis, we use a generalized additive model (GAM) to determine the association between the set of VAFs and disease status. Through statistical simulation and analysis of real data, we compare our method with the versatile gene-based association study (VEGAS) and conclude that our GAM-based method has higher power when the relationship is nonlinear.
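One way to write down the kind of model the abstract describes is sketched below. The logit link, the symbol names, and the linear treatment of age and body mass index are assumptions made for illustration, not details taken from the thesis. Here $Y_i$ is the disease status of patient $i$, $\mathrm{VAF}_{ij}$ is the variant allele frequency of patient $i$ at the $j$-th of $p$ SNPs in the set, and each $f_j$ is a smooth function estimated from the data.

```latex
\operatorname{logit}\Pr(Y_i = 1)
  = \beta_0 + \beta_1\,\mathrm{age}_i + \beta_2\,\mathrm{BMI}_i
  + \sum_{j=1}^{p} f_j\!\left(\mathrm{VAF}_{ij}\right),
\qquad
H_0 : f_1 = f_2 = \dots = f_p = 0 .
```

Under this formulation, a set-based test compares the full model with the covariate-only model, and allowing each $f_j$ to be nonlinear is what lets the approach pick up nonlinear multilocus effects.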
[1] Leo Breiman and Jerome H Friedman. Estimating optimal transformations for multiple regression and correlation. Journal of the American Statistical Association, 80(391):580–598, 1985.
[2] Robert B Davies. The distribution of a linear combination of χ² random variables. Journal of the Royal Statistical Society: Series C (Applied Statistics), 29(3):323–333, 1980.
[3] Christiaan A De Leeuw, Benjamin M Neale, Tom Heskes, and Danielle Posthuma. The statistical properties of gene-set analysis. Nature Reviews Genetics, 17(6):353, 2016.
[4] Andriy Derkach, Jerry F Lawless, and Lei Sun. Robust and powerful tests for rare variants using Fisher's method to combine evidence of association from two or more complementary tests. Genetic Epidemiology, 37(1):110–121, 2013.
[5] Trevor Hastie. gam: Generalized Additive Models, 2020. R package version 1.20.
[6] Trevor Hastie and Robert Tibshirani. Generalized additive models: some applications. Journal of the American Statistical Association, 82(398):371–386, 1987.
[7] Jaehyun Joo and Blanca Himes. snpsettest: A Set-Based Association Test using GWAS Summary Statistics, 2021. R package version 0.1.0.
[8] Seunggeun Lee, Michael C Wu, and Xihong Lin. Optimal tests for rare variant effects in sequencing association studies. Biostatistics, 13(4):762–775, 2012.
[9] Seunggeun Lee, Gonçalo R Abecasis, Michael Boehnke, and Xihong Lin. Rare-variant association analysis: study designs and statistical tests. The American Journal of Human Genetics, 95(1):5–23, 2014.
[10] Dan-Yu Lin and Zheng-Zheng Tang. A general framework for detecting disease associations with rare variants in sequencing studies. The American Journal of Human Genetics, 89(3):354–367, 2011.
[11] Jimmy Z Liu, Allan F McRae, Dale R Nyholt, Sarah E Medland, Naomi R Wray, Kevin M Brown, Nicholas K Hayward, Grant W Montgomery, Peter M Visscher, Nicholas G Martin, et al. A versatile gene-based test for genome-wide association studies. The American Journal of Human Genetics, 87(1):139–145, 2010.
[12] Andries T Marees, Hilde de Kluiver, Sven Stringer, Florence Vorspan, Emmanuel Curis, Cynthia Marie-Claire, and Eske M Derks. A tutorial on conducting genome-wide association studies: Quality control and statistical analysis. International Journal of Methods in Psychiatric Research, 27(2):e1608, 2018.
[13] Stephan Morgenthaler and William G Thilly. A strategy to discover genes that carry multi-allelic or mono-allelic risk for common diseases: a cohort allelic sums test (CAST). Mutation Research/Fundamental and Molecular Mechanisms of Mutagenesis, 615(1–2):28–56, 2007.
[14] Benjamin M Neale, Manuel A Rivas, Benjamin F Voight, David Altshuler, Bernie Devlin, Marju Orho-Melander, Sekar Kathiresan, Shaun M Purcell, Kathryn Roeder, and Mark J Daly. Testing for an unusual distribution of rare variants. PLoS Genetics, 7(3):e1001322, 2011.
[15] Wei Pan. Asymptotic tests of association with multiple SNPs in linkage disequilibrium. Genetic Epidemiology, 33(6):497–507, 2009.
[16] Simon N Wood. Generalized Additive Models: An Introduction with R. Chapman and Hall/CRC, 2nd edition, 2017.
[17] Michael C Wu, Seunggeun Lee, Tianxi Cai, Yun Li, Michael Boehnke, and Xihong Lin. Rare-variant association testing for sequencing data with the sequence kernel association test. The American Journal of Human Genetics, 89(1):82–93, 2011.