Graduate student: 張挺軒 (Chang, Ting-Xuan)
Thesis title: Multilocus association analysis using generalized additive model for whole genome sequencing data (應用廣義可加模型對全基因體定序資料進行多位點關聯分析)
Advisor: 李俊毅 (Li, Chung-I)
Degree: Master
Department: College of Management - Department of Statistics
Year of publication: 2021
Academic year of graduation: 109 (ROC calendar)
Language: Chinese
Number of pages: 31
Chinese keywords: 生物資訊, 次世代定序資料, 廣義可加模型, 單核苷酸多態性
English keywords: bioinformatics, next-generation sequencing data, generalized additive model, single-nucleotide polymorphism
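  • Whole genome sequencing technology gives us the sequence of the 3 billion base pairs in the human genome, and breakthroughs in sequencing technology have also greatly reduced its cost. This has driven the rapid development of association analysis methods, and studies of the correlation between genetic variant sites and disease risk have since become widespread. This study uses raw sequencing data: for each single-nucleotide polymorphism (SNP) site, it extracts the sequencing depth and the minor allele count, then computes the variant allele frequency (VAF) as a new risk index. Building on set-based association tests, it uses a generalized additive model (GAM), adjusted for patient age and body mass index, to examine the association between the development of neuropathy and the set of risk indices. The proposed method is also compared with an existing method, VEGAS (versatile gene-based association study), through statistical simulation and real-data analysis. The results indicate that the proposed method has higher power when the multilocus association is non-linear.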

    Whole genome sequencing (WGS) technology can help clinical cancer researchers obtain the sequencing results of the 3 billion base pairs in the human genome, which has led to the rapid development of association analysis methods. Researchers have begun to study the association between disease risk and SNP sites. Our research extracts information from raw WGS data to calculate the variant allele frequency (VAF) as a new risk score. To conduct set-based association analysis, we use a generalized additive model (GAM) to determine the association between a set of VAFs and disease status. Through statistical simulation and analysis of real data, we compare our method with the versatile gene-based association study (VEGAS) and conclude that our GAM-based method has higher power when the relationship is non-linear.
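
    The VAF risk score mentioned in the abstract can be illustrated with a minimal sketch. It assumes that VAF at a site is the number of reads carrying the variant (minor) allele divided by the sequencing depth at that site; the abstract does not give the exact formula, so this definition, the `SnpReadCounts` type, and the function name are hypothetical illustrations rather than the thesis's implementation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SnpReadCounts:
    """Read counts at one SNP site (hypothetical container for illustration)."""
    snp_id: str
    depth: int      # total number of reads covering the site
    alt_reads: int  # reads carrying the variant (minor) allele


def variant_allele_frequency(site: SnpReadCounts) -> Optional[float]:
    """Return alt_reads / depth, or None when the site has no coverage.

    Assumption: VAF is the plain read-count ratio; the thesis may define it
    differently.
    """
    if site.depth <= 0:
        return None  # VAF is undefined without coverage
    if not 0 <= site.alt_reads <= site.depth:
        raise ValueError(f"inconsistent counts at {site.snp_id}")
    return site.alt_reads / site.depth


# Made-up example counts, not data from the thesis.
sites = [
    SnpReadCounts("snp_a", depth=40, alt_reads=10),
    SnpReadCounts("snp_b", depth=0, alt_reads=0),
]
vafs = [variant_allele_frequency(s) for s in sites]
```

    In a set-based analysis, the per-site values for one gene or region would then serve as the predictors of a model such as a GAM. Sites with no coverage are returned as `None` here so that the caller decides how to handle them.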

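    Abstract
    Extended English abstract
    Table of contents
    List of tables
    List of figures
    Chapter 1. Introduction
    1.1. Research background and motivation
    1.2. Research objectives
    Chapter 2. Literature review
    2.1. Obtaining genomic data
    2.2. Genome-wide association analysis
    2.3. Rare-variant association analysis
    2.4. Set-based association analysis
    Chapter 3. Statistical methods
    3.1. Notation
    3.2. Risk index: VAF
    3.3. Generalized additive model
    3.3.1. Cubic spline
    3.3.2. Locally weighted scatterplot smoothing (LOESS)
    3.4. Model fitting
    Chapter 4. Simulation study
    4.1. Simulating VAF and environmental variable data
    4.2. Simulation results
    4.2.1. Average distance
    4.2.2. Type I error
    4.2.3. Power
    Chapter 5. Data analysis
    5.1. Descriptive statistics
    5.2. Analysis results
    Chapter 6. Conclusions and suggestions
    References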


    On-campus full-text release date: 2026-07-27
    Off-campus full-text release date: 2021-07-27