Graduate student: | Tseng, Yu-Ting (曾郁婷) |
---|---|
Thesis title: | SNP selection for association studies using interpretable learning (利用可解讀式學習為關聯性研究選擇單核苷酸多型性) |
Advisor: | Chang, Tien-Hao (張天豪) |
Degree: | Master |
Department: | College of Electrical Engineering and Computer Science - Department of Electrical Engineering |
Year of publication: | 2016 |
Academic year of graduation: | 104 |
Language: | Chinese |
Number of pages: | 28 |
Chinese keywords: | 單核苷酸多型性 (single nucleotide polymorphism), SNP-SNP interaction |
English keywords: | SNP-SNP interaction, interpretable learning, single nucleotide polymorphism (SNP), genome-wide association study (GWAS) |
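Over the past decade, advances in next-generation high-throughput sequencing have sharply reduced the cost of DNA sequencing. Genome-wide association studies (GWAS) have become increasingly common as a result and are now the newest route to understanding genetic variation across species. A particularly common research direction is identifying interactions among disease-associated single-nucleotide polymorphisms (SNPs). The total number of SNPs is enormous, however: in genome-wide data the number of predictors far exceeds the number of samples, and data of such high dimensionality is especially challenging to analyze and to model statistically.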
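In this study, we propose an interpretable learning method that filters out, from high-dimensional genomic data, the features likely to cause disease. Because many diseases arise from interactions among multiple variant loci, the main question this study addresses is how to uncover the interactions among disease-associated SNPs. Experimental results show that, on data with a noise level below 10%, the proposed method finds the SNPs truly associated with the disease within a set of 10 SNPs. At noise levels of 10% to 15%, it still locates disease-associated SNPs that exhibit SNP-SNP interaction within a smaller range than the commonly used chi-square test and logistic regression. Once the noise level exceeds 15%, the proposed method no longer has an advantage. We hope this work gives future researchers a better option for detecting associations among SNPs in genome-wide association studies.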
In this research, we propose an interpretable learning method that filters SNPs that may cause a particular disease from high-dimensional genomic data. Many other methods must first select genes related to the disease or trait and then analyze the correlation among the SNPs within those pre-selected genes. In contrast to these traditional methods, our approach not only detects the effect between each pair of SNPs but also handles genome-wide data. By pairing SNPs together, we can calculate the association between a specific trait and each group of SNPs. The results show that our method can find the target SNP within a smaller range. This research offers researchers a new option for analyzing SNPs in genome-wide association studies.
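The idea of pairing SNPs and scoring each pair against the trait can be illustrated with a minimal sketch. This is not the interpretable learning method of the thesis, whose details the abstract does not give. As a stand-in, it scores each SNP pair with a plain Pearson chi-square statistic computed on the joint genotype. The function names (`chi_square_stat`, `rank_snp_pairs`) and the toy data are hypothetical and chosen only for illustration.

```python
from collections import Counter
from itertools import combinations


def chi_square_stat(groups, labels):
    """Pearson chi-square statistic for the groups x labels contingency table."""
    n = len(labels)
    observed = Counter(zip(groups, labels))
    group_totals = Counter(groups)
    label_totals = Counter(labels)
    stat = 0.0
    for g, g_total in group_totals.items():
        for y, y_total in label_totals.items():
            expected = g_total * y_total / n
            stat += (observed[(g, y)] - expected) ** 2 / expected
    return stat


def rank_snp_pairs(genotypes, labels):
    """Score every SNP pair by the chi-square statistic of its joint genotype."""
    scores = {}
    for a, b in combinations(sorted(genotypes), 2):
        joint = list(zip(genotypes[a], genotypes[b]))
        scores[(a, b)] = chi_square_stat(joint, labels)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical toy data: 40 samples, genotypes coded 0/1 (non-carrier/carrier).
genotypes = {
    "snp1": [0, 0, 1, 1] * 10,
    "snp2": [0, 1, 0, 1] * 10,
    "snp3": [0, 1, 1, 0, 1, 0, 0, 1] * 5,  # unrelated to the trait
}
# Assumed pure interaction: a sample is a case only if exactly one of
# snp1/snp2 is carried, so neither SNP shows a marginal effect.
status = [a ^ b for a, b in zip(genotypes["snp1"], genotypes["snp2"])]

for name, column in genotypes.items():
    print(name, chi_square_stat(column, status))  # each single SNP scores 0.0

for pair, score in rank_snp_pairs(genotypes, status):
    print(pair, score)  # ('snp1', 'snp2') ranks first with 40.0
```

In this constructed example, a single-SNP chi-square test sees no signal at all, while the pairwise score singles out the interacting pair. Exhaustive pairing grows quadratically with the number of SNPs, so a genome-wide application would need filtering or a more efficient search than this sketch provides.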