| Graduate student: | Liao, Yu-Ru (廖鈺茹) |
|---|---|
| Thesis title: | Modeling Association between DNA Copy Number and RNA Expressions on Colon Adenocarcinoma Patients (DNA 拷貝數及RNA 表現量間的關係建模-以大腸直腸癌患者為例) |
| Advisor: | Jeng, Shuen-Lin (鄭順林) |
| Degree: | Master |
| Department: | College of Management - Department of Statistics |
| Year of publication: | 2019 |
| Academic year of graduation: | 107 |
| Language: | English |
| Number of pages: | 105 |
| Keywords: | COAD, MIC, dCor, MARS, DNA copy number, RNA, association |
The association between DNA copy number (CN) and RNA expression is an important issue in cancer studies. Critical genes with a strong DNA-RNA association may serve as therapeutic targets. In this study, we explore several methods for identifying genes with a strong DNA-RNA association in specific groups of cancer patients. We analyze data from patients with Colon Adenocarcinoma (COAD), downloaded from the Genomic Data Commons (GDC). We use the toolset "Bedtools" and the annotation "non-overlap GRCh38" to establish a new method for calculating read counts, which addresses the problem that the exons of different transcripts may overlap. After obtaining the read counts, we use the maximal information coefficient (MIC), distance correlation (dCor), and Multivariate Adaptive Regression Splines (MARS) to find the two-dimensional association between DNA copy number and RNA expression. The innovative algorithm in this study, called A2dMIC, calculates the three-dimensional association between DNA copy number and RNA expression across genes.
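To make one of the association measures named above concrete, the following is a minimal sketch of the sample distance correlation (dCor) between two one-dimensional samples. It is not the thesis's implementation: the function name `distance_correlation`, the variable names, and the synthetic copy-number and expression values are assumptions chosen only for illustration.

```python
import numpy as np


def distance_correlation(x, y):
    """Sample distance correlation of two 1-D samples of equal length."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.ndim != 1 or x.shape != y.shape:
        raise ValueError("x and y must be 1-D arrays of equal length")

    # Pairwise absolute distances within each sample.
    a = np.abs(x[:, None] - x[None, :])
    b = np.abs(y[:, None] - y[None, :])

    # Double-centering: subtract row and column means, add the grand mean.
    A = a - a.mean(axis=0, keepdims=True) - a.mean(axis=1, keepdims=True) + a.mean()
    B = b - b.mean(axis=0, keepdims=True) - b.mean(axis=1, keepdims=True) + b.mean()

    dcov2 = (A * B).mean()    # squared sample distance covariance
    dvar2_x = (A * A).mean()  # squared sample distance variance of x
    dvar2_y = (B * B).mean()  # squared sample distance variance of y

    denom = np.sqrt(dvar2_x * dvar2_y)
    if denom == 0.0:
        return 0.0            # one sample is constant
    return float(np.sqrt(max(dcov2, 0.0) / denom))


# Synthetic (hypothetical) data for one gene: expression depends on copy
# number through a symmetric, nonlinear relation that Pearson's r misses.
rng = np.random.default_rng(0)
copy_number = rng.normal(size=200)
expression = copy_number ** 2 + 0.1 * rng.normal(size=200)

pearson_r = float(np.corrcoef(copy_number, expression)[0, 1])
dcor = distance_correlation(copy_number, expression)
print(f"Pearson r = {pearson_r:.3f}, dCor = {dcor:.3f}")
```

In a per-gene screen of the kind described in the abstract, a function like this would be applied to each gene's copy-number and expression vectors across patients, and genes would then be ranked by the resulting score.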