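| Graduate student: | 黃岑筑 (Huang, Cen-Zhu) |
|---|---|
| Thesis title: | Using Multivariate Analysis of Variance to Determine Deletion Position under Dependent Gene Reads Count Data (Chinese title: 相關基因讀值資料下利用多變量變異數分析決定基因缺失的位置) |
| Advisor: | 馬瀰嘉 (Ma, Mi-Chia) |
| Degree: | Master's |
| Department: | Department of Statistics, College of Management |
| Publication year: | 2017 |
| Academic year of graduation: | 105 (ROC calendar, i.e., 2016-2017) |
| Language: | Chinese |
| Number of pages: | 21 |
| Keywords (Chinese, translated): | next-generation sequencing, multivariate analysis of variance, change point, between-gene correlation |
| Keywords (English): | NGS, MANOVA, change point, correlated genes |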
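Next-generation sequencing (NGS) is a biotechnology that has been widely applied in recent years. One example is non-invasive prenatal testing (NIPT): compared with conventional amniocentesis, it is a relatively safe prenatal test, because it only requires drawing venous blood from the pregnant woman to obtain the trace amounts of cell-free fetal DNA circulating in her blood, which are then tested for congenital disorders in the fetus.
Many statistical methods for analyzing DNA copy number changes have been proposed. For example, the circular binary segmentation (CBS) method of Olshen et al. (2004) joins the genomic data end to end into a circle and repeatedly compares chromosome segments for locally significant differences to determine change points, while the smooth segmentation method (smoothseg) of Huang et al. (2007) analyzes the data by smoothing them and statistically detecting changes.
However, from a biological point of view, nearby genes on the same chromosome arm tend to be highly correlated, yet none of these methods takes into account the possible correlation between adjacent genes (Engler et al., 2006). This study therefore uses multivariate analysis of variance (MANOVA) to locate gene deletions (i.e., change points) when the read counts of adjacent genes are correlated. In addition, we compare the performance of the proposed method with that of methods from the previous literature, using statistical simulations and a real-data example.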
Next-generation sequencing (NGS) is a biotechnology that has been widely used in recent years. For example, non-invasive prenatal testing (NIPT) is a relatively safe alternative to amniocentesis: doctors draw blood from the pregnant woman to obtain the cell-free fetal DNA circulating in it and test whether the fetus has a congenital disease.
Many previous studies have proposed methods for analyzing DNA copy number. For example, Olshen et al. (2004) proposed the circular binary segmentation (CBS) algorithm, which joins the genomic data into a circle to find positions with significant differences (change points). Huang et al. (2007) proposed a smooth segmentation method (smoothseg), which detects changes by statistically smoothing the data.
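To make the CBS idea concrete, here is a minimal Python sketch (not taken from the thesis) of the statistic that a single CBS step maximizes. If the n values are placed on a circle, every arc is either a contiguous run `x[i:j]` or the complement of one, so it is enough to compare each contiguous segment's mean with the mean of the remaining values, scaled like a two-sample t statistic, and keep the largest |Z|. As simplifying assumptions, the scale is the overall sample standard deviation, and the permutation-based significance test and recursive re-segmentation of the published algorithm (available, for example, in the Bioconductor package DNAcopy) are omitted. The function name `cbs_max_statistic` is hypothetical.

```python
import numpy as np


def cbs_max_statistic(x):
    """Largest CBS-style |Z| over all arcs of x viewed as a circle (sketch).

    Returns (max_abs_z, i, j): the segment x[i:j] is the arc whose mean differs
    most from the mean of all remaining values.
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    s = np.concatenate(([0.0], np.cumsum(x)))  # s[k] = x[0] + ... + x[k-1]
    total = s[n]
    sd = x.std(ddof=1)  # simplification: one global scale estimate
    best = (0.0, 0, 0)
    if sd == 0:  # constant series: no arc differs from the rest
        return best
    for i in range(n):
        for j in range(i + 1, n + 1):
            k = j - i  # number of points inside the arc
            if k == n:  # the arc must leave at least one point outside
                continue
            inside = (s[j] - s[i]) / k
            outside = (total - (s[j] - s[i])) / (n - k)
            z = (inside - outside) / (sd * np.sqrt(1.0 / k + 1.0 / (n - k)))
            if abs(z) > best[0]:
                best = (abs(z), i, j)
    return best


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    x = rng.normal(0.0, 0.3, size=60)  # simulated copy-number-like values
    x[20:35] -= 1.0  # simulated deletion at positions 20..34
    stat, i, j = cbs_max_statistic(x)
    print(f"max |Z| = {stat:.2f}, candidate segment x[{i}:{j}]")
```

For a noise-free step of 20 zeros, 15 values of -1 and 25 zeros, the maximizing segment is exactly `x[20:35]`; shifting either boundary by one position mixes unchanged values into the arc and lowers |Z|.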
However, from a biological perspective, nearby genes on the same chromosome arm tend to be highly correlated, and these methods do not take into account the possible dependence between adjacent genes (Engler et al., 2006). Lin (2016) extended the CBS approach to the multivariate setting by accounting for the correlation between genes (multiCBS). This study therefore considers correlated gene data and uses multivariate analysis of variance (MANOVA) to find the change point. In addition, we use simulation studies and a real-data example to compare the performance of the proposed method with that of multiCBS.
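The abstract does not state how MANOVA is set up, so the following Python sketch rests on stated assumptions and is not the thesis's procedure. It assumes each of n consecutive positions contributes a p-dimensional vector (for instance, p correlated read-count measurements per position) and that there is at most one change point τ. For every admissible τ, the observations before and after τ are treated as two groups and compared with Hotelling's two-sample T² (Hotelling, 1931), which for two groups is equivalent to one-way MANOVA through Wilks' Λ = 1 / (1 + T² / (n − 2)); the split with the largest T² is reported. The names `hotelling_two_sample_t2` and `manova_change_point` are hypothetical. Because τ is chosen by maximization, the usual F reference distribution of T² no longer applies, and significance would need separate calibration (for example, by simulation or permutation).

```python
import numpy as np


def hotelling_two_sample_t2(a, b):
    """Hotelling's two-sample T^2 with a pooled covariance; a is (n1, p), b is (n2, p)."""
    n1, n2 = len(a), len(b)
    diff = a.mean(axis=0) - b.mean(axis=0)
    # Pooled within-group covariance: the MANOVA error matrix E divided by n1 + n2 - 2
    cov_a = np.atleast_2d(np.cov(a, rowvar=False))
    cov_b = np.atleast_2d(np.cov(b, rowvar=False))
    pooled = ((n1 - 1) * cov_a + (n2 - 1) * cov_b) / (n1 + n2 - 2)
    return float(n1 * n2 / (n1 + n2) * diff @ np.linalg.solve(pooled, diff))


def manova_change_point(x, min_size=None):
    """Scan all single splits of the rows of x (n, p); return (tau, T^2, Wilks' lambda).

    tau is the index of the first observation after the change.
    """
    x = np.asarray(x, dtype=float)
    n, p = x.shape
    if min_size is None:
        min_size = p + 1  # each side needs enough rows for a covariance estimate
    if n < 2 * min_size:
        raise ValueError("series too short for the requested min_size")
    best_tau, best_t2 = None, -np.inf
    for tau in range(min_size, n - min_size + 1):
        t2 = hotelling_two_sample_t2(x[:tau], x[tau:])
        if t2 > best_t2:
            best_tau, best_t2 = tau, t2
    # Two-group identity linking T^2 and Wilks' lambda = |E| / |E + H|
    wilks = 1.0 / (1.0 + best_t2 / (n - 2))
    return best_tau, best_t2, wilks


if __name__ == "__main__":
    rng = np.random.default_rng(2)
    cov = np.array([[1.0, 0.6, 0.3], [0.6, 1.0, 0.6], [0.3, 0.6, 1.0]]) * 0.25
    x = rng.multivariate_normal(np.zeros(3), cov, size=80)  # correlated noise
    x[50:] -= 2.0  # simulated deletion: all three coordinates drop from row 50 on
    tau, t2, wilks = manova_change_point(x)
    print(f"estimated change point: {tau}, T^2 = {t2:.1f}, Wilks' lambda = {wilks:.3f}")
```

The pooled covariance is what lets the correlated coordinates be weighed jointly rather than scanned one at a time; a univariate scan of each coordinate would ignore that dependence.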
1. Engler, D. A., Mohapatra, G., Louis, D. N., & Betensky, R. A. (2006). A pseudolikelihood approach for simultaneous analysis of array comparative genomic hybridizations. Biostatistics, 7(3), 399-421.
2. Hotelling, H. (1931). The generalization of Student's ratio. Annals of Mathematical Statistics, 2(3), 360-378.
3. Huang, J., Gusnanto, A., O'Sullivan, K., Staaf, J., Borg, A., & Pawitan, Y. (2007). Robust smooth segmentation approach for array CGH data analysis. Bioinformatics, 23(18), 2463-2469.
4. Lai, W. R., Johnson, M. D., Kucherlapati, R., & Park, P. J. (2005). Comparative analysis of algorithms for identifying amplifications and deletions in array CGH data. Bioinformatics, 21(19), 3763-3770.
5. Lin, C. S. (2016). A study for determination of deletion position under dependent copy number data (Master's thesis, pp. 1-48). Department of Statistics, National Cheng Kung University.
6. Olshen, A. B., Venkatraman, E. S., Lucito, R., & Wigler, M. (2004). Circular binary segmentation for the analysis of array-based DNA copy number data. Biostatistics, 5(4), 557-572.
7. Venkatraman, E. S., & Olshen, A. B. (2007). A faster circular binary segmentation algorithm for the analysis of array CGH data. Bioinformatics, 23(6), 657-663.
8. Willenbrock, H., & Fridlyand, J. (2005). A comparison study: Applying segmentation to array CGH data for downstream analyses. Bioinformatics, 21(22), 4084-4091.