| Graduate student: | 賴偉榮 Lai, Wei-Jung |
|---|---|
| Thesis title: | Crosstalk矩陣之選取及鹼基的判定 Crosstalk Matrix Selection and Base Determination |
| Advisor: | 詹世煌 Chan, Shih-Huang |
| Degree: | Master |
| Department: | College of Management - Department of Statistics |
| Year of publication: | 2012 |
| Academic year of graduation: | 100 |
| Language: | English |
| Number of pages: | 57 |
| Keywords (Chinese): | 次世代基因定序, cross-talk矩陣, 鹼基判定品質分數 |
| Keywords (English): | Next Generation Sequencing, Cross-talk Matrix, Base-calling Quality Score |
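Research on next-generation sequencing has focused mainly on read assembly; few studies discuss how the bases A, C, G and T in a sequence are called or how the quality of those calls is assessed. Because the accuracy of base calling affects the subsequent read assembly and the analysis of its results, it carries great weight in biodiversity detection and downstream statistical analysis. Some researchers have proposed using a crosstalk matrix to improve the reliability of base calling and to reduce the prediction error rate, and have used different quality scores to judge how good the base intensities are after transformation by the crosstalk matrix (Giddings 1993). In this thesis, we use the SN ratio to select the crosstalk matrix. Assuming that the bases are uniformly scattered, we partition the array into intervals of row width R and estimate the crosstalk matrix from that partition. Among the different partitions, the one that yields the largest SN ratio gives the optimal crosstalk matrix.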
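For the quality analysis, we build a model from DNA data of Miscanthus and use the model parameters to simulate the scattering behavior of bases on a gene chip (tile). We also use the √(m^2+d^2) quality score proposed by Lawrence and Solovyev (1994) and the quality score based on the extremeness of an index distribution proposed by Kao (2011), and assess how each relates to the SN ratio. The simulation results show that the √(m^2+d^2) quality score is positively correlated with the SN ratio.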
Analysis of next-generation sequencing (NGS) data has focused mainly on read assembly; relatively few studies discuss base determination and the quality scores of the called bases. Because the accuracy of base determination affects the subsequent read assembly, and hence the analysis built on it, it is important for obtaining reliable findings in biodiversity detection and downstream statistical analysis. Several authors, for example Giddings et al. (1993), have proposed the cross-talk matrix, whose application enhances base-call reliability and reduces the prediction error rate. In this thesis, we propose using the SN ratio to select the optimal number of rows used in estimating the cross-talk matrix.
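To make the role of the cross-talk matrix concrete, the following is a minimal sketch of how a matrix could be applied once it has been estimated. It assumes the common linear model in which the observed four-channel intensities equal the cross-talk matrix times the true dye signal, so that correction amounts to solving a 4×4 linear system and calling the base with the largest corrected intensity. The function names, the channel order `ACGT`, and the values in `TOY_CROSSTALK` are illustrative assumptions; they are not taken from the thesis, and the sketch does not cover how the thesis estimates the matrix or defines the SN ratio.

```python
BASES = "ACGT"  # assumed channel order, for illustration only

# Hypothetical cross-talk matrix: entry [i][j] is the share of base j's
# signal that shows up in channel i. These values are made up.
TOY_CROSSTALK = [
    [1.0, 0.3, 0.0, 0.0],
    [0.2, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.4],
    [0.0, 0.0, 0.1, 1.0],
]


def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    # Work on an augmented copy so the caller's matrix is left untouched.
    a = [list(row) + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("cross-talk matrix is singular")
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n + 1):
                a[r][c] -= factor * a[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(a[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (a[i][n] - tail) / a[i][i]
    return x


def call_base(crosstalk, observed):
    """Remove cross-talk from one intensity vector and call the brightest base."""
    corrected = solve(crosstalk, observed)
    best = max(range(len(BASES)), key=lambda i: corrected[i])
    return BASES[best], corrected
```

For example, `call_base(TOY_CROSSTALK, [30.0, 100.0, 0.0, 0.0])` treats the 30 units seen in the first channel as spill-over from the second dye and calls `C`.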
For base quality, DNA data from Miscanthus are used to build a model, and the estimated model parameters are used to simulate the scattering behavior of bases. We establish quality scores using the √(m^2+d^2) score proposed by Lawrence and Solovyev (1994) and the extreme behavior of an index distribution suggested by Kao (2011). We then measure the correlation between the SN ratio and each quality score; the simulations show that √(m^2+d^2) is positively correlated with the SN ratio.
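The comparison described above can be sketched as follows, under two assumptions that the abstract does not confirm: that the correlation in question is the Pearson correlation, and that one quality score and one SN ratio are available per simulated unit. The quantities m and d are those defined by Lawrence and Solovyev (1994) and are simply taken as inputs here; the function names are hypothetical.

```python
import math


def quality_score(m, d):
    """Return sqrt(m^2 + d^2); m and d are inputs as defined in the cited paper."""
    return math.hypot(m, d)


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)
```

Given per-unit lists of `(m, d)` pairs and SN ratios, one would compute `pearson([quality_score(m, d) for m, d in pairs], sn_ratios)`; a value above zero corresponds to the positive correlation reported in the abstract.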
Giddings, M.C., Brumley, R.L. Jr, Haker, M. and Smith, L.M. (1993). “An adaptive, object oriented strategy for base calling in DNA sequence analysis”. Nucleic Acids Res. 21(19):4530-4540.
Kao, K.H. (2011). “Issues on Cross-Talk Matrix and Quality Measures for Second-Generation Sequence Call”. Department of Statistics, National Cheng Kung University.
Lawrence, C.B. and Solovyev, V.V. (1994). “Assignment of position-specific error probability to primary DNA sequence data”. Nucleic Acids Res. 22(7):1272-1280.
Li, L. and Speed, T.P. (1999). “An estimate of the crosstalk matrix in four-dye fluorescence-based DNA sequencing”. Electrophoresis. 20(7):1433-1442.
Metropolis, N., Rosenbluth, A.W., Rosenbluth, M.N., Teller, A.H. and Teller, E. (1953). “Equation of State Calculations by Fast Computing Machines”. J. Chem. Phys. 21(6):1087-1092.
Saiki, R.K., Scharf, S., Faloona, F., Mullis, K.B., Horn, G.T., Erlich, H.A. and Arnheim, N. (1985). “Enzymatic Amplification of β-Globin Genomic Sequences and Restriction Site Analysis for Diagnosis of Sickle Cell Anemia”. Science, New Series. 230(4732):1350-1354.
Sanger, F., Nicklen, S. and Coulson, A.R. (1977). “DNA sequencing with chain-terminating inhibitors”. Proc. Natl. Acad. Sci. USA. 74(12):5463-5467.
大石正道 (2002). 圖解人類基因組的構造 [Illustrated Guide to the Structure of the Human Genome]. Taipei: 世茂.