| Graduate student: | 陳柔安 Chen, Jou-An | 
|---|---|
| Thesis title: | 次世代基因定序之品質評估 Evaluating the Quality of Base Calling for Next Generation Sequence | 
| Advisor: | 詹世煌 Chan, Shih-Huang | 
| Degree: | Master | 
| Department: | College of Management - Department of Statistics | 
| Year of publication: | 2014 | 
| Academic year of graduation: | 102 | 
| Language: | Chinese | 
| Number of pages: | 45 | 
| Keywords (Chinese): | 次世代基因定序、鹼基判定品質、對談矩陣 | 
| Keywords (English): | Next Generation Sequence, Quality of Base Calling, Crosstalk Matrix | 
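Deoxyribonucleic acid (DNA) is a long-chain polymer that makes up the genetic code; its main function is to store an organism's genetic information. A DNA segment that encodes a protein is called a gene, the basic unit that determines an organism's hereditary traits. Genes contain four bases: adenine (A), thymine (T), cytosine (C), and guanine (G), which are arranged in different orders to form gene sequences. Gene sequencing determines the bases from fluorescence intensities and then assembles them, so the accuracy of base calling directly affects the value of the assembled sequence and should not be underestimated. 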
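Although earlier studies have discussed the quality of base calling, the true bases are unknown, so the accuracy of base calling cannot be assessed. This thesis uses data provided by Illumina to estimate the four-dimensional distribution of the bases at each cycle and, taking that distribution as a reference, simulates fluorescence intensities and performs base calling on them. 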
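The role of simulation can be illustrated with a minimal sketch: when intensities are generated from a known distribution, the true base at every cycle is known, so the accuracy of any caller can be measured directly. Everything below is an assumption for illustration and not the thesis's fitted model: the per-base mean vectors in `MEANS`, the common standard deviation `NOISE_SD`, independent noise across channels, the use of the same parameters at every cycle, and the naive brightest-channel caller.

```python
import random

BASES = "ACGT"

# Hypothetical mean intensity in channels (A, C, G, T) given the true base.
# Off-diagonal values mimic crosstalk between dye channels.
MEANS = {
    "A": [1000.0, 400.0, 50.0, 50.0],
    "C": [300.0, 1000.0, 50.0, 50.0],
    "G": [50.0, 50.0, 1000.0, 450.0],
    "T": [50.0, 50.0, 300.0, 1000.0],
}
NOISE_SD = 120.0  # assumed common standard deviation, independent channels


def simulate_read(n_cycles, rng):
    """Return (true_bases, intensities) for one simulated read."""
    truth, intensities = [], []
    for _ in range(n_cycles):
        base = rng.choice(BASES)
        truth.append(base)
        # 4-variate normal draw with a diagonal covariance (an assumption).
        intensities.append([rng.gauss(mu, NOISE_SD) for mu in MEANS[base]])
    return truth, intensities


def call_brightest(intensity):
    """Naive caller: pick the channel with the largest intensity."""
    return BASES[max(range(4), key=lambda k: intensity[k])]


def accuracy(truth, calls):
    """Fraction of cycles where the called base equals the true base."""
    return sum(t == c for t, c in zip(truth, calls)) / len(truth)


rng = random.Random(0)  # fixed seed so the run is reproducible
truth, intensities = simulate_read(2000, rng)
calls = [call_brightest(x) for x in intensities]
acc = accuracy(truth, calls)
print(f"accuracy of naive caller: {acc:.3f}")
```

Swapping `call_brightest` for another caller and re-running on the same simulated read gives a like-for-like accuracy comparison, which is not possible on real data where the truth is unknown.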
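The base-calling process consists of four main parts, in order: (1) handling interference, (2) setting the threshold, (3) determining the calling positions, and (4) applying the calling criterion. This thesis compares existing methods, verifies their accuracy and stability, and revises and improves them to find the best combination of base-calling methods.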
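The four parts can be sketched end to end on a toy intensity trace. This is only one plausible reading of each step, not the combination of methods selected in the thesis: the crosstalk matrix `M` is hypothetical, interference is handled by solving a linear system, the threshold is a fixed number, calling positions are local maxima of the brightest corrected channel, and the criterion is simply the brightest corrected channel.

```python
BASES = "ACGT"

# Hypothetical crosstalk matrix: observed = M @ true_signal, where column k
# describes how dye k leaks into each channel. Values are illustrative only.
M = [
    [1.0, 0.4, 0.0, 0.0],
    [0.3, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.45],
    [0.0, 0.0, 0.3, 1.0],
]


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - s) / aug[r][r]
    return x


def call_bases(trace, threshold):
    """Return a list of (time_index, base) calls for a 4-channel trace."""
    # (1) Handle interference: undo crosstalk at every time point.
    clean = [solve(M, obs) for obs in trace]
    # Signal strength used to locate peaks: brightest corrected channel.
    strength = [max(x) for x in clean]
    calls = []
    for t in range(1, len(clean) - 1):
        # (2) Threshold: skip points too dim to be a real base.
        if strength[t] < threshold:
            continue
        # (3) Calling position: a local maximum of the signal strength.
        if strength[t] >= strength[t - 1] and strength[t] > strength[t + 1]:
            # (4) Criterion: call the brightest corrected channel.
            k = max(range(4), key=lambda j: clean[t][j])
            calls.append((t, BASES[k]))
    return calls


def mix(true_signal):
    """Apply the crosstalk matrix to a clean 4-channel signal."""
    return [sum(M[i][j] * true_signal[j] for j in range(4)) for i in range(4)]


# Toy noise-free trace with peaks for C, then A, then T.
true_trace = [
    [0, 0, 0, 0], [0, 500, 0, 0], [0, 1000, 0, 0], [0, 500, 0, 0],
    [0, 0, 0, 0], [600, 0, 0, 0], [1000, 0, 0, 0], [400, 0, 0, 0],
    [0, 0, 0, 50], [0, 0, 0, 900], [0, 0, 0, 300], [0, 0, 0, 0],
]
trace = [mix(x) for x in true_trace]
print(call_bases(trace, threshold=200.0))
```

On this toy trace the sketch reports the three peaks at time points 2, 6, and 9 as C, A, and T. Each step is a separate, replaceable piece, which mirrors the idea of comparing alternative methods per step and looking for the best combination.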
Deoxyribonucleic acid (DNA) is a molecule that encodes genetic instructions. Each gene is composed of four bases: adenine (A), thymine (T), cytosine (C), and guanine (G). The value of assembled reads depends on the quality of the bases, which in turn depends on the accuracy of base calling. Research on base calling is therefore critical if the quality of reads is to be assured. 
Although the literature has discussed the quality of base calling, the true sequence is not available, so the accuracy of base calling cannot be obtained. In this article, we use available data from Illumina to estimate the parameters of the 4-variate distribution of true bases in every cycle and simulate the intensities of the sequences. The best base-calling procedure we obtained includes four processes: (1) deal with noise, (2) determine the threshold, (3) find candidate locations, and (4) set up the base-calling criterion. We propose a method to find candidate locations of bases and suggest a better base-calling procedure. 