
Student: 林怡君 (Lin, Yi-Chun)
Thesis title: 次世代基因定序之品質 (Quality of Base Calling for Next Generation Sequence)
Advisor: 詹世煌 (Chan, Shih-Huang)
Degree: Master
Department: College of Management, Department of Statistics
Year of publication: 2015
Academic year of graduation: 103 (ROC calendar, i.e., 2014-2015)
Language: English
Number of pages: 27
Keywords: Next Generation Sequencing, base quality, eigenfaces, principal component scores
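    DNA is composed of four different units, adenine (A), thymine (T), cytosine (C), and guanine (G), arranged in different orders. As technology has improved, sequencing instruments have become increasingly precise. Compared with traditional sequencing methods, Next Generation Sequencing (NGS) integrates cluster generation, sequencing, and alignment and assembly of complete gene sequences thoroughly and rapidly, while greatly reducing the time and cost required. Sequencing quality plays an important role in this process.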
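    Illumina uses the first four cycles to generate a template, and the base positions in all subsequent cycles are determined from this template. Previous studies have shown that base positions are not fixed after the fourth cycle. Li (2012) used a single-point base clustering method to demonstrate base drift; Shao (2013) used blank regions where no base is present and applied the concept of ring coding to explore the drift phenomenon; Lin (2014) instead used positions where bases are present, likewise extending the ring-coding concept and adding changes in length and angle to determine the direction of base drift. However, these studies focused on shifts within local regions of a cycle. This study uses the eigenface method to quickly capture how bases and their positions change between cycles, and further examines the relationship between cycle and base shift.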

    DNA is made up of four different bases: adenine (A), guanine (G), cytosine (C), and thymine (T). Next Generation Sequencing (NGS) is a technique that sequences DNA much more quickly and cheaply than the previously used Sanger sequencing. However, sequencing quality, although it plays an important role in determining DNA sequences, has received little attention in either academia or practice. Illumina, one of the best-known sequencing companies, claims that after the fourth cycle each base stays at the same position in all subsequent cycles.
    However, several studies, including Li (2012), Shao (2013) and Lin (2014), have shown that base-call positions are not fixed. These authors examined a specific local region to show that base positions do shift. In this thesis, we apply a machine learning technique, eigenface recognition (Turk and Pentland, 1991), and use principal component scores to represent the overall behavior of the cycles; the coefficients of the eigenfaces are then used to find the relationship between base shift and cycle.
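The eigenface idea described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the thesis's implementation: it assumes each cycle is summarized as one grayscale intensity image of a fixed size (a hypothetical data layout), and the function names `eigenfaces` and `trend_of_score` are hypothetical. The straight-line fit of a score against cycle index is only one simple way to relate eigenface coefficients to cycles and is not necessarily the analysis used in the thesis.

```python
import numpy as np


def eigenfaces(images, k):
    """Eigenface decomposition of a stack of equally sized cycle images.

    images: array-like of shape (M, H, W), one intensity image per cycle
            (a hypothetical data layout, assumed for illustration).
    k:      number of eigenfaces to keep (1 <= k <= min(M, H * W)).
    Returns (mean_face, faces, scores): mean_face has shape (H * W,),
    faces has shape (k, H * W), and scores has shape (M, k).
    """
    images = np.asarray(images, dtype=float)
    m = images.shape[0]
    x = images.reshape(m, -1)            # one flattened image per row
    if not 1 <= k <= min(x.shape):
        raise ValueError("k must be between 1 and min(M, H * W)")
    mean_face = x.mean(axis=0)           # the "average face" over all cycles
    a = x - mean_face                    # centred data
    # The rows of vt are orthonormal principal directions (the eigenfaces).
    # Taking the SVD of the M x (H*W) matrix avoids forming the large
    # (H*W) x (H*W) covariance matrix explicitly.
    _, _, vt = np.linalg.svd(a, full_matrices=False)
    faces = vt[:k]
    scores = a @ faces.T                 # principal component scores per cycle
    return mean_face, faces, scores


def trend_of_score(scores, component=0):
    """Least-squares slope of one score against cycle index (hypothetical summary).

    The sign of an eigenface is arbitrary, so only the magnitude and the
    shape of the trend are meaningful.
    """
    scores = np.asarray(scores)
    cycles = np.arange(scores.shape[0])
    slope, _ = np.polyfit(cycles, scores[:, component], 1)
    return slope


# Synthetic example (not thesis data): a Gaussian spot drifting 0.1 pixel per cycle.
yy, xx = np.mgrid[0:16, 0:16]
stack = np.stack([
    np.exp(-((xx - 7.5 - 0.1 * t) ** 2 + (yy - 7.5) ** 2) / 8.0)
    for t in range(10)
])
mean_face, faces, scores = eigenfaces(stack, k=3)
print(scores.shape)            # (10, 3)
print(trend_of_score(scores))  # slope of the first score over the 10 cycles
```

Because the synthetic spot moves only a fraction of a pixel per cycle, each image is close to the mean image plus a multiple of the spot's horizontal derivative, so the first score should change roughly linearly with cycle. Under this toy setup, a clearly non-zero slope is the kind of signal that would contradict a fixed-position assumption; it is an illustration, not a result reported in the thesis.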

    Abstract (Chinese)
    Abstract (English)
    Acknowledgments
    Table of Contents
    List of Tables
    List of Figures
    Chapter 1. Introduction
    Chapter 2. Evaluating the Quality for NGS
      2.1 Data Characteristics
        2.1.1 DNA Sequencing
        2.1.2 Independent and Uniform Property
      2.2 Determination of Base Call
        2.2.1 Cluster Size
        2.2.2 Threshold
        2.2.3 Determination of Real Base
      2.3 Eigenface Approach for Cycle Quality
    Chapter 3. Simulation
      3.1 Simulating Data
        3.1.1 Intensity and Position
        3.1.2 Data Settings
      3.2 Simulation Results and Analysis
    Chapter 4. NGS Analysis
      4.1 Independence and Uniformity
      4.2 Evaluating Overall Quality of Cycles
    Chapter 5. Conclusions
    References

    [1] Chen, J. A. (2014). "Evaluating the Quality of Base Calling for Next Generation Sequence", Master's thesis, Department of Statistics, National Cheng Kung University.
    [2] Illumina (2013). "MiSeq® System User Guide", San Diego, California 92122, U.S.A.
    [3] Li, P. F. (2012). "Base Calling of Read Sequencing for Next Generation Sequencing (NGS)", Master's thesis, Department of Statistics, National Cheng Kung University.
    [4] Lin, Y. H. (2014). "The Shift Phenomenon of Bases for Next Generation Sequence and its Effect", Master's thesis, Department of Statistics, National Cheng Kung University.
    [5] Sanger, F., Nicklen, S. and Coulson, A. R. (1977). "DNA sequencing with chain-terminating inhibitors", Proceedings of the National Academy of Sciences of the USA 74(12):5463-5467.
    [6] Shao, Y. F. (2013). "Pattern Recognition for Next Generation Sequence", Master's thesis, Department of Statistics, National Cheng Kung University.
    [7] Turk, M. and Pentland, A. (1991). "Eigenfaces for Recognition", Journal of Cognitive Neuroscience 3(1):71-86.

    Full text available: on campus from 2020-07-31; off campus from 2020-07-31.