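| Graduate student: | 邵筠芬 Shao, Yun-Fen |
|---|---|
| Thesis title: | 次世代基因定序之圖形比對 Pattern Recognition for Next Generation Sequence |
| Advisor: | 詹世煌 Chan, Shih-Huang |
| Degree: | Master |
| Department: | College of Management - Department of Statistics |
| Year of publication: | 2013 |
| Academic year of graduation: | 101 |
| Language: | Chinese |
| Number of pages: | 37 |
| Keywords (Chinese): | 基因定序 (genome sequencing), 統計圖形比對 (statistical pattern matching), 特徵向量個數 (number of feature vectors) |
| Keywords (English): | genome sequencing, pattern recognition, number of features |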
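Since the Human Genome Project (HGP), proposed by the U.S. Department of Energy (DOE) and the National Institutes of Health (NIH), was formally launched in October 1990, genome research has flourished. In genome sequencing, the assembly of sequences has always been the main focus of research; the determination of bases has been discussed, but only rarely. Because the accuracy of base determination affects the value of the assembly, improving the reliability of the bases in order to lower the prediction error rate is especially important. In 2007, Illumina integrated a next-generation sequencing platform and, supported by software systems, produced complete genome sequences. Illumina holds that the base positions of the sequences on the genome plate are fixed. Li (2012), examining single points, found that the base positions are not fixed after the fifth cycle. In this thesis, we use region patterns of a given size to test whether the base positions of the sequences drift in each cycle. In addition, we find that the base positions may move collectively, and we use pattern matching to test for this collective movement.
In marking the region patterns, we trace each pattern starting from a specific point and proceeding in a fixed direction. Simulation results show that this marking method has a certain ability to recognize patterns. In choosing the number of features for pattern matching, to reduce the difficulty of data analysis, we adopt the ring coding concept proposed by Peng (2003) so that the number of features is the same in every cycle. In the analysis of next-generation sequencing data, we confirm that the base positions of the sequences are not fixed from cycle to cycle and that they drift collectively.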
Since the United States Department of Energy (DOE) and the National Institutes of Health (NIH) launched the Human Genome Project (HGP) in 1990, genome analysis has become an increasingly popular research topic. In DNA sequencing research, most work focuses on the assembly of reads, and few studies discuss the determination and quality of bases. Because the value of assembled reads depends on the quality of the bases, which in turn depends on the accuracy of base determination, research on base determination is critical if the quality of reads is to be assured.
Illumina, one of the leading companies in DNA sequencing, holds that the position of the genome sequence on the genome plate is fixed, but Li (2012), by observing the position of a single base across cycles, found that this is not the case. In this thesis, we propose using an area that does not contain bases to examine the stability of the base positions. Pattern recognition techniques are applied to validate the movement of the blocks. A simulation study shows that the algorithm we develop can capture the pattern of the designated graphs. To reduce the difficulty of data analysis, Peng's (2003) ring coding technique is applied to fix the number of features. Using NGS data produced by Illumina, we confirm that the positions of genome sequences on the genome plate are not fixed and that their movement is collective.
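As an illustration only, the idea of a collective shift can be sketched as a brute-force search for the single offset that best aligns a set of positions between two cycles. This is not the region-marking or ring-coding procedure of the thesis; the function name `best_shift`, the `max_shift` parameter, and all coordinates below are hypothetical.

```python
from itertools import product


def best_shift(ref, cur, max_shift=2):
    """Find the (d_row, d_col) offset that best maps `ref` onto `cur`.

    `ref` and `cur` are sets of (row, col) positions from two cycles
    (hypothetical data). Every offset within +/- max_shift is tried, and
    the one that makes the most shifted `ref` points coincide with `cur`
    points is returned together with that match count.
    """
    best, best_score = (0, 0), -1
    for d_row, d_col in product(range(-max_shift, max_shift + 1), repeat=2):
        shifted = {(r + d_row, c + d_col) for r, c in ref}
        score = len(shifted & cur)
        if score > best_score:
            best, best_score = (d_row, d_col), score
    return best, best_score


# Hypothetical positions observed in one cycle.
cycle_a = {(2, 3), (5, 1), (7, 8), (4, 6)}
# The same points in a later cycle, all moved one row down and two columns left.
cycle_b = {(r + 1, c - 2) for r, c in cycle_a}

shift, matched = best_shift(cycle_a, cycle_b)
print(shift, matched)
```

If a single offset aligns nearly all points, the movement is consistent with a collective shift; if no offset aligns more than a few points, the positions moved independently or the search window `max_shift` is too small.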
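[1] A. K. Jain, R. P. W. Duin, and J. C. Mao (2000). "Statistical pattern recognition: A review." IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(1): 4-37.
[2] K. Fukunaga (1990). Introduction to Statistical Pattern Recognition, 2nd ed., pp. 1-10. Boston: Academic Press.
[3] 李思源 and 莊以光 (2010). DNA定序技術之演進與發展 [The evolution and development of DNA sequencing technology]. J Biomed Lab Sci, 22(2): 49-58.
[4] 李佩芳 (2012). 次世代基因定序之基因序列的鹼基判定 [Base determination of sequences in next-generation sequencing]. Master's thesis, National Cheng Kung University.
[5] 陳中庸 and 蔡世峰 (2003). 基因體定序之現況與展望 [Current status and prospects of genome sequencing]. In 張明富 (Ed.), 後基因體時代之生物技術 [Biotechnology in the post-genomic era], pp. 205-213. Taipei: Advisory Office, Ministry of Education.
[6] 彭國軒 (2003). 快速物件辨認與定位-環狀樣板比對 [Fast object recognition and localization: ring template matching]. Master's thesis, National Tsing Hua University.
[7] 賴偉榮 (2012). Crosstalk矩陣之選取及鹼基的判定 [Selection of the crosstalk matrix and base determination]. Master's thesis, National Cheng Kung University.
[8] 龔威儒 (2004). 局部遮蔽圖形之自動比對 [Automatic matching of partially occluded patterns]. Master's thesis, National Tsing Hua University.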