Graduate student | 房俊傑 (Fen, Jun-Jeng) |
---|---|
Thesis title | Establishment of Bioinformatic Database of Tri-nucleotide Repeats in Human Chromosome Y (建立人類Y染色體三核甘酸重複序列資料庫系統) |
Advisor | 陳啟清 (Chen, Chi-Ching) |
Degree | Master |
Department | College of Bioscience and Biotechnology, Department of Biology |
Year of publication | 2002 |
Graduation academic year | 90 (ROC calendar) |
Language | Chinese |
Pages | 46 |
Keywords (Chinese) | bioinformatics, microsatellite, Y chromosome, trinucleotide, repeat sequence, database |
Keywords (English) | Y chromosome, bioinformatics, database, repeat, TNR, microsatellite |
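In humans, sex is determined by the presence or absence of the Y chromosome. Of all the human chromosome pairs, the sex chromosomes differ from each other most markedly. The Y chromosome also differs from the autosomes in several ways: most of its length does not undergo recombination, and it contains a large amount of repetitive DNA. Microsatellite DNA consists of short sequence units repeated in tandem, and most microsatellites are highly polymorphic. Many recent studies have therefore used microsatellites on the Y chromosome as tools for kinship (paternity) testing and for research on population evolution.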
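To make "short sequence units repeated in tandem" concrete for the trinucleotide case, the following is a minimal Python sketch (the thesis itself used Perl). It only finds perfect, uninterrupted repeats, and the five-copy threshold is an illustrative assumption rather than the thesis's criterion; the function name `find_tnrs` is hypothetical.

```python
import re
from typing import List, Tuple


def find_tnrs(seq: str, min_copies: int = 5) -> List[Tuple[int, str, int]]:
    """Return (0-based start, repeat unit, copy number) for each perfect
    trinucleotide tandem repeat with at least `min_copies` copies.

    Assumption: only exact repeats are reported; imperfect or interrupted
    repeats would need an alignment-based method instead.
    """
    if min_copies < 2:
        raise ValueError("min_copies must be at least 2")
    seq = seq.upper()
    # ([ACGT]{3}) captures a candidate unit; \1{n,} demands n further exact copies.
    pattern = re.compile(r"([ACGT]{3})\1{%d,}" % (min_copies - 1))
    hits = []
    for m in pattern.finditer(seq):
        unit = m.group(1)
        if len(set(unit)) == 1:
            # Mononucleotide runs such as AAAAAA... are not trinucleotide repeats.
            continue
        hits.append((m.start(), unit, len(m.group(0)) // 3))
    return hits
```

For example, `find_tnrs("TT" + "CAG" * 6 + "TT")` reports one repeat of unit `CAG` with six copies starting at offset 2. Because the scan reads left to right, the reported unit depends on where the run starts (a run preceded by a stray `G` may be reported as `GCA`), which is one reason repeat units are usually grouped into canonical classes.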
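Data mining is concerned with organizing large databases and discovering the useful information hidden in them. One of the main aims of this study is to apply this data-mining approach: using programs tailored to the characteristics of microsatellites, we analyze and reorganize the Ensembl database and build from it a new database system organized around microsatellite features.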
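Reading Y-chromosome sequence straight from Ensembl can be illustrated with the present-day Ensembl REST service. This is a modern stand-in, not the thesis's mechanism: the 2002 work predates that service. The helper names `region_url` and `fetch_region` are hypothetical. Only the URL builder is exercised offline; `fetch_region` needs network access, and the service caps the length of a single region request, so long stretches would have to be fetched in pieces.

```python
import urllib.request

ENSEMBL_REST = "https://rest.ensembl.org"


def region_url(chrom: str, start: int, end: int, strand: int = 1,
               species: str = "human") -> str:
    """Build an Ensembl REST sequence/region URL (1-based, inclusive coordinates)."""
    if start < 1 or end < start:
        raise ValueError("expected 1 <= start <= end")
    return (f"{ENSEMBL_REST}/sequence/region/{species}/"
            f"{chrom}:{start}..{end}:{strand}?content-type=text/plain")


def fetch_region(chrom: str, start: int, end: int) -> str:
    """Download the region as a plain-text sequence (requires network access)."""
    with urllib.request.urlopen(region_url(chrom, start, end), timeout=30) as resp:
        return resp.read().decode("ascii").strip()
```

The returned string could be passed directly to a repeat scanner, so no local copy of the chromosome is needed.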
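In this study we designed a new module dedicated to microsatellite applications. The module has two main parts. The first has the computer enumerate combinations of bases to generate every possible repeat unit. The second connects to a remote database, so users no longer have to wait to download the sequence segments they want to analyze. Once the microsatellite database has been built, its key index fields allow it to be compared quickly with other databases. Doing the analysis programmatically greatly reduces the time required and also eliminates the errors that manual comparison could introduce.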
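The enumeration step can be sketched as follows. This is not the thesis module. It assumes the common convention that a unit, its rotations (CAG, AGC, GCA) and its reverse complement's rotations (CTG, TGC, GCT) describe the same repeat, and it uses the alphabetically smallest form as the class label. The names `canonical` and `repeat_unit_classes` are hypothetical.

```python
from itertools import product
from typing import Dict, List

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(s: str) -> str:
    """Reverse complement of a DNA string."""
    return s.translate(COMPLEMENT)[::-1]


def canonical(unit: str) -> str:
    """Smallest string among all rotations of the unit and of its reverse complement."""
    forms = []
    for u in (unit, revcomp(unit)):
        forms.extend(u[i:] + u[:i] for i in range(len(u)))
    return min(forms)


def repeat_unit_classes(k: int = 3) -> Dict[str, List[str]]:
    """Enumerate all 4**k k-mers, drop homopolymers, group by canonical form."""
    classes: Dict[str, List[str]] = {}
    for bases in product("ACGT", repeat=k):
        unit = "".join(bases)
        if len(set(unit)) == 1:
            continue  # AAA, CCC, ... are mononucleotide runs
        classes.setdefault(canonical(unit), []).append(unit)
    return classes
```

For k = 3 this enumerates 64 trinucleotides, drops the 4 homopolymers, and groups the remaining 60 into 10 classes of 6 units each. For example, `canonical("CAG")` and `canonical("CTG")` both return `"AGC"`.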
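When data are presented only as text, it is hard to spot possible relationships quickly. This study therefore adds graphical displays: plotting the data helps clarify possible relationships and focus attention on them.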
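One way to turn repeat positions into a figure is to count repeats in fixed-size windows along the chromosome and plot the counts. The sketch below computes those counts and draws them as a text bar chart. A real figure would use a plotting library; the window size and the helper names `density_bins` and `ascii_histogram` are illustrative assumptions.

```python
from collections import Counter
from typing import Iterable, List


def density_bins(positions: Iterable[int], chrom_length: int, bin_size: int) -> List[int]:
    """Count repeats whose 0-based start position falls in each fixed-size window."""
    n_bins = (chrom_length + bin_size - 1) // bin_size
    counts = Counter(pos // bin_size for pos in positions if 0 <= pos < chrom_length)
    return [counts.get(i, 0) for i in range(n_bins)]


def ascii_histogram(counts: List[int], bin_size: int, width: int = 40) -> str:
    """Render window counts as a text bar chart, one line per window."""
    peak = max(counts, default=0) or 1  # avoid division by zero when all counts are 0
    lines = []
    for i, c in enumerate(counts):
        bar = "#" * round(c * width / peak)
        lines.append(f"{i * bin_size:>10} {bar} {c}")
    return "\n".join(lines)
```

For instance, `density_bins([5, 15, 17, 95], 100, 10)` yields one repeat in the first window, two in the second and one in the last. Passing that list to `ascii_histogram` shows the cluster at a glance.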
In humans, sex is determined by the presence or absence of the Y chromosome. Of all human chromosome pairs, the X and Y chromosomes differ the most. Over most of its length the Y chromosome does not recombine during meiosis, and it contains a high density of repeated sequences. A microsatellite consists of short tandemly repeated sequence units. Microsatellites are abundant and polymorphic in eukaryotic genomes, so they have been used as genetic markers in molecular biology, for example in gene mapping and population genetics.
Data mining aims to find the useful information hidden in large databases. In this research, we applied data-mining concepts and wrote programs to build a microsatellite database system.
In this research, we wrote a Perl module specific to microsatellite analysis. The module offers several advantages for defining and searching microsatellites. First, every possible repeat pattern can be generated and output automatically by having the computer systematically enumerate combinations of bases. Second, local users can retrieve the DNA sequences they want to analyze for repeats by connecting directly to a remote DNA database, instead of downloading the sequences first. Finally, once the microsatellite database has been constructed, researchers can link it to other databases, such as disease or SNP databases, through key index fields for rapid follow-up research. This approach not only saves considerable time and hard-disk space but also avoids errors that manual analysis can introduce into repeat-sequence investigation.
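Linking the microsatellite database to a SNP database through key index fields amounts to a join on chromosome and position. The sketch below uses SQLite with a hypothetical two-table schema (`microsatellite`, `snp`). The thesis's actual tables and fields are not described, and the row values in the usage example are made-up placeholders, not real data.

```python
import sqlite3
from typing import List, Tuple


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with hypothetical microsatellite and SNP tables."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE microsatellite (
            ms_id     INTEGER PRIMARY KEY,
            chrom     TEXT NOT NULL,
            start_pos INTEGER NOT NULL,  -- 1-based, inclusive
            end_pos   INTEGER NOT NULL,  -- 1-based, inclusive
            unit      TEXT NOT NULL,
            copies    INTEGER NOT NULL
        );
        CREATE INDEX ms_loc ON microsatellite (chrom, start_pos, end_pos);
        CREATE TABLE snp (
            snp_id TEXT PRIMARY KEY,
            chrom  TEXT NOT NULL,
            pos    INTEGER NOT NULL
        );
        CREATE INDEX snp_loc ON snp (chrom, pos);
    """)
    return con


def snps_near_repeats(con: sqlite3.Connection, flank: int = 0) -> List[Tuple[int, str, str]]:
    """Return (ms_id, unit, snp_id) for SNPs inside each repeat, widened by `flank` bp."""
    sql = """
        SELECT m.ms_id, m.unit, s.snp_id
        FROM microsatellite AS m
        JOIN snp AS s
          ON s.chrom = m.chrom
         AND s.pos BETWEEN m.start_pos - ? AND m.end_pos + ?
        ORDER BY m.ms_id, s.pos
    """
    return con.execute(sql, (flank, flank)).fetchall()
```

Usage, with placeholder rows:

```python
con = build_demo_db()
con.executemany("INSERT INTO microsatellite VALUES (?, ?, ?, ?, ?, ?)",
                [(1, "Y", 1000, 1017, "AGC", 6), (2, "Y", 5000, 5014, "AAT", 5)])
con.executemany("INSERT INTO snp VALUES (?, ?, ?)",
                [("snpA", "Y", 1005), ("snpB", "Y", 1020), ("snpC", "Y", 9000)])
print(snps_near_repeats(con, flank=5))  # [(1, 'AGC', 'snpA'), (1, 'AGC', 'snpB')]
```

The composite indexes on `(chrom, position)` are what make this kind of cross-database lookup fast as the tables grow.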
Presenting the data as figures as well as text helps reveal relationships among the microsatellites on the human Y chromosome, so we favored figures when presenting the results of the microsatellite analysis.