| Graduate student: | 嚴國何 Yen, Kuo-ho |
|---|---|
| Thesis title: | 進階細胞遺傳查詢系統與整合型資料庫和一致性分析 Extended Cytogenetic Query System with Integrated Databases and Consistency Analysis |
| Advisor: | 李強 Lee, Chiang |
| Degree: | Doctor |
| Department: | College of Electrical Engineering and Computer Science, Department of Computer Science and Information Engineering |
| Year of publication: | 2009 |
| Academic year of graduation: | 97 (ROC calendar) |
| Language: | English |
| Number of pages: | 73 |
| Keywords (Chinese): | 染色體、細胞遺傳、生物資訊、基因 (chromosome, cytogenetics, bioinformatics, gene) |
| Keywords (English): | chromosome, cytogenetic band, cytoband, bioinformatics, gene |
Staining human metaphase chromosomes reveals characteristic banding patterns known as cytogenetic bands, or cytobands. Using technologies based on metaphase chromosomes, researchers have accumulated much knowledge about the correlations between human diseases and specific cytoband aberrations, which indicates the presence of disease-associated genes in those bands. With the progress of the Human Genome Project and techniques such as fluorescent in situ hybridization (FISH), many genes have been assigned to cytobands and annotated in public databases, making it possible to find all genes in disease-related cytobands through a database query. However, finding genes in cytobands remains an imprecise process, partly because current methods for cytoband query are insufficient, especially those based on cytogenetic annotations. By transforming cytoband annotations into numerical segments, a new query method was developed that can accurately define any cytogenetic range in human chromosomes. A query system (designated CQS) was implemented using cytogenetic annotations in the public domain. In a performance test using cytogenetic annotations from NCBI Map Viewer, CQS executed as accurately as expected. The new method is scalable and can be applied to the genomes of other species.
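The idea of turning cytoband annotations into numerical segments can be illustrated with a minimal Python sketch. The encoding below is an assumption made for illustration, not the encoding used by CQS: each band's digits are padded to a fixed width (`SCALE`, a hypothetical constant) to form a closed integer interval, and p-arm intervals are mirrored to negative values so that numbers increase from pter to qter. Apart from ACTB at 7p22, which is mentioned above, the gene names and locations are invented.

```python
import re

SCALE = 6  # assumed fixed width for band codes; not taken from the thesis

_CHROM = re.compile(r"(\d{1,2}|X|Y)(.+)$")
_BAND = re.compile(r"([pq])(\d+(?:\.\d+)?)$")


def band_to_interval(arm, digits):
    """Map one band, e.g. ('q', '21.1'), to a closed integer interval."""
    code = digits.replace(".", "")
    lo = int(code.ljust(SCALE, "0"))  # smallest code sharing this prefix
    hi = int(code.ljust(SCALE, "9"))  # largest code sharing this prefix
    # Mirror the p arm so that values grow from pter towards qter.
    return (-hi, -lo) if arm == "p" else (lo, hi)


def parse_annotation(text):
    """Parse '7p22' or '7p15-p12' into (chromosome, low, high)."""
    m = _CHROM.match(text.strip())
    if m is None:
        raise ValueError(f"unsupported annotation: {text!r}")
    chrom, rest = m.groups()
    ends = []
    for part in rest.split("-"):
        b = _BAND.match(part)
        if b is None:  # forms such as 'pter' or 'cen' are not handled here
            raise ValueError(f"unsupported band: {part!r}")
        ends.append(band_to_interval(b.group(1), b.group(2)))
    return chrom, min(lo for lo, _ in ends), max(hi for _, hi in ends)


def overlaps(a, b):
    """True when two (chromosome, low, high) segments intersect."""
    return a[0] == b[0] and a[1] <= b[2] and b[1] <= a[2]


def genes_in_region(query, annotations):
    """Return genes whose annotated segment overlaps the query region."""
    region = parse_annotation(query)
    return [gene for gene, loc in annotations.items()
            if overlaps(region, parse_annotation(loc))]


# Hypothetical annotation table; GENE_A and GENE_B are placeholders.
annotations = {"ACTB": "7p22", "GENE_A": "7p15-p12", "GENE_B": "7q21.1"}
print(genes_in_region("7p2", annotations))
print(genes_in_region("7p22-p12", annotations))
```

With this prefix-style encoding, a coarse query such as 7p2 contains every sub-band of region 2 on the p arm, and a range query reduces to a simple interval-overlap test.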
Many human genes have problematic cytogenetic annotations in public databases. For example, the common housekeeping gene beta-actin (ACTB) has been mapped to 7p22, not to 7p15-p12 as annotated in Entrez Gene. It is generally believed that the positional order of genes on the sequence map is the same as that of their cytogenetic locations. However, for many pairs of genes in current databases, the cytogenetic annotations and sequence map positions disagree, and a systematic search for such discrepancies should uncover problematic cytogenetic annotations genome-wide. Not all inconsistencies are the same. Some may reflect problematic data that should be corrected in the future, while others may result from the imprecise nature of chromosomal banding and may be tolerable. It is therefore important to recognize the imprecision of cytogenetic banding and to stratify cytogenetic position information into different confidence groups.
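A systematic search for order discrepancies between the two maps could look roughly like the following Python sketch. It assumes each gene on one chromosome already has a numeric cytogenetic interval (increasing from pter to qter) and a sequence map position; the tuple layout, the function name `discordant_pairs`, and the sample values are hypothetical and are not taken from the thesis.

```python
from itertools import combinations


def discordant_pairs(genes):
    """Find gene pairs whose cytogenetic order contradicts sequence order.

    genes: list of (name, cyto_low, cyto_high, seq_pos) on one chromosome.
    """
    pairs = []
    for a, b in combinations(genes, 2):
        # Order the pair by position on the sequence map.
        first, second = (a, b) if a[3] <= b[3] else (b, a)
        # Discordant when the gene that comes first on the sequence map lies
        # strictly after the other one cytogenetically (no interval overlap).
        if first[1] > second[2]:
            pairs.append((first[0], second[0]))
    return pairs


# Hypothetical genes: (name, cyto_low, cyto_high, sequence position in bp).
genes = [
    ("G1", -229999, -220000, 5_500_000),
    ("G2", -159999, -120000, 3_000_000),
    ("G3", 210000, 219999, 90_000_000),
]
print(discordant_pairs(genes))
```

Overlapping cytogenetic intervals are treated as compatible here, so only strict order reversals are reported.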
When the cytogenetic annotations of genes are plotted against their sequence map positions on a 2-D plane, the consistent genes tend to form a compact linear distribution, while genes with inconsistent positions are more scattered. The overlapping areas between these two groups are defined as tolerable imprecision-zones by linear regression and distance analysis. The system was implemented using sequence information from NCBI Map Viewer Build 36.3 and cytogenetic annotations from NCBI Entrez Gene. Gene position information is classified into five confidence groups: inconsistent-intolerable, inconsistent-tolerable, consistent-imprecise, consistent-precise, and consistent-rough. The percentages of these confidence groups are 1.4%, 7.1%, 54.0%, 35.4%, and 2.2%, respectively. Using information from NCBI Map Viewer Build 36.3 and NCBI OMIM, the percentages are 3.6%, 17.0%, 49.0%, 19.0%, and 11.4%, respectively. Combining these two results, a confidence table of gene position information was constructed. The Extended Cytogenetic Query System (ECQS) was built on a unitary database with integrated information from NCBI Entrez Gene, NCBI Map Viewer, and NCBI OMIM. We analyzed the inconsistencies between cytogenetic annotations and sequence mapping by defining imprecision-zones of cytogenetic banding. ECQS is a web-based application in which researchers can retrieve gene information and the related inconsistency results by submitting a cytogenetic banding region as the query. The system can also automatically extend the query region to include genes in imprecision-zones.
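The combination of linear regression and distance analysis can be sketched in Python as follows. This is only an illustration under stated assumptions: it fits an ordinary least-squares line to (sequence position, cytogenetic value) points and flags genes whose vertical distance from the line exceeds a tolerance. The tolerance value, the function names, and the sample points are hypothetical, and the sketch does not reproduce the five confidence groups or the thresholds used in the thesis.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def flag_outliers(points, tolerance):
    """Return names of genes lying farther than `tolerance` from the fit.

    points: list of (name, sequence_position, cytogenetic_value).
    """
    slope, intercept = fit_line([p[1] for p in points],
                                [p[2] for p in points])
    flagged = []
    for name, x, y in points:
        distance = abs(y - (slope * x + intercept))  # vertical residual
        if distance > tolerance:
            flagged.append(name)
    return flagged


# Hypothetical data: five genes on a line plus one scattered gene.
points = [
    ("G0", 0.0, 0.0), ("G1", 1.0, 1.0), ("G2", 2.0, 2.0),
    ("G3", 3.0, 3.0), ("G4", 4.0, 4.0), ("OUT", 2.0, 12.0),
]
print(flag_outliers(points, tolerance=3.0))
```

Because the fit here includes the scattered gene, a more careful treatment might fit the line on genes already known to be consistent and then measure the distance of the remaining genes from that line.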
Keywords: chromosome, cytogenetic band, cytoband, bioinformatics, gene