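Student: Lee, I-Hsun (李宜勳)
Thesis title: Combining heterogeneous data for predicting gene network (整合異質性資料以預測基因網路)
Advisor: Wang, Hei-Chia (王惠嘉)
Degree: Master
Department: Institute of Information Management, College of Management
Year of publication: 2005
Academic year of graduation: 93 (ROC calendar)
Language: Chinese
Pages: 57
Chinese keywords: gene microarray, association rules, heterogeneous data integration, information retrieval, time series, gene network
English keywords: information retrieval, association rule, time series, microarray, combining heterogeneous data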
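    With the rapid growth of bioinformatics, many kinds of biological data have been produced, such as sequence data, microarray data, and literature databases. Although these heterogeneous data are highly related to one another, they are seldom analyzed together, so this study aims to derive new biological knowledge by integrating them. Among these data, biochip (microarray) data currently receive particular attention, mainly because biochip technology lets biologists apply different treatments to a large number of genes at once. Biochip experiments, however, produce enormous volumes of data, and current tools extract only limited information from microarray data. A systematic method is therefore needed to process these data and, combined with other biological knowledge, to extract the important information hidden in them. Most researchers analyze gene microarray data with clustering algorithms, but clustering only finds genes with similar expression patterns and cannot reveal the closer regulatory relationships between genes. Some researchers have therefore built gene networks to study these relationships in detail, yet most gene-network studies ignore a key characteristic of microarray data: the number of genes far exceeds the number of experiments, which lowers the accuracy of the predictions. This study therefore holds that the first step toward a good gene network is to identify the genes that are actually associated, which reduces the complexity of the network. To this end, we apply association rules together with time-series processing to analyze how genes relate to one another within an experiment and to uncover the information hidden in the microarray data. Because different kinds of heterogeneous biological data may support the same findings, we also retrieve information from the biological literature to bring out knowledge hidden behind these data. After the truly associated genes have been identified, and because association rules cannot express causal relationships between genes, we use a Dynamic Bayesian Network (DBN) to construct the gene network automatically, so that multiple sources of evidence are considered while the method remains efficient.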
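The abstract does not spell out the time-series processing (the thesis gives it as an algorithm in Appendix A), so the following Python sketch assumes one simple scheme for illustration only: each interval between consecutive time points becomes a transaction whose items record which genes rose or fell by more than a hypothetical `threshold`. Plain level-wise Apriori then finds frequent itemsets, which are split into association rules. The gene names, threshold, and support/confidence values are placeholders, not the thesis's settings.

```python
from itertools import combinations

Item = tuple[str, str]  # (gene name, "up" or "down")


def to_transactions(expr: dict[str, list[float]], threshold: float) -> list[frozenset[Item]]:
    """One transaction per interval [t, t+1] between consecutive time points.

    A gene contributes (gene, "up") if its expression rises by more than
    `threshold` over the interval, (gene, "down") if it falls by more than
    `threshold`, and nothing otherwise. This discretisation is an assumed
    stand-in for the thesis's time-interval transformation.
    """
    genes = sorted(expr)
    n_points = len(expr[genes[0]])
    transactions = []
    for t in range(n_points - 1):
        items = set()
        for g in genes:
            delta = expr[g][t + 1] - expr[g][t]
            if delta > threshold:
                items.add((g, "up"))
            elif delta < -threshold:
                items.add((g, "down"))
        transactions.append(frozenset(items))
    return transactions


def apriori(transactions: list[frozenset[Item]], min_support: float) -> dict[frozenset[Item], float]:
    """Level-wise Apriori: return every frequent itemset with its support."""
    n = len(transactions)

    def support(itemset: frozenset[Item]) -> float:
        return sum(1 for tx in transactions if itemset <= tx) / n

    singles = {frozenset([i]) for tx in transactions for i in tx}
    current = {s for s in singles if support(s) >= min_support}
    frequent: dict[frozenset[Item], float] = {}
    k = 1
    while current:
        for s in current:
            frequent[s] = support(s)
        k += 1
        # Join step: unions of two frequent (k-1)-itemsets that have size k.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in frequent for sub in combinations(c, k - 1))
        }
        current = {c for c in candidates if support(c) >= min_support}
    return frequent


def association_rules(frequent: dict[frozenset[Item], float], min_confidence: float):
    """Split each frequent itemset into (lhs, rhs, support, confidence) rules."""
    rules = []
    for itemset, sup in frequent.items():
        for r in range(1, len(itemset)):
            for lhs_items in combinations(itemset, r):
                lhs = frozenset(lhs_items)
                conf = sup / frequent[lhs]  # every subset of a frequent set is frequent
                if conf >= min_confidence:
                    rules.append((lhs, itemset - lhs, sup, conf))
    return rules
```

On toy data where two hypothetical genes rise and fall together over every interval, this yields rules such as (geneA, up) → (geneB, up) with confidence 1.0; genes that never appear in any rule could then be left out of the network-construction step.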

     To understand genes and put them to use, we must know their functions. Gene expression is regulated through gene networks of interactions among DNA, RNA, proteins, and small molecules. Using microarray technology to predict gene networks has therefore become an important research topic. However, microarray data are complex and require a powerful, systematic method to handle them. A dynamic Bayesian network (DBN) is a suitable method for predicting gene regulatory networks, but applying it to all genes in a microarray experiment may lead to low accuracy and excessive computation time. In this thesis, we use a time-interval approach to transform the microarray data so that the Apriori algorithm can find relations between genes, and then apply a dynamic Bayesian network to these related genes to infer the gene network. Unlike previous techniques, this method not only reduces the comparison complexity but also reveals more mutual interactions among genes. Because microarray data alone do not contain enough information to find gene relations, other biological data can be used to raise the accuracy; we use a large collection of literature data to filter the mining results.
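To show why restricting the DBN to associated genes reduces the comparison complexity, here is a minimal Python sketch of parent selection in a first-order discrete DBN. It assumes expression values are already discretized into a few integer states and that each gene's candidate parents are the genes linked to it by the mined rules. The BIC score and exhaustive search over small parent sets are common choices swapped in for illustration; the abstract does not say which score or search the thesis uses, and `bic_score` and `learn_parents` are hypothetical names.

```python
import math
from collections import Counter
from itertools import combinations


def bic_score(data: dict[str, list[int]], child: str, parents: tuple[str, ...]) -> float:
    """BIC of the first-order model P(child[t+1] | parents[t]).

    `data` maps each gene to a list of discrete states, one per time point.
    """
    n = len(data[child]) - 1  # number of transitions t -> t+1
    joint: Counter = Counter()
    parent_counts: Counter = Counter()
    for t in range(n):
        config = tuple(data[p][t] for p in parents)
        joint[(config, data[child][t + 1])] += 1
        parent_counts[config] += 1
    # Maximum-likelihood log-likelihood of the conditional probability table.
    loglik = sum(c * math.log(c / parent_counts[cfg]) for (cfg, _), c in joint.items())
    # Free parameters: (child states - 1) for each parent configuration.
    n_configs = math.prod(len(set(data[p])) for p in parents)
    n_params = n_configs * (len(set(data[child])) - 1)
    return loglik - 0.5 * math.log(n) * n_params


def learn_parents(data: dict[str, list[int]], candidates: dict[str, set[str]],
                  max_parents: int = 2) -> dict[str, tuple[str, ...]]:
    """For each gene, score every parent set of size <= max_parents drawn only
    from that gene's candidate list, and keep the highest-scoring set."""
    network = {}
    for child, cands in candidates.items():
        best_set: tuple[str, ...] = ()
        best_score = bic_score(data, child, ())
        for k in range(1, max_parents + 1):
            for parents in combinations(sorted(cands), k):
                score = bic_score(data, child, parents)
                # Strict '>' so a tie keeps the set found first (smaller sets are tried first).
                if score > best_score:
                    best_set, best_score = parents, score
        network[child] = best_set
    return network
```

With all G genes as candidates, the inner loop would score on the order of G^k parent sets per gene; restricting each gene to the few genes linked to it by association rules is what keeps this search small.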

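    1. Introduction
      1.1. Research Background
      1.2. Research Motivation and Objectives
      1.3. Research Process
      1.4. Research Scope and Assumptions
      1.5. Thesis Organization
    2. Literature Review
      2.1. Introduction to Gene Microarray Chip Technology
      2.2. Applications of Data Mining to Gene Microarrays
        2.2.1. Association Rules
      2.3. Information Retrieval Techniques
        2.3.1. Vector Model
      2.4. Techniques for Constructing Gene Networks
        2.4.1. Boolean Networks
        2.4.2. Linear Models
        2.4.3. Bayesian Networks
        2.4.4. Dynamic Bayesian Networks
    3. Research Methods
      3.1. Comparison with Related Research
      3.2. Research Framework
      3.3. Mining Engine Module
      3.4. Information Retrieval Module
        3.4.1. Gene Association Engine
        3.4.2. Gene Association Strength of Rules
        3.4.3. Rule Ranking Algorithm
      3.5. Network Construction Module
        3.5.1. Constructing the Gene Network
    4. Implementation and Validation
      4.1. System Construction
        4.1.1. Mining Engine Server
        4.1.2. Information Retrieval Server
        4.1.3. Constructing Network Server
      4.2. Data Sources
      4.3. Experimental Methods and Comparison Criteria
        4.3.1. Experimental Method
        4.3.2. Evaluation Metrics
        4.3.3. Parameter Settings
      4.4. Experimental Results and Analysis
        4.4.1. Results and Analysis of Experiment 1
        4.4.2. Results and Analysis of Experiment 2
        4.4.3. Results and Analysis of Experiment 3
        4.4.4. Conclusion
    5. Conclusions and Future Research Directions
      5.1. Conclusions
      5.2. Future Research Directions
    6. References
    Appendix A: Algorithm for the Time-Series Data Transformation Process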

    Full text available on campus from 2006-06-27 and off campus from 2008-06-27.