簡易檢索 / 詳目顯示

研究生: 蔡逸翩
Tsai, I-pien
論文名稱: 利用啟發式搜尋法和最大概率條件重建親緣關係樹
Reconstructing Phylogenetic Trees based on Heuristic Search and Maximum Likelihood
指導教授: 謝孫源
Hsieh, Sun-yuan
學位類別: 碩士
Master
系所名稱: 電機資訊學院 - 醫學資訊研究所
Institute of Medical Informatics
論文出版年: 2009
畢業學年度: 97
語文別: 英文
論文頁數: 48
中文關鍵詞: 演化樹重建計算生物學最大概率條件生物資訊學
外文關鍵詞: computational biology, bioinformatics, phylogeny reconstruction, maximum likelihood
相關次數: 點閱:156下載:3
分享至:
查詢本校圖書館目錄 查詢臺灣博碩士論文知識加值系統 勘誤回報
  • 演化樹重建是一個在計算分子生物學和生物化學中的基本問題。自從越來越多物種的基因序列被定序之後,快速而正確的重建演化樹也益發重要。過去有許多研究指出利用Maximum Likelihood (最大概率)條件所重建出的演化樹正確性會比其他方法來的高。但受限於計算複雜度,所以我們結合啟發式搜尋法來降低搜尋空間。這個方法是基於登山搜尋法的概念,從初始的樹型開始不斷改變它的topology以改進該樹的maximum likelihood。
    Tree bisection and reconnection (簡稱TBR) 是一種改變樹型的方法,他可以產生相當廣泛的tree space。為了使TBR更有效率的改進樹型,我們結合了minimum evolution的方法來過濾掉一些不可能的topology以降低搜索的時間。我們的方法利用計算改變前與後兩樹edge總和之差來選擇較佳的tree以避免對每個TBR後所產生的tree進行ML複雜的運算
    由使用基準序列資料作為評估,當物種資料量小時,所有的方法除了執行時間上的差別外,皆可找到接近最佳ML值的tree。當物種資料量增加時,我們的方法比起其他的方法仍然能找到最正確的tree。因此可以顯示出我們的方法表現比早先已知的方法為佳。

    The phylogeny reconstruction problem is a fundamental problem in computational molecular biology and biochemical physics. Since the number of data sets have increased astonishingly, the accuracy and speed of constructing phylogenies has became more and more important. Numerous studies have demonstrated that maximum likelihood (ML) is the most popular method to reconstruct an evolutionary tree from sequence data. Due to the computation complexity, the heuristic approach based on hill climbing is considered. The method starts from an initial tree and rearranges the tree to improve its likelihood. Tree bisection and reconnection (TBR) is a topological rearrangement that can generate extensively tree space than other method.
    To make TBR works more efficiently, we modify the operation of TBR and combine minimum evolution to filter the useless reconnected positions for reducing searching time. Our method avoids maximum likelihood calculation for every TBR operation by computing the change of tree length and updating new average distances of original and reconnected positions. In the small data set, our method together with other compared methods can find the near optimal maximum likelihood tree although the executing time are different. As the number of taxa grows, our method still can find the most accurate tree which is better than other methods. The experiment result shows that the performance of our method is superior than other methods.

    Abstract(in Chinese) i Abstract(in English) ii Acknowledgement iii Contents v List of Tables vii List of Figures viii 1 Introduction 1 1.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1 1.2 Result . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3 1.3 Organization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .4 2 Bioinformatics Background 5 2.1 DNA . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .5 2.2 Sequence alignment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6 2.3 Phylogenetic tree . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9 3 Related Methods and Literature Survey 12 3.1 Distance-based method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .12 3.1.1 Neighbor-Joining . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .12 3.1.2 Unweighted Pair Group Method with Arithmetic mean . . . . . . . 14 3.2 Character-based method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16 3.2.1 Maximum-parsimony . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .16 3.2.2 Maximum-likelihood . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16 4 Preliminaries 21 4.1 Basic Definition and Notation of Trees . . . . . . . . . . . . . . . . . . . . . . . . 21 4.2 Maximum Likelihood . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21 4.3 Minimum Evolution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .24 4.4 Moving Distance of the Subtree . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .25 5 The Method 27 5.1 Tree Rearrangement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .27 5.2 Edge Length Estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .29 5.3 Relationship Between Tree Length and ML . . . . . . . . . . . . . . . . . . . . .34 6 Experiment and Result 39 7 Concluding Remarks 45

    [1] J. Felsenstein, Inferring Phylogenies, Sinauer, 2004.
    [2] J. Felsenstein, "Evolutionary trees from DNA sequences: A maximum likelihood approach," Journal of Molecular Evolution, 17: 368-376, 1981.
    [3] N. C. Jones, P. A. Pevzner, An Introduction to Bioinformatics Algorithms, The MIT Press, 2004.
    [4] S. Guindon, O. Gascuel, "A simple, fast, and accurate algorithm to estimate large phylogenies by maximum likelihood," Systematic Biology, 52: 696-704, 2003.
    [5] Z. Yang, Computational Molecular Evolution, Oxford University Press, 2006.
    [6] D. B. West, Introduction to Graph Theory (Second Edition), Prentice Hall Inc., 2001.
    [7] V. Ranwez, O. Gascuel, "Improvement of distance-based phylogenetic methods by a local maximum likelihood approach using triplets," Molecular Biology and Evolution, 19: 1952-1963, 2002.
    [8] M. Rosenberg, S. Kumar, "Traditional phylogenetic reconstruction methods reconstruct shallow and deep evolutionary relationships equally well," Molecular Biology and Evolution, 18: 1823-1827 , 2001.
    [9] T. H. Jukes, C. R. Cantor, Evolution of Protein Molecules, H. N. Munro, Ed. Academy Press, 1969.
    [10] M. Kimura, "A simple method for estimating evolutionary rate of base substitutions through comparative studies of nucleotide sequences," Journal of Molecular Evolution, 16: 111-120, 1980.
    [11] M. Hasegawa, H. Kishino, T. Yano, "Dating of the human-ape splitting by a molecular clock of mitochondrial DNA," Journal of Molecular Evolution, 21: 160-174, 1985.
    [12] R. Desper, O. Gascuel, "Fast adn Accurate Phylogeny Reconstruction Algorithms Based on the Minimum-Evolution Principle," Journal of Computational Biology, 9: 687-705, 2002.
    [13] R. Desper, O. Gascuel, "The minimum evolution distance-based approach to phylogenetic inference," In O. Gascuel, Ed., Mathmatics of Evolution and Phylogeny, Oxford University Press, pp. 1-32, 2005.
    [14] O. Gascuel, "BIONJ: An improved version of the NJ algorithm based on a simple model of sequence data," Molecular Biology and Evolution, 14: 685-695, 1997.
    [15] A. Stamatakis, T. Ludwig, H. Meier, "RAxML-III: A Fast Program for Maximum Likelihood-based Inference of Large Phylogenetic Trees," Bioinformatics, 21: 456-463, 2005.
    [16] A. Stamatakis, P. Hoover, J. Rougemont, "A rapid bootstrap algorithm for the RAxML web servers," Systematic Biology, 75: 758-771, 2008.
    [17] J. P. Huelsenbeck, D. M. Hillis, "Success of phylogenetic methods in the four-taxon case," Systematic Biology, 42: 247-264, 1993.
    [18] M. K. Kuhner, J. Felsenstein, "A simulation comparison of phylogeny algorithms under equal and unequal evolutionary rates," Molecular Biology and Evolution, 11: 459-468, 1994.
    [19] A. W. F. Edwards, L. L. Cavalli-Sforza, "The reconstruction of evolution," Annals of Human Genetics, 27: 105-106, 1963.
    [20] J. Neyman, Statistical Decision Theory and Related Topics, Academy Press, 1971.
    [21] R. Brent, Algorithms for Minimization Without Derivatives, Prentice Hall Inc., 1973.
    [22] Y. Pauplin, "Direct calculation of a tree length using a distance matrix," Journal of Molecular Evolution, 51: 41-47, 2000.
    [23] T. M. Keane, T. J. Naughton, S. A. A. Travers, J. O. Mclnerney, G. P. McCormack, "DPRml: distributed phylogeny reconstruction by maximum likelihood," Bioinformatics, 21: 969-974, 2005.
    [24] K. G. McCracken, M. D. Sorenson, "Is homoplasy or lineage sorting the source of incongruent mtDNA and nucler gene tree in the sti®-tailed ducks," System Biology, 54: 35-55, 2005.
    [25] A. P. Vogler, A. Cardoso, T. G. Barraclough, "Exploring rate variation among and whithin sites in a densely sampled species tree: species level phlogenetics of north american tiger beetles," System Biology, 54: 4-20, 2005.
    [26] R. C. Winkworeth, D. Bryant, P. J. Lockhart, D. Havell, V. Moulton, "Biogeographic interpretation of splits graph: least squares optimization of branch length," System Biology, 54: 56-65, 2005.
    [27] Y. M. Yuan, S. Wohlhauser, M. MÄoller, J. Klackenberg, M. Callmander, P. KÄupfer, "Phylogeny and biogeography of exacum: a disjunctive distribution in the Indian Ocean basin resulting from long distance dispersal and extensive radiation," System Biology, 54: 21-34, 2005.
    [28] M. Bordewich, O. Gascuel, K. T. Huber, V. Moulton, "Consistency of topological moves based on the balanced minimum evolution principle of phylogenetic inference," IEEE/ACM Transaction on Computational Bioilogy and Bioinformatics, 6: January-March, 2009.
    [29] H. Jiang, C. Blouin, "Insertions and the emergence of novel protein structure: a structure-based phylogenetic study of insertions," BMC Bioinformatics, 8: 444-458, 2007.
    [30] S. Shuman, "The mRNA capping apparatus as drug target and guide to eukaryotic phylogeny," Cold Spring Harb Symp. Quant. Biol., 66: 301-312, 2001.
    [31] A. Lemmon, M. Milinkovitch, "The metapopulation genetic algorithm: an e±cient solution for the problem of large phylogeny estimation", Proceedings of the National Academy of Sciences of the United States of America, 99: 10516-10521, 2001.
    [32] J. Fauvel, "Algorithms in the pre-calculus classroom: Who was Newton- Raphson?", Mathematics in School, 27: 45-47, 1998.
    [33] S. R. Santos, D. J. Taylor, R. A. Kinzie, M. Hidaka, K. Sakai, M. A. Coffroth, "Molecular phylogeny of symbiotic dinoflagellates inferred from partial chloroplast large subunit (23S)-rDNA sequences," Molecular phylogenetics and evolution, 23: 97-111, 2002.
    [34] C. Michener, R. Sokal, "A quantitative approach to a problem in classification", Evolution, 11: 130-162, 1957.
    [35] N. Saitou, M. Nei, "The neighbor-joining method: A new method for reconstructing phylogenetic trees", Molecular Biology and Evolution, 4: 406-425, 1987.
    [36] U.S. National Library of Medicine, Genetics Home Reference, http://ghr.nlm.nih.gov/
    [37] Benjamin Cummings, an imprint of Addison Wesley Longman, Inc., 2001.
    [38] N. A. Campbell, J. B. Reece, Biology (Sixth Edition), Pearson Education, Inc., 2002.
    [39] M. Robinson, L. Foulds, "Comparison of weighted labeled trees", Lecture Notes in Mathmatics, 748: 119-126, 1979.
    [40] J. D. Thompson, D. G. Higgins, T. J. Gibson, "CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choic", Nucleic Acids Research, 22: 467-380, 1994.
    [41] T. Hodge, M. Cope, "A myosin family tree", Journal of Cell Science, 113: 3353-3354, 2000.
    [42] Access Excellence @ the National Health Museum, http://www.accessexcellence.org/RC/VL/GG/protein_synthesis.php
    [43] W. Ludwig et al., "ARB: A software environment for sequence data", Nucleic Acids Research, 32: 1363-1371, 2004.

    下載圖示 校內:2012-08-03公開
    校外:2012-08-03公開
    QR CODE