簡易檢索 / 詳目顯示

研究生: 黃瑋婷
Huang, Wei-ting
論文名稱: 結合基因演算法與吉氏抽樣數值方法預測DNA序列上之特殊功能序列
Motif Finding in DNA Sequences Using a Genetic Algorithm with Gibbs Sampler
指導教授: 王振興
Wan, Jeen-Shing
學位類別: 碩士
Master
系所名稱: 電機資訊學院 - 電機工程學系
Department of Electrical Engineering
論文出版年: 2007
畢業學年度: 95
語文別: 英文
論文頁數: 54
中文關鍵詞: 基因演算法特殊功能序列
外文關鍵詞: GA, motif
相關次數: 點閱:55下載:1
分享至:
查詢本校圖書館目錄 查詢臺灣博碩士論文知識加值系統 勘誤回報
  • 在分子生物學中,共同序列通常代表保留功能區域(conserved functional domains)。預測去氧核醣核酸(DNA)序列上之特殊功能序列在計算生物學上是一個十分重要的問題,其重要的應用為尋找保留功能區域。在這篇論文中,我們評估數種預測DNA序列上之特殊功能序列的方法,包括基因演算法、結構基因演算法以及吉氏抽樣數值方法(Gibbs sampler method)。其中結構基因演算法不需輸入所要尋找之功能序列的長度,而能自行收斂至較合適的長度是與其他兩種方法不同的地方。從這三種方法的模擬結果中,基因演算法的效能比其他兩種方法好。然而,吉氏抽樣數值方法在計算上較有效率。為了在預測特殊功能序列問題上有更好的效能及計算效率,我們結合基因演算法與吉氏抽樣數值方法。我們所提之方法的有效性可從測試範例的模擬結果證實。

    In molecular biology, consensus sequences often represent conserved functional domains. Motif finding in DNA sequences is a fundamental problem in computational biology, with important applications in finding conserved functional domains. In this thesis, we evaluate several methods including a genetic algorithm (GA), the structured genetic algorithm (S-GA), and Gibbs sampler method in the motif finding problem. The structured genetic algorithm can search for motifs of varying lengths without a fixed motif length as the input, which is different from the other two methods. From the simulation results of these three methods, the GA performs better than the other two approaches. However, the Gibbs sampler method is more efficient in computing. To enjoy better performance and efficiency, we combine the GA with the Gibbs sampler method for motif finding problems. We have validated the effectiveness of our approach through extensive simulations in diverse benchmark examples.

    CHINESE ABSTRACT i ABSTRACT ii ACKNOWLEDGEMENT iii LIST OF TABLES vi LIST OF FIGURES vii 1 Introduction 1-1 1.1 Biological Background 1-1 1.1.1 The Representation and Technicalities 1-1 1.1.2 Protein Synthesis 1-2 1.1.3 Transcription Factor Binding Sites 1-2 1.2 Motivation 1-4 1.3 Literature Review 1-5 1.4 Goal 1-6 1.5 Architecture 1-7 2 The Motif Finding Problem 2-1 2.1 The Motif Finding Problem 2-1 2.2 Some Methods for Motif Finding 2-5 2.2.1 Gibbs Sampler 2-6 2.2.2 CONSENSUS 2-7 2.2.3 Expectation Maximization 2-8 3 Method 3-1 3.1 Introduction of the Genetic Algorithm 3-1 3.2 Chromosome and Encoding 3-2 3.2.1 Chromosome 3-2 3.2.2 Encoding 3-4 3.3 The Flow of the Algorithm 3-4 3.4 Genetic Operators 3-6 3.4.1 Selection 3-6 3.4.2 Crossover 3-7 3.4.3 Mutation 3-9 3.5 Fitness Function 3-9 3.6 Structured Genetic Algorithm 3-10 3.7 Combine Gibbs Sampler with Genetic Algorithm 3-12 4 Simulation Results 4-1 4.1 Experimental Results 4-1 4.1.1 Genetic Algorithm 4-2 4.1.2 Genetic Algorithm vs. Structured Genetic Algorithm 4-4 4.1.3 Gibbs Sampler Method 4-5 4.1.4 Gibbs Sampler Method with Genetic Algorithm 4-6 5 Conclusion 5-1 5.1 Conclusion and Contribution 5-1 5.2 Future Work 5-2 References

    [1] F. Crick, “Central Dogma of Molecular Biology,” Nature, vol. 227, pp. 561-563, Aug 1970.
    [2] Y. Moreau, F. de Smet, G. Thijs, K. Marchal and B. de Moor, ”Functional Bioinformatics of Microarray Data: From Expression to Regulation,” Proceedings of the IEEE, vol. 90, no. 11, Nov 2002.
    [3] F. F. M. Liu, J. J. P. Tsai, R. M. Chen, S.N. Chen and S.H. Shih, “FMGA:Finding Motifs by Genetic Algorithm,” Proceedings of the Fourth IEEE Symposium on Bioinformatics and Bioengineering (BIBE’04), pp. 459-466, May 2004.
    [4] G. D. Stormo, “DNA binding sites: representation and discovery,” Bioinformatics, vol. 16, no. 1, pp. 16-23, 2000.
    [5] C. E. Lawrence, S. F. Altshul, M. S. Boguski, J. S. Liu, A. F. Neuwald and J. C. Wootton, “Detecting subtle sequence signals: a Gibbs sampling strategy for multiple alignment,” Science, vol. 262, pp. 208-214, 1993.
    [6] T. L. Bailey, C. Elkan, “Fitting a mixture model by expectation maximization to discover motifs in biopolymers,” Proceedings of the Second International Conference on Intelligent Systems for Molecular Biology, vol. 2, pp. 28-36, 1994.
    [7] G. Z. Hertz and G. D. Stormo, “Identifying DNA and protein patterns with statistically significant alignments of multiple sequences,” Bioinformatics, vol. 15, no. 1, pp. 563-577, 1999.
    [8] J. Buhler and M. Tompa, “Finding Motifs Using Random Projections,” Journal of Computational Biology, vol. 9, no. 2, pp. 225-242, 2002.
    [9] E. Eskin and P.A. Pevzner, “Finding Composite Regulatory Patterns in DNA Sequences,” Bioinformatics, vol. 18, no. 1, pp. 354-363, 2002.
    [10] U. Keich and P.A. Pevzner, “Finding Motifs in the Twilight Zone,” Bioinformatics, vol. 18, no. 10, pp. 1374-1381, 2002.
    [11] A. Price, S. Ramabhadran and P. A. Pevzner, “Finding subtle motifs by branching from sample Strings,” Bioinformatics, vol. 1, no. 1, pp. 1-7, 2003.
    [12] B. Raphael, L. T. Liu and G. Varghese, “A Uniform Projection Method for Motif Discovery in DNA Sequences,” IEEE Transactions on Computational biology and Bioinformatics, vol. 1, no. 2, pp. 91-94, 2004.
    [13] D. Liu, X. Xiong, B. DasGupta and H. Zhang, ”Motif discoveries in unaligned molecular sequences using self-organizing neural networks,” IEEE Transaction on Neural networks, vol. 17,no. 4, July 2006.
    [14] L. R. Cardon and G. D. Stormo, “Expectation maximization algorithm for identifying protein-binding sites with variable lengths from unaligned DNA fragments,” Journal of Molecular Biology, vol. 223, pp. 159-70, 1992.
    [15] C. E. Lawrence and A. A. Reilly, “An expectation maximization algorithm for the identification and characterization of common sites in unaligned biopolymer sequences,” Proteins: Structure, Function, and Genetics, vol. 7, iss. 1, pp.41-51, Feb 2004.
    [16] P. A. Pevzner and S. H. Sze, “Combinatorial approaches to finding subtle signals in DNA sequences,” Proceedings of the 8th International Conference on Intelligent Systems for Molecular Biology, pp. 269-78, 2000.
    [17] N. C. Jones and P. A. Pevzner, An Introduction to Bioinformatics Algorithms, MIT Press Cambridge, Mass, 2004.
    [18] Y. J. Liao, “Motif finding in biological sequences”, M.S. thesis, Department of Computer Science and Engineering, National Sun Yat-sen University, Taiwan, 2003.
    [19] J. H. Holland, “Genetic algorithms,” Scientific American, vol. 267, pp. 44-50, 1992.
    [20] D. E. Goldberg, Genetic algorithms in Search, optimization and machine learning, Kluwer Academic Publishers, Boston, MA, 1989.
    [21] P. J. Fleming, R. C. Purshouse. “Evolutionary algorithms in control systems engineering: a survey.” Control Engineering Practice, vol.10, pp. 1223-1241, 2002.
    [22] A. J. F. Griffiths et al, An Introduction to Genetic Analysis, W. H. Freeman, New York, seventh edition, 2000.
    [23] M. Stine, D. Dasgupta and S. Mukatira, “Motif Discovery in Upstream Sequences of Coordinately Expressed Genes”, Evolutionary Computation of the IEEE, vol. 3, pp. 1596-1603, Dec 2003.
    [24] D. Dasgupta and D. R. McGregor, “Non-stationary function optimization using the structured genetic algorithm,” Proceedings of the Second International Conference on Parallel Problem Solving From Nature (PPSN-2) Conference, pp. 145-154, Sep 1992.
    [25] D. Dasgupta and D. R. McGregor, “sGA: A structured genetic algorithm,” Technical Report IKBS-8-92, University of Strathclyde, 1992.
    [26] M. T. Lee, “Motif finding”, class notes for GCB 535 / CIS 535, Department of Computer and Information Science, University of Pennsylvania, 10 Oct 2004.

    下載圖示 校內:2009-08-06公開
    校外:2009-08-06公開
    QR CODE