簡易檢索 / 詳目顯示

研究生: 詹博丞
Chan, Bo-Cheng
論文名稱: 使用家族特定殘基之圖形探勘演算法進行蛋白質分類
A Graph-Mining-Based Algorithm for Classifying Protein Using Family-Specific Residue
指導教授: 謝孫源
Hsieh, Sun-Yuan
學位類別: 碩士
Master
系所名稱: 電機資訊學院 - 資訊工程學系
Department of Computer Science and Information Engineering
論文出版年: 2016
畢業學年度: 105
語文別: 英文
論文頁數: 54
中文關鍵詞: 子圖探勘B因子生物資訊學蛋白質分類
外文關鍵詞: Subgraph mining, B-Factor, bioinformatics, protein classifications
相關次數: 點閱:86下載:0
分享至:
查詢本校圖書館目錄 查詢臺灣博碩士論文知識加值系統 勘誤回報
  • 蛋白質結構分類在生物資訊學中是一個相當重要的議題。在本篇論文裡,我們使用簡單的連通圖來表示蛋白質的三維結構,其中,每個節點用來表示一個氨基酸,每個邊表示兩個氨基酸之間的接觸距離。在本篇論文中,我們使用B-factor(原子位移參數)減少每個圖中的節點和邊的數量。我們使用圖形探勘演算法找出家族中的關鍵子圖來對蛋白質結構進行家族分類。
    實驗的部分,我們利用序列比對結果所生成的e -value (e-value = 1-e^-10,e=2.718)。將所提出來的演算法與BLAST,BLAT 和DALI 進行比較,並利用SCOP資料庫來驗證分類的結果。實驗結果也顯示該方法可以準確且有效地分類蛋白質結構。

    Protein structural classification is vital in bioinformatics. In this study, a simple
    and connected graph was used to represent a three-dimensional protein structure in
    which each node represented an amino acid and each edge represented a contact distance
    between two amino acids. A B-factor (an atomic displacement parameters) was
    then used to substantially reduce the number of nodes and edges in each graph representation.
    A graph mining approach was applied to determine the critical subgraphs
    among these graphs, which can be used to classify protein structural families. The
    proposed algorithm was compared with BLAST, BLAT and DALI. The alignment results
    generated e-value, based on e-value = 1 − e^−10 (e=2.718), and the classification
    result were examined accordingly. Moreover, an experiment was conducted, and in
    this experiment, characteristic substructural patterns were identified in several protein
    families in the SCOP database. The experimental results demonstrated that the proposed
    method can classify the protein structures accurately and effciently.

    Contents 1 Introduction 1 2 Related Works 5 2.1 Breadth-First-Search (BFS) Strategy 6 2.2 Depth-First-Search (DFS) Strategy 6 3 Preliminaries and Background 8 3.1 Basic of Graph Theory 8 3.2 Graph Isomorphism Problems 11 3.2.1 Notations 11 3.2.2 Graph Isomorphism 12 3.2.3 Graph Isomorphism Problems 13 3.3 Protein Structure Characteristics 15 3.3.1 Levels of Protein Structure 15 3.3.2 The Protein Data Bank 20 4 Proposed Algorithm 22 4.1 Creating Protein Graphs 22 4.2 Representing a Protein Graph by an Adjacency Matrix 26 4.3 Canonical Adjacency Matrix (CAM) Tree 27 4.4 Classification of a protein family 33 5 Experimental Study 36 5.1 Implementation and Test Platform 36 5.2 Protein Representation as a Simple Graph 38 5.3 The Accuracy of Classification of Each Protein Family 40 6 Conclusion 49

    [1] P. Aloy, E. Querol, F. X. Aviles, and M. J. E. Sternberg, "Automates Structure based Prediction of Functional Sites in Proteins: Applications to Assessing the Validity of Inheriting Protein Function from Homology in Genome Annotation and to Protein Docking," Journal of Molecular Biology , vol. 311, no. 2,pp. 395-408, 2001.
    [2] S. F. Altschul, T. L. Madden, A. A. Schaffer, J. Zhang, Z. Zhang,W. Miller and D. J. Limpman, "Gapped BLAST and PSI-BLAST: a New Generation of Protein Database Search Programs." Nucleic acids research , vol. 25, no. 17, pp. 3389-3402, 1997.
    [3] Z. Aung and K. L. Tan, "Automatic Protein Structure Classification through Structural Fingerprinting," in Proceedings of the 4th IEEE Symposium on Bioinformatics and Bioengineering (ICBBE), pp. 508-515, March 19-21, 2004.
    [4] D. Bandyopadhyay, J. Huan, J. Liu, J. Prins, J. Snoeyink, A.Tropsha, and W.Wang, "Using Fast Subgraph Isomorphism Checking for Protein Functional Annotation Using SCOP and Gene Ontology,"The University of North Carolina at Chapel Hill Department of Computer Science Technical Report , 2005.
    [5] D. Bandyopadhyay, J. Huan, J. Liu, J. Prins, J. Snoeyink, W. Wang, andA.Tropsha, "Functional Neighbors: Inferring Relationships between Nonhomologous Protein Families Using Family-Specific Packing Motifs," IEEE Transactions on Information Technology in Biomedicine , vol. 14, no. 5, pp. 1137-1143, 2010.
    [6] D. Fischer, H. Wolfson, and R. Nussinov, "Three-dimensional, sequence Order-Independent Structural Comparison of a Serine Protease Against the Crystallographic Database Reveals Active Site Similarities: Potential implications to Evolution and to Protein Folding," Protein Science, vol. 3, no. 5, pp. 769-778, 1994.
    [7] Thomas G. Gligoris, Johanna C. Scheinost, Frank Brmann, NaomiPetela, Kok-Lung Chan, Pelin Uluocak, Frdric Beckout, Stephan Gruber, Kim Nasmyth1, and Jan Lwe, "Closing the Cohesin Ring: Structure and Function of its Smc3-kleisin Interface,"Science, vol. 346, no. 6212, pp. 963-967, 21 November,2014.
    [8] S. Henikoff, J. G. Henikoff, and S. Pietrokovski, "Blocks+: Anon-Redundant Database of Protein Alignment Blocks Derived from Multiple Compilations," Bioinformatics , vol. 15, no. 6, pp. 471-479, 1999.
    [9] L. B. Holder, D. J. Cook, and S. Djoko, "Substructure Discovery in the SUBDUE System," in Proceedings of the Association for the Advancement of Artificial Intelligence Workshop on Knowledge, Discovery in Database (AAAI), pp. 169-180, July 31, 1994.
    [10] L. Holm and P. RosenstrO m, "DALI server: conservation mapping in 3D". Nucleic Acids Research, vol. 38, suppl 2, July 1, 2010, pp. W545-W549.
    [11] J. Huan, W. Wang, and J. Prins, "Efficient Mining of Frequent Subgraph in the Presence of Isomorphism," in Proceedings of the 3th IEEE International Conference on Data Mining (ICDM), pp. 549-552, November 19-22, 2003.
    [12] J. Huan, W. Wang, D. Bandyopadhyay, J. Snoeyink, J. Prins, and A.Tropsha, "Mining Protein Family Specific Residue Packing Patterns From Protein Structure Graphs," in Proceedings of the 8th Annual International Conference on Research in Computational Molecular Biology (RECOMB), pp. 308-315, March 27-31, 2004.
    [13] J. Huan, D. Bandyopadhyay, W. Wang, J. Snoeyink, J. Prins, and A.Tropsha, "Comparing Graph Representations of Protein Structure for Mining Family-Specific Residue-Based Packing Motifs,"Journal of Computational Biology, vol.12, no. 6, pp. 657-671, 2005.
    [14] A. Inokuchi, T. Washio, and H. Motoda, "An Apriori-Based Algorithm for Mining Frequent Substructures from Graph Data," in Proceedings of the 4th European Conference on Principles and Practice of Knowledge Discovery in Data Mining (PKDD), pp.13-23, 2000.
    [15] W. J. Kent "BLAT-the BLAST-Like Alignment Tool," Genome Research, vol. 12, no. 4, pp. 656-664, 2000.
    [16] V. Krishna, N. N. R. R. Suri, and G. Athithan, "A Comparative Survey of Algorithms for Frequent Subgraph Discovery,"Current Science, vol. 100, no. 25, pp. 190-198, 2011.
    [17] M. Kuramochi and G. Karypis, "Frequent Subgraph Discovery," in Proceedings of the 1st IEEE Conference on Data Mining(ICDM), pp. 313-320, November 29 -December 2, 2001.
    [18] M. Laberge and T. Yonetani "Common Dynamics of Globin Family Proteins,"International Union of Biochemistry and Molecular Biology , vol. 59, no.8, pp. 528-534, 2007.
    [19] W. W. M. Lam, and K. C. C. Chan, "A Graph Mining Algorithm for Classifying Chemical Compounds," in Proceedings of IEEE International Conference on Bioinformatics and Biomedicine(BIBM) , pp. 321-324, November 3 - 5, 2008.
    [20] D. Landsman "Common Sequence and Structural Features in the Heat-Shock Factor and Ets Families of DNA-Binding Domains,"Trends in Biochem Science , vol. 20, no. 6, pp. 225-226,1995.
    [21] J. A. Lees, M. Saito, M. Vidal, M. Valentine, T. Look, E. Harlow,and K. Helin, "The Retinoblastoma Protein Binds to a Family of E2F Transcription Factors, "Molecular and Cellular Biology, vol. 13, no. 12, pp. 7813-7825, 1993.
    [22] Marc Lenoir, Micha Grzybek, Micha Majkowski, Sandya Rajesh,Jaswant Kaur, Sara B.-M. Whittaker, nal Coskun, Michael Overduin,"Structural Basis of Dynamic Membrane Recognition by trans-Golgi Network Specific FAPP Proteins," Journal of Molecular Biology, vol. 427, issue 4, pp. 966-981, 27 February 2015.
    [23] D. S. Lieber, O. Elemento, and S. Tavazoie, "Large-Scale Discovery and Characterization of Protein Regulatory Motifs in Eukaryotes," PloS One, vol. 5, no. 12, e14444, 2010.
    [24] S. Nijssen and J. N. Kok, "A Quick start in Frequent Structure Mining can make a Difference," in Proceedings of the 10th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), pp. 647-652, August 22-25, 2004.
    [25] S. Nijssen and J. N. Kok, "Frequent Graph Mining and its Application to Molecular Databases," in Proceedings of the IEEE International Conference on Systems, Man and Cybernetics(SMC) , vol. 5, pp. 4571-4577, October 10-13, 2004
    [26] S. Parthasarathy and M. R. N. Murthy, "Protein Thermal Stability:Insights from Atomic Displacement Parameters (B-values),"Protein Engineering , vol. 13, no. 1, pp. 9-13, 2000.
    [27] A. M. Petros, E. T. Olejniczak, and S. W. Fesik, "Structural Biology of the Bcl-2 Family of Proteins," Biochimica et Biophysica Acta (BBA)-Molecular Cell Research, vol. 1644, no. 2,pp. 83-94, 2004.
    [28] E. Remold-O'Donnell, "The Ovalbumin Family of Serpin Proteins,"Federation of European Biochemical Societies Letters, vol.315, no, 2, pp. 105-108, 1993.
    [29] D. E. Tronrud and E. Dale, "Knowledge-Based B-factor Restraints for the Refinement of Proteins," Journal of applied crystallography, vol. 29, no. 2, pp. 100-104, 1996.
    [30] S. Vishveshwara, K. V. Brinda, and N. Kannan,"Protein Structure: Insights From Graph Theory,"Journal of Theoretical and Computational Chemistry, vol. 1, no. 1, pp. 187-211, 2002.
    [31] B. Wackersreuther, P. Wackersreuther, and A. Oswald, "Frequent Subgraph Discovery in Dynamic Networks,"in Proceedings of the 8th Workshop on Mining and
    Learning with Graphs (MLG), pp.155-162, August 5, 2010.
    [32] Shu-Lin Wang, Xueling Li, Shanwen Zhang, Jie Gui, and D.S.Huang, "Tumor Classification by Combining PNN Classifier Ensemble with Neighborhood Rough Set Based Gene Reduction,"Computers in Biology and Medicine, vol. 40, no.2, pp. 179-189, 2010.
    [33] Bing Wang, D. S. Huang, and Changjun Jiang, "A new Strategy for Protein Interface Identification Using Manifold Learning Method,"IEEE Transactions on NanoBioscience, vol.13, no.2, pp. 118-123, 2014.
    [34] N. Weskamp, D. Kuhn, E. Hllermeier, and G. Klebe, "Efficient Similarity Search in Protein Structure Databases by k - Clique Hashing," Bioinformatics, vol. 20, no. 10, pp. 1522-1526,2005.
    [35] D. W. Williams, J. Huan, and W. Wang, "Graph Database Indexing Using Structured Graph Decomposition," in Proceedings of the 23th International Conference on Data Engineering (ICDE), pp. 976-975, April 15-20, 2007.
    [36] X. Yan and J. Han, "gSpan: Graph-Based Substructure Pattern Mining," in Proceedings of the 3th IEEE International Conference on Data Mining (ICDM), pp. 721-724, November 19-22,2002.
    [37] Hong-Jie Yu and D.S.Huang, "Novel 20-D Descriptors of Protein Sequences and It's Applications in Similarity Analysis," Chemical Physics Letters, vol. 531, pp. 261-266, 2012.
    [38] Z. Yuan, T. L. Bailey, and R. D. Teasdale, "Prediction of Protein B-factor Profiles," Proteins, vol. 58, no. 4, pp. 905-912, 2005.
    [39] K. Zhang, F. Rao, L. Wang, B. K. Rana, S. Ghosh, M. Mahata, R. m.Salem, J. L. Rodriquez-Flores, M. M. Funq, J. Waalen, B. Tayo, L.Taupenot, S. K. Mahata, and D. T. O'Connor, "Common Functional Genetic Variants in Catecholamine Storage Vesicle Protein Promoter Motifs Interact to Trigger Systemic Hypertension," Journal of the American College of Cardiology, vol. 55, no. 14, pp. 1463-1475, 2010.
    [40] B-factors and coordinate error. http://www.proteopedia.org/wiki/index.php/Temperature_value
    [41] Protein data obtained from Protein Data Bank. http://www.rcsb.org/pdb/home/home.do

    無法下載圖示 校內:2021-11-28公開
    校外:不公開
    電子論文尚未授權公開,紙本請查館藏目錄
    QR CODE