
Graduate student: 鄭少斐
Cheng, Shaun-Fei
Thesis title: 以資料探勘法探索可轉換生質能之微生物
Exploring the microorganisms for biomass energy conversion by using data mining
Advisors: 洪振益
Hung, Chen-I
陳朝光
Chen, Cha'o-Kuang
Degree: Doctor
Department: College of Engineering - Department of Mechanical Engineering
Year of publication: 2013
Academic year of graduation: 101 (ROC calendar)
Language: Chinese
Number of pages: 102
Keywords (translated from Chinese): biomass energy, Archaea, fuzzy cluster analysis, dendrogram
Keywords (English): biomass energy, Archaea, fuzzy C-means, dendrogram
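  • Biomass energy, a form of clean energy, carries environmental advantages: combustible gas can be obtained from it through biological conversion. The key to this conversion is the choice of microorganism. Archaea can survive in extreme environments and release energy through genetic metabolism, and these properties make them useful for biomass energy conversion. Archaea comprise three main groups: methanogens, halophiles, and extreme thermophiles.
    These archaea are not equally effective in biomass energy conversion. To select, from these three groups, strains that meet the conditions for biological conversion while limiting the time and cost of experimental screening, this thesis proposes cluster analysis as an initial screening tool. The methods used are fuzzy cluster analysis and hierarchical cluster analysis, applied to the genomes of 27 archaeal strains, with their codon usage bias data serving as the observations and variables. Case verification shows that both clustering methods can screen out the most suitable archaea, and that the properties of these archaea meet the conditions for biomass energy conversion.
    The advantage of fuzzy cluster analysis is that it can uncover useful information from a small amount of data. The advantage of hierarchical cluster analysis is its dendrogram, which makes the clustering status quick to grasp. The two methods share three features: for leucine, the clustering results agree with the biological classification; the computation is fast and stable; and the clusters are well defined.
    The method proposed in this study achieves the screening objective, and every strain it identified can be applied to biomass energy conversion. Suitable strains can therefore be obtained without complicated experiments, so the method can serve as a preparatory step before experimental work, improving analytical efficiency and lowering cost. In addition, this study found that amino acids and biomass energy influence and relate to each other.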

    Biomass energy is a type of clean energy that can yield flammable gas through biological conversion. This conversion depends on the choice of microorganism. Archaea are microbes that can survive in extreme environments and that release energy via genetic metabolism. These properties make Archaea useful for biomass energy conversion.
    The Archaea, however, comprise three major categories (methanogens, halophiles, and extreme thermophiles) that vary in their effectiveness for biomass conversion. To single out the most effective organisms for such conversion while limiting the time and cost of selection, this thesis proposes cluster analysis as an initial screening tool. Using fuzzy c-means and hierarchical clustering analysis, we selected 27 Archaea and took the codon usage of their genomes as the items and variables for analysis. Case verification showed that both clustering methods can screen out the most suitable Archaea, and that the properties of these Archaea meet the conditions for biomass energy conversion.
    Fuzzy c-means can extract useful information from a small amount of data. The results show that the methodology was effective for initial selection, simplifying the experimental process, increasing its efficiency, and lowering its cost. Hierarchical clustering analysis has the further advantage of generating dendrograms and agglomerative coefficients: we used dendrograms to identify the clustering status of the organisms and agglomerative coefficients to determine their kinship. The two methods share three features: (1) for the amino acid Leu, the clustering results match the biological classification; (2) the calculation process is fast and stable; (3) the clustering is explicit. Overall, the method presented in this thesis can achieve the purpose of initial screening without a complicated experiment.
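
    The fuzzy c-means step named in the abstract can be sketched as below. This is a minimal illustration of the standard algorithm and not the program used in the thesis: the function name `fuzzy_c_means`, the fuzzifier `m = 2`, the random initialization, and the toy two-column data are all assumptions made for the example. Each row of the membership matrix `U` sums to 1, and an item is assigned to the cluster with its largest membership value.

```python
import numpy as np


def fuzzy_c_means(X, n_clusters, m=2.0, max_iter=300, tol=1e-6, seed=0):
    """Standard fuzzy c-means (hypothetical helper, not the thesis code).

    X is an (n_items, n_variables) array. Returns (centers, U), where U is
    the (n_items, n_clusters) membership matrix.
    """
    rng = np.random.default_rng(seed)
    # Random initial memberships, normalized so each row sums to 1.
    U = rng.random((X.shape[0], n_clusters))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(max_iter):
        # Cluster centers: means of the items weighted by U**m.
        W = U ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        # Squared Euclidean distance from every item to every center.
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        d2 = np.maximum(d2, 1e-12)  # guard against division by zero
        # Membership update: u_ik proportional to d_ik ** (-2 / (m - 1)).
        inv = d2 ** (-1.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return centers, U


# Toy data (made up for illustration, NOT taken from the thesis tables):
# six items described by two variables, forming two obvious groups.
X = np.array([[0.10, 0.12], [0.11, 0.10], [0.09, 0.11],
              [0.40, 0.42], [0.41, 0.39], [0.39, 0.41]])
centers, U = fuzzy_c_means(X, n_clusters=2)
labels = U.argmax(axis=1)  # hard assignment: largest membership per row
print(np.round(U, 3))
print(labels)
```

    With real data, each row of `X` would hold one organism's codon usage values for one amino acid, and the number of clusters would be varied to compare the resulting groupings.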

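    Table of Contents
    Abstract (Chinese)
    Abstract
    Acknowledgements
    Table of Contents
    List of Tables
    List of Figures
    Nomenclature
    Chapter 1 Introduction
      1-1 Foreword
      1-2 Research background and motivation
        1-2-1 Introduction to microorganisms
        1-2-2 Codon usage bias
        1-2-3 Cluster analysis
      1-3 Research focus and thesis structure
    Chapter 2 Data mining theory and fuzzy cluster analysis
      2-1 Characteristics of archaea and codon usage bias data
      2-2 Fuzzy clustering algorithm
      2-3 Computational procedure and steps
    Chapter 3 Data mining theory and hierarchical cluster analysis
      3-1 Hierarchical cluster analysis
      3-2 Hierarchical algorithm and agglomeration procedure
        3-2-1 Computing dissimilarity and similarity
        3-2-2 Cluster formation and analysis methods
      3-3 Computational procedure and steps
    Chapter 4 Case verification: fuzzy cluster analysis
      4-1 Program execution process
      4-2 Program execution results
      4-3 Discussion of applying fuzzy cluster analysis to biomass energy conversion
    Chapter 5 Case verification: hierarchical cluster analysis
      5-1 Program execution process
      5-2 Program execution results
      5-3 Discussion of applying hierarchical cluster analysis to biomass energy conversion
    Chapter 6 Conclusions and future prospects
      6-1 Conclusions on the fuzzy clustering algorithm
      6-2 Future research and recommendations
    References
    Appendix

    List of Tables
    Table 1-1 Correspondence between amino acids and codons
    Table 2-10 Codon usage data of Methanococcus maripaludis strain S2
    Table 2-11 Codon usage data of Methanopyrus kandleri strain AV19
    Table 2-12 Codon usage data of Methanosarcina acetivorans strain C2A
    Table 2-13 Codon usage data of Methanosarcina barkeri strain fusaro
    Table 2-14 Codon usage data of Methanosarcina mazei strain Goe1
    Table 2-15 Codon usage data of Methanosphaera stadtmanae strain DSM3091
    Table 2-16 Codon usage data of Methanospirillum hungatei strain JF-1
    Table 2-17 Codon usage data of Natronomonas pharaonis strain DSM2160
    Table 2-18 Codon usage data of Picrophilus torridus strain DSM9790
    Table 2-19 Codon usage data of Pyrococcus abyssi strain GE5
    Table 2-20 Codon usage data of Pyrococcus furiosus strain DSM3638
    Table 2-21 Codon usage data of Pyrococcus horikoshii strain OT3
    Table 2-22 Codon usage data of Thermococcus kodakaraensis strain KOD1
    Table 2-23 Codon usage data of Thermoplasma acidophilum strain DSM1728
    Table 2-24 Codon usage data of Thermoplasma volcanium strain GSS1
    Table 2-25 Codon usage data of Aeropyrum pernix strain K1
    Table 2-26 Codon usage data of Pyrobaculum aerophilum strain IM2
    Table 2-27 Codon usage data of Sulfolobus acidocaldarius strain DSM639
    Table 2-28 Codon usage data of Sulfolobus solfataricus strain P2
    Table 2-29 Codon usage data of Sulfolobus tokodaii strain 7
    Table 2-30 Codon usage data of Nanoarchaeum equitans strain Kin4-M
    Table 2-31 Codon usage data of three amino acids for the 27 archaea
    Table 4-1 Counts of single strains forming a cluster by themselves, for each grouping
    Table 4-2 Result for amino acid Ser with 10 clusters, in which clusters 2 and 6 consist solely of archaea No. 17 and No. 8, respectively
    Table 4-3 Objective function results using amino acid Leu as the parameter data
    Table 4-4 Objective function results using amino acid Ser as the parameter data
    Table 4-5 Objective function results using amino acid Arg as the parameter data
    Table 4-6 Membership matrix U obtained with Leu as the computation data; shaded cells mark the maximum values
    Table 4-7 Membership matrix U obtained with Ser as the computation data; shaded cells mark the maximum values
    Table 4-8 Membership matrix U obtained with Arg as the computation data; shaded cells mark the maximum values
    Table 4-9 Cluster center matrix of the synonymous codons of Leu
    Table 4-10 Cluster center matrix of the synonymous codons of Ser
    Table 4-11 Cluster center matrix of the synonymous codons of Arg
    Table 4-12 Clustering result for Leu with 6 clusters; this table corresponds to the membership matrix U
    Table 4-13 Clustering result for Ser with 6 clusters; this table corresponds to the membership matrix U
    Table 4-14 Clustering result for Arg with 6 clusters; this table corresponds to the membership matrix U
    Table 4-15 Minimum threshold for Leu with 6 clusters
    Table 4-16 Minimum threshold for Ser with 6 clusters
    Table 4-17 Minimum threshold for Arg with 6 clusters
    Table 4-18 Threshold values for each number of clusters
    Table 4-19 Clustering result for amino acid Leu with 5 clusters
    Table 4-20 Clustering result for amino acid Ser with 5 clusters
    Table 4-21 Clustering result for amino acid Arg with 5 clusters
    Table 4-22 Genera and main characteristics of the 27 archaea, compiled according to biological classification
    Table 5-1 Dissimilarity matrix computed from the Leu codon usage data using squared Euclidean distance; because the matrix is symmetric, only the lower triangle is shown
    Table 5-2 Agglomeration schedule for amino acid Leu; the agglomeration directions indicated by the arrows correspond to those in Figure 5-1, taking the agglomeration of the four archaea numbered {16,17,18,21} as an example
    Table 5-3 Dissimilarity matrix computed from the Ser codon usage data using squared Euclidean distance; because the matrix is symmetric, only the lower triangle is shown
    Table 5-4 Agglomeration schedule for amino acid Ser; the agglomeration directions indicated by the arrows correspond to those in Figure 5-2, taking the agglomeration of the five archaea numbered {6,9,10,11,18} as an example
    Table 5-5 Dissimilarity matrix computed from the Arg codon usage data using squared Euclidean distance; because the matrix is symmetric, only the lower triangle is shown
    Table 5-6 Agglomeration schedule for amino acid Arg; the agglomeration directions indicated by the arrows correspond to those in Figure 5-3, taking the agglomeration of the three archaea numbered {1,15,21} as an example
    Table 5-7 Comparison of the properties of two groups of archaea: the three archaea screened in this study, {21,4,24}, and the archaea studied in the literature, numbered {16,17,18,19,25} in this thesis

    List of Figures
    Figure 3-1 The two basic types of hierarchical procedure
    Figure 3-2 Schematic of the single linkage method
    Figure 3-3 Schematic of the complete linkage method
    Figure 3-4 Schematic of the average linkage method
    Figure 3-5 Schematic of the centroid method
    Figure 4-1 Matrix and parameter settings
    Figure 4-2 Result after program execution, taking amino acid Leu with three clusters as an example
    Figure 4-3 Spatial distribution of the clusters for Leu with 6 clusters
    Figure 4-4 Spatial distribution of the clusters for Ser with 6 clusters
    Figure 4-5 Spatial distribution of the clusters for Arg with 6 clusters
    Figure 4-6 Relationship between the number of clusters and the minimum threshold
    Figure 5-1 HCA dendrogram of the codon usage data for amino acid Leu
    Figure 5-2 HCA dendrogram of the codon usage data for amino acid Ser
    Figure 5-3 HCA dendrogram of the codon usage data for amino acid Arg
    Figure 5-4 Trend of hierarchy level and agglomeration coefficient constructed from the agglomeration schedule of Leu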
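
    The hierarchical workflow outlined in the Chapter 3 and Chapter 5 entries (a squared Euclidean dissimilarity matrix, an agglomeration schedule, and a dendrogram) can be sketched with SciPy as follows. This is an illustration under stated assumptions, not the thesis's own procedure: the five rows of toy data and the names `org1` to `org5` are made up, and average linkage is picked here only as one of the linkage methods the chapter lists.

```python
import numpy as np
from scipy.cluster.hierarchy import dendrogram, fcluster, linkage
from scipy.spatial.distance import pdist, squareform

# Toy stand-in data (NOT from the thesis): 5 hypothetical organisms, each
# described by the usage fractions of six synonymous codons (rows sum to 1).
X = np.array([
    [0.30, 0.20, 0.10, 0.10, 0.10, 0.20],
    [0.29, 0.21, 0.10, 0.11, 0.09, 0.20],
    [0.05, 0.10, 0.15, 0.30, 0.10, 0.30],
    [0.06, 0.09, 0.15, 0.31, 0.09, 0.30],
    [0.17, 0.17, 0.16, 0.17, 0.16, 0.17],
])
names = ["org1", "org2", "org3", "org4", "org5"]  # hypothetical labels

# Pairwise squared Euclidean distances in condensed form, then as a full
# symmetric matrix; the lower triangle carries all of the information.
d = pdist(X, metric="sqeuclidean")
D = squareform(d)
lower = np.tril(D)

# Agglomerative clustering. Each row of Z records one merge: the two
# clusters joined, the dissimilarity at which they merge, and the new size.
Z = linkage(d, method="average")

# Cut the tree into at most three clusters.
labels = fcluster(Z, t=3, criterion="maxclust")

# Leaf order of the dendrogram, computed without drawing a figure.
tree = dendrogram(Z, labels=names, no_plot=True)

print(np.round(lower, 4))
print(np.round(Z, 4))
print(labels)
print(tree["ivl"])
```

    The third column of `Z` plays a role comparable to the coefficients in an agglomeration schedule: a sharp jump between successive merges suggests a natural number of clusters.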


    Full-text access: on campus, open immediately; off campus, open immediately