National Cheng Kung University Master's and Doctoral Thesis System


Author:	Lin, Tsung-Ta (林宗達)
Thesis title:	Utilizing Text Mining to Predict Functional Relationships of ESTs (利用文獻探勘預測ESTs功能相關性)
Advisor:	Wang, Hei-Chia (王惠嘉)
Degree:	Master
Department:	College of Management - Institute of Information Management
Year of publication:	2005
Academic year of graduation:	93
Language:	Chinese
Number of pages:	53
Keywords (Chinese):	功能群組、功能相關性、文獻探勘、序列註解
Keywords (English):	Function Group, Sequence Annotation, Functional Relationship, Text Mining

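In the post-genomic era, biological researchers typically want more information about biological sequences, in particular the functions of the unknown Expressed Sequence Tags (ESTs) produced by experiments and the functional relationships among ESTs. Because the biomedical literature is vast and readily available, it can serve as the main source of supporting information. However, as biotechnology advances, the volume of data in biological databases is growing rapidly, so using computers to extract useful information automatically from large collections of literature has become an important problem in bioinformatics.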

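Many existing methods try to identify functional relationships between genes through the biomedical literature associated with those genes. These studies, however, consider only the associations among the documents linked to different genes and do not go on to analyze the content of those documents. This study therefore applies text-mining techniques to the literature associated with each EST to derive a list of likely functional keywords (keyword list) for that EST. The relatedness of each EST's Direct References and Related Documents is then computed and combined with the similarity between keyword lists to define the functional relationships between ESTs, and function groups are built from a large set of ESTs in this way. Next, the ESTs in each function group are compared against existing databases of known biochemical metabolic pathways by sequence alignment to infer the pathway to which each EST belongs. For some ESTs the comparison against the pathway database returns an unknown result, possibly because current pathway databases (such as KEGG) are built manually and are therefore incomplete. The results found by the proposed method can then be used to infer the likely pathways of the other, unknown ESTs in the same function group, thereby achieving pathway prediction.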
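The grouping and pathway-inference steps described in the abstract can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration and is not taken from the thesis: Jaccard overlap stands in for both the document relatedness and the keyword-list similarity, the two scores are mixed with a hypothetical weight `alpha`, groups are formed by linking any pair of ESTs whose combined score reaches a hypothetical `threshold`, and the EST names, document IDs, keywords, and pathway label are made-up sample data.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard overlap of two collections (assumed similarity measure)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def relatedness(est_a, est_b, alpha=0.5):
    """Mix document overlap and keyword overlap; alpha is a hypothetical weight."""
    doc_score = jaccard(est_a["docs"], est_b["docs"])
    kw_score = jaccard(est_a["keywords"], est_b["keywords"])
    return alpha * doc_score + (1 - alpha) * kw_score


def build_groups(ests, threshold=0.3, alpha=0.5):
    """Link ESTs whose score reaches the threshold; return connected groups."""
    parent = {name: name for name in ests}

    def find(x):
        # Follow parent links to the root, shortening the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(ests, 2):
        if relatedness(ests[a], ests[b], alpha) >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for name in ests:
        groups.setdefault(find(name), set()).add(name)
    return list(groups.values())


def infer_pathways(groups, known):
    """Suggest pathways for unannotated ESTs from annotated group members."""
    inferred = {}
    for group in groups:
        candidates = sorted({known[e] for e in group if e in known})
        if not candidates:
            continue  # nothing to propagate within this group
        for e in group:
            if e not in known:
                inferred[e] = list(candidates)
    return inferred


# Made-up sample data: document IDs and keywords per EST.
ests = {
    "EST1": {"docs": {"d1", "d2", "d3"},
             "keywords": {"apoptosis", "caspase", "protease"}},
    "EST2": {"docs": {"d2", "d3", "d4"},
             "keywords": {"apoptosis", "caspase", "cytochrome"}},
    "EST3": {"docs": {"d9"}, "keywords": {"glycolysis", "kinase"}},
}
known = {"EST1": "Apoptosis"}  # hypothetical sequence-alignment hit

groups = build_groups(ests)
inferred = infer_pathways(groups, known)
print(sorted(sorted(g) for g in groups))  # [['EST1', 'EST2'], ['EST3']]
print(inferred)                           # {'EST2': ['Apoptosis']}
```

In this toy data, EST1 and EST2 share two of four documents and two of four keywords, so their combined score is 0.5 and they fall into one group; EST2, which has no pathway annotation of its own, inherits the candidate pathway of EST1. A real system would need a principled choice of similarity measure, weight, and threshold.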


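Chapter 1 Introduction
    Section 1 Research Background and Motivation
    Section 2 Research Objectives
    Section 3 Research Scope and Limitations
    Section 4 Thesis Outline
Chapter 2 Literature Review
    Section 1 Sources of Biological Information
        2.1.1 NCBI
        2.1.2 Gene Ontology
        2.1.3 KEGG
    Section 2 Information Retrieval
        2.2.1 Boolean Model
        2.2.2 Vector Model
        2.2.3 Probability Model
    Section 3 Research on Functional Relationships of Genes
Chapter 3 Research Method
    Section 1 Research Framework
    Section 2 Retrieving EST-Related Documents -- Documents Retrieving Module
        3.2.1 EST Sequence Similarity Alignment
        3.2.2 Collecting Direct References and Related Documents of ESTs
    Section 3 Discovering EST Keyword Lists -- Functional Keyword Mining Module
        3.3.1 Document Preprocessing
        3.3.2 Building the Background Set Vocabulary
        3.3.3 Keyword Extraction
    Section 4 Predicting Functional Relationships of ESTs -- ESTs Relation Prediction Module
        3.4.1 Computing Document Relatedness of ESTs
        3.4.2 Computing Keyword List Similarity of ESTs
        3.4.3 Constructing Function Groups
    Section 5 Biochemical Pathway Prediction
Chapter 4 Implementation and Validation
    Section 1 System Construction
        4.1.1 System Architecture
    Section 2 Experimental Method and Comparison Items
        4.2.1 Parameter Settings
        4.2.2 Data Sources
        4.2.3 Experimental Design and Comparison Items
    Section 3 Experimental Results and Analysis
    Section 4 Discussion
Chapter 5 Conclusions and Future Research Directions
    Section 1 Research Results and Contributions
    Section 2 Future Research Directions
References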

                                    


Made publicly available on 2006-06-28
