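Graduate student: | Huang, Yi-Ting (黃翊庭) |
---|---|
Thesis title: | A Study on Paragraph Ranking and Recommendation by Topic Information Retrieval from Biomedical Literature (利用文句主題資訊檢索技術於生醫文獻進行段落排序與推薦之研究) |
Advisor: | Chiang, Jung-Hsien (蔣榮先) |
Degree: | Master |
Department: | College of Electrical Engineering and Computer Science - Department of Computer Science and Information Engineering |
Year of publication: | 2010 |
Academic year of graduation: | 98 (ROC calendar) |
Thesis language: | Chinese |
Number of pages: | 50 |
Chinese keywords: | 文件探勘 (Text Mining), 文獻全文 (Full-text Article), 段落排序 (Paragraph Ranking), 資訊檢索 (Information Retrieval) |
English keywords: | Text Mining, Full-text Article, Paragraph Ranking, Information Retrieval |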
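As more and more full-text articles become available to researchers over the Internet, how to use these full-text resources effectively has become an important problem. Traditionally, document processing in bioinformatics has relied on MEDLINE-format data, which contains only the title and abstract of each article. Although these carry precise topic information, their limited length means that much information is inevitably lost. The full text, by contrast, holds the most complete information, but it also includes content that is less relevant to the article's topic, and it takes the longest to read. The goal of this study is to combine the strengths of the abstract and the full text: using text mining techniques, the system identifies the various pieces of topic information in the abstract and uses them to retrieve and rank the paragraphs of the full text, then recommends the important paragraphs most relevant to the article's topics to the user, reducing the time needed to read the full text. In the experiments, the quality of the recommended paragraphs was evaluated both subjectively and objectively; the system outperformed a baseline search method on both human-annotated scores and ROUGE scores.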
With the growing availability of full-text scientific articles, how text mining researchers can make use of them has become an important issue. Traditionally, many studies on extracting biological information have used material from the MEDLINE database, which contains only abstracts and titles. Although the abstract and title provide an accurate summary of an article, many details are inevitably lost because of their brevity. The full text clearly contains more complete information than the abstract; however, it also includes less significant descriptions, which take time to read. The primary goal of this study is to combine the advantages of the abstract and the full text to ease the burden of reading. By finding essential information in the abstract and using it to search and rank paragraphs in the full text, the proposed approach recommends significant paragraphs to the user, saving the time needed to read the whole article. Finally, we evaluated the performance of our system from both subjective and objective perspectives. Our approach outperformed the baseline approach in both human ratings and ROUGE scores.
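The general idea of using an abstract as a query to rank full-text paragraphs can be illustrated with a minimal sketch. This is not the method of the thesis, whose details are not given in this record; it assumes a plain term-frequency cosine similarity as a stand-in for the topic-based retrieval, and the names `STOPWORDS`, `tokenize`, `cosine`, and `rank_paragraphs`, as well as the sample texts, are hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list (illustration only).
STOPWORDS = {"the", "a", "an", "of", "in", "and", "to", "is", "are", "we",
             "this", "that", "by", "for", "with", "was", "were", "at"}


def tokenize(text):
    """Lowercase the text and keep alphanumeric tokens that are not stopwords."""
    return [t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOPWORDS]


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def rank_paragraphs(abstract, paragraphs, top_k=3):
    """Return up to top_k (score, index, paragraph) tuples, best match first.

    Assumption: the whole abstract serves as the query; ties are broken
    by the paragraph's original position.
    """
    query = Counter(tokenize(abstract))
    scored = [(cosine(query, Counter(tokenize(p))), i, p)
              for i, p in enumerate(paragraphs)]
    scored.sort(key=lambda item: (-item[0], item[1]))
    return scored[:top_k]


# Made-up sample texts, not taken from the thesis.
abstract = ("We rank paragraphs of full-text biomedical articles "
            "using topic information from the abstract.")
paragraphs = [
    "The authors thank the funding agency for its support.",
    "Topic information extracted from the abstract is used to rank "
    "paragraphs of the full-text article.",
    "Samples were stored at low temperature.",
]

for score, index, _ in rank_paragraphs(abstract, paragraphs, top_k=2):
    print(index, round(score, 3))
```

A ranking of this kind only measures word overlap with the abstract; an evaluation such as the one described above would still be needed to judge whether the recommended paragraphs are actually useful to readers.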