
Graduate student: 黃翊庭
Huang, Yi-Ting
Thesis title: 利用文句主題資訊檢索技術於生醫文獻進行段落排序與推薦之研究
A Study on Paragraph Ranking and Recommendation by Topic Information Retrieval from Biomedical Literature
Advisor: 蔣榮先
Chiang, Jung-Hsien
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Computer Science and Information Engineering
Year of publication: 2010
Academic year of graduation: 98 (ROC academic year)
Language: Chinese
Number of pages: 50
Chinese keywords: 文件探勘, 文獻全文, 段落排序, 資訊檢索
English keywords: Text Mining, Full-text Article, Paragraph Ranking, Information Retrieval
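  • As more and more full-text articles become available to researchers over the Internet, how to use these full-text resources effectively has become an important question. Traditionally, document processing in bioinformatics has relied on data in MEDLINE format, which contains only an article's title and abstract. These carry precise topic information, but their limited length inevitably loses many details. The full text, by contrast, holds the most complete information, but it also contains material only loosely related to the main topic and takes the longest to read. The goal of this study is to combine the strengths of the two document forms, abstract and full text: text mining techniques extract the article's topic information from the abstract, and that topic information is then used to retrieve and rank the paragraphs of the full text, so that important paragraphs highly relevant to the article's topic can be recommended to the user, saving full-text reading time. In the experiments, the quality of the recommended paragraphs was evaluated both subjectively and objectively; in both human-annotated scores and ROUGE scores, the system outperformed the baseline search method.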
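
The general idea of using abstract sentences as queries over full-text paragraphs can be illustrated with a minimal sketch. This record does not describe the thesis's actual scoring formulas, so everything below is an assumption made for illustration: the TF-IDF weighting, the cosine similarity, the rule that a paragraph's score is its best match against any abstract sentence, and the names `tokenize`, `tfidf_vectors`, `cosine`, and `rank_paragraphs`.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs):
    """Build one sparse TF-IDF vector (a dict) per tokenized document."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        # Smoothed IDF keeps the weight positive even for terms in every document.
        vectors.append(
            {t: c * (math.log((1 + n) / (1 + df[t])) + 1.0) for t, c in tf.items()}
        )
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def rank_paragraphs(abstract_sentences, paragraphs):
    """Return (paragraph_index, score) pairs, highest score first.

    Assumption: a paragraph's score is its highest cosine similarity
    to any single abstract sentence.
    """
    docs = [tokenize(s) for s in abstract_sentences]
    docs += [tokenize(p) for p in paragraphs]
    vectors = tfidf_vectors(docs)
    k = len(abstract_sentences)
    queries, paras = vectors[:k], vectors[k:]
    scored = [
        (i, max((cosine(q, p) for q in queries), default=0.0))
        for i, p in enumerate(paras)
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy input, not taken from the thesis data.
abstract = [
    "We rank paragraphs of full-text articles by topic.",
    "ROUGE scores show improvement over a baseline.",
]
paragraphs = [
    "The authors thank the funding agency for support.",
    "Paragraphs of the full-text articles are ranked by topic relevance.",
    "Evaluation with ROUGE scores shows improvement over the baseline search.",
]
ranked = rank_paragraphs(abstract, paragraphs)
```

In this toy input the acknowledgement paragraph shares no words with either abstract sentence, so it ends up last in `ranked`; a real system would likely add stemming, stop-word handling, and a more careful combination of per-sentence scores.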

    With the growing availability of full-text scientific articles, how text mining researchers can make use of them has become an important issue. Traditionally, many studies on extracting biological information have used material from the MEDLINE database, which contains only abstracts and titles. Although the abstract and title provide an accurate summary of an article, many details are inevitably lost because of their limited length. The full text contains more complete information than the abstract; however, it also includes less significant descriptions, which take time to read. The primary goal of this study is to combine the advantages of the abstract and the full text to ease the burden of reading. By finding essential information in the abstract and using it to search and rank paragraphs in the full text, the proposed approach recommends significant paragraphs to the user, saving the time needed to read the whole article. Finally, we evaluated the performance of our system from both subjective and objective perspectives. Our approach outperformed the baseline approach in both human ratings and ROUGE scores.
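
For readers unfamiliar with ROUGE, the objective evaluation mentioned in the abstract rests on n-gram overlap between a system output and a reference text. The record does not say which ROUGE variant or settings the thesis used, so the sketch below is only an assumed illustration of ROUGE-1 (unigram overlap) without stemming or stop-word removal; the function names `tokenize` and `rouge_1` are hypothetical.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def rouge_1(candidate, reference):
    """Return unigram recall, precision, and F1 of candidate against reference."""
    cand = Counter(tokenize(candidate))
    ref = Counter(tokenize(reference))
    # Multiset intersection clips each word's count to the smaller of the two.
    overlap = sum((cand & ref).values())
    recall = overlap / sum(ref.values()) if ref else 0.0
    precision = overlap / sum(cand.values()) if cand else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return {"recall": recall, "precision": precision, "f1": f1}


# Hypothetical example: a recommended paragraph scored against a reference text.
scores = rouge_1("the cat sat", "the cat sat on the mat")
```

Here the candidate covers three of the six reference tokens, so recall is 0.5 while precision is 1.0; recall is usually the figure of interest when the question is how much of the reference content a selected paragraph recovers.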

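    Abstract (Chinese)
    Abstract (English)
    List of Figures
    List of Tables
    Chapter 1 Introduction
        1.1 Preface
        1.2 Research Motivation
        1.3 Approach
        1.4 Thesis Organization
    Chapter 2 Related Work
        2.1 Biomedical Literature Databases
            2.1.1 PubMed
            2.1.2 PubMed Central
        2.2 Use of Full Text in Biomedical Literature
        2.3 Text Mining Techniques
            2.3.1 Information Extraction
            2.3.2 Document Summarization Systems
    Chapter 3 Paragraph Ranking Using Sentence Topic Information
        3.1 Flowchart
        3.2 Data Collection and Preprocessing
            3.2.1 Article Acquisition
            3.2.2 Data Preprocessing
        3.3 Sentence Information Retrieval and Paragraph Relevance Scoring
            3.3.1 Sentence Information Retrieval
            3.3.2 Paragraph Relevance Scoring
        3.4 Paragraph Importance Ranking
        3.5 System Example
            3.5.1 System Overview
            3.5.2 System Demonstration
    Chapter 4 Experimental Design and Result Analysis
        4.1 Data Collection
        4.2 Quality Evaluation of Selected Paragraphs
            4.2.1 Average Paragraph Score on the Manually Annotated Dataset
            4.2.2 Abstract Relevance and Paragraph Importance
            4.2.3 Average Paragraph Score Using Relevance Scores
            4.2.4 Objective Evaluation
        4.3 Experimental Summary
    Chapter 5 Conclusion and Future Work
        5.1 Conclusion
        5.2 Future Work
    References
    Appendix A Part-of-Speech Tagger Tag Set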


    On campus: publicly available from 2012-08-10
    Off campus: publicly available from 2012-08-10