
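Student: Zeng, Jhih-Jheng (曾致崢)
Thesis title: Using mobile game to help biomedical literature mining (運用手機遊戲協助生醫文獻探勘)
Advisor: Chang, Tien-Hao (張天豪)
Degree: Master
Department: Department of Electrical Engineering, College of Electrical Engineering and Computer Science
Year of publication: 2014
Academic year of graduation: 102 (ROC calendar)
Language: Chinese
Number of pages: 25
Keywords (Chinese): crowdsourcing, data mining, biomedical literature, social network
Keywords (English): Crowdsourcing, Text-mining, Biomedical literature, Social network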
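    Many important biological findings are recorded across different publications, and without an integrated database, locating them takes time. Building a centralized knowledge base, however, requires a great deal of manpower to extract valuable information from the literature: typically, a large research institution hires hundreds of biomedical experts to read articles every day and annotate the key points, and most current biomedical databases are built this way. This study proposes a new framework that is the opposite of such centralized annotation: by combining an Android application with a social network, it makes manual annotation efficient and enjoyable. Because mobile devices are increasingly widespread, biologists can use the app to extract important information from the literature anytime and anywhere, for example during their commute from home to work or while waiting, annotating sentences to pass the time. Data collected through the platform is verified and eventually published on the platform, so the correctness of the results is safeguarded in the same way as on Wikipedia.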
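    Because the app is meant to let biologists pass the time, the Android application was designed as a game in which biologists earn rewards and achievements as they play, and the pool of annotators expands from a single organization to the whole world. Over the long term, the results jointly produced on the platform will support subsequent biomedical research. After the platform was completed, 10 students from related fields were recruited to test it. A program filtered the sentences submitted by users to produce the platform's results, and the filtered literature sentences were then classified manually to compute and analyze accuracy. For the lower-threshold sample, the true positive rate was 69.35%; raising the threshold increased Precision to 80.00%, indicating that a higher threshold helps the platform select correctly annotated protein-protein interaction (PPI) sentences more accurately. A second analysis compared the samples with the default annotations, and both Precision and Recall rose markedly, indicating that the platform does improve the accuracy of finding correctly annotated PPI sentences.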
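The abstract describes a game in which players earn rewards, with credits given when their answers are consistent with the community. One minimal way to express such a rule is sketched below, assuming a per-sentence majority vote on a yes/no "describes a PPI" label and a fixed point value. The function names `community_label` and `award_credits`, the `points` parameter, and the tie handling are all hypothetical; the thesis's actual scoring scheme is not given here.

```python
from __future__ import annotations

from collections import Counter


def community_label(votes: list[bool]) -> bool | None:
    """Majority answer for one sentence; None on a tie or when there are no votes."""
    counts = Counter(votes)
    yes, no = counts[True], counts[False]
    if yes == no:
        return None  # no consensus, so nobody is rewarded for this sentence
    return yes > no


def award_credits(votes_by_user: dict[str, bool], points: int = 1) -> dict[str, int]:
    """Hypothetical rule: give `points` to each user whose answer matches the majority."""
    majority = community_label(list(votes_by_user.values()))
    return {
        user: points if majority is not None and vote == majority else 0
        for user, vote in votes_by_user.items()
    }


# Three hypothetical players answer "does this sentence describe a PPI?"
print(award_credits({"alice": True, "bob": True, "carol": False}))
```

Here alice and bob agree with the majority and receive one point each, while carol receives none. Rewarding agreement rather than raw volume is one plausible way to discourage random tapping, but it can also penalize a correct minority, which is one reason a verification step is still needed.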

    Many important biomedical observations are scattered across the literature, which makes them difficult to find. Constructing a centralized platform that collects this information from the literature, however, requires considerable human effort, and most current biomedical databases are built this way. This study proposes a framework that integrates an Android app and a social network to make the manual reading process efficient and engaging. Owing to the ubiquity of Android devices, biologists can extract valuable information from the literature anytime and anywhere, for example during their commute from home to work. Conventionally, extracted information must be verified and approved by a supervisor. Because biologists can use the app to kill time, it was designed as a game in which they earn credits when their answers are consistent with those of the community.
    To evaluate the method, we used the consistency among users to determine the platform's results, which are regarded as the platform's answers; these answers were then checked manually. Based on this answer collection, we defined a classification for identifying correctly marked protein-protein interaction (PPI) sentences. The platform achieved 69.35% recall on the answer corpus, and raising the consistency threshold increased recall to 80.00%. When the answers were compared with the default marks, precision increased from 37.70% to 52.46%. These results indicate that the platform improves the accuracy of the marked sentences.
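The evaluation above accepts a sentence as a platform answer only when users agree strongly enough, then scores those answers against manual checks. The sketch below illustrates that idea under explicit assumptions: "consistency" is taken to be the share of users who marked a sentence as containing a PPI, sentences need a minimum number of votes, and precision and recall use their standard definitions, TP/(TP+FP) and TP/(TP+FN). The function names, the `min_votes` parameter, and the toy votes are hypothetical; this is not the thesis's actual filtering program.

```python
from __future__ import annotations


def platform_answers(votes: dict[str, list[bool]], threshold: float,
                     min_votes: int = 1) -> set[str]:
    """Return IDs of sentences whose share of 'PPI' votes reaches `threshold`.

    Hypothetical rule: True means a user marked the sentence as a PPI sentence.
    Sentences with no votes or fewer than `min_votes` votes are skipped.
    """
    accepted: set[str] = set()
    for sentence_id, ballot in votes.items():
        if ballot and len(ballot) >= min_votes:
            agreement = sum(ballot) / len(ballot)  # fraction of True votes
            if agreement >= threshold:
                accepted.add(sentence_id)
    return accepted


def precision_recall(predicted: set[str], gold: set[str]) -> tuple[float, float]:
    """Standard precision and recall of `predicted` against manually checked `gold`."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical toy data: votes per sentence and the manually confirmed PPI sentences.
votes = {
    "s1": [True, True, False],
    "s2": [True, False, False],
    "s3": [True, True, True],
    "s4": [False, False],
}
gold = {"s1", "s3"}

for threshold in (0.3, 0.9):
    answers = platform_answers(votes, threshold)
    p, r = precision_recall(answers, gold)
    print(f"threshold={threshold}: answers={sorted(answers)} precision={p:.2f} recall={r:.2f}")
```

On this toy data, a threshold of 0.3 accepts s1, s2 and s3 (precision 2/3, recall 1.0), while 0.9 accepts only s3 (precision 1.0, recall 0.5). How the metrics move on real data depends on how the answer set and the reference set are defined, so the toy numbers say nothing about the results reported above.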

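    Contents
    List of Tables
    List of Figures
    Chapter 1 Introduction
    Chapter 2 Related Work
        2.1 Crowdsourcing
        2.2 Protein Interactions
        2.3 Related Databases
    Chapter 3 Dataset and Methods
        3.1 Dataset and Annotation of Protein Interactions
            3.1.1 Data Collection and Processing
            3.1.2 Annotation of Protein Interactions
        3.2 Platform Implementation
        3.3 Experimental Design
    Chapter 4 Experimental Results, Discussion, and Analysis
        4.1 Platform Overview
        4.2 Experimental Results
        4.3 Analysis of Platform Results
        4.4 Issues and Discussion
            4.4.1 Discussion of Platform Results
            4.4.2 Social Impact of the Platform
    Chapter 5 Conclusions and Future Work
        5.1 Conclusions
        5.2 Future Work
    References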


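    On campus: open immediately
    Off campus: not public
    The electronic thesis has not yet been authorized for public release; for the print copy, consult the library catalog.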