
Graduate student: 翁岳廷 (Wong, Yue-Ting)
Thesis title: Discovering indirect disease-drug relationships from biomedical literature toward drug repurposing
Advisor: 蔣榮先 (Chiang, Jung-Hsien)
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Computer Science and Information Engineering
Publication year: 2014
Graduation academic year: 102 (ROC calendar, i.e. 2013-2014)
Language: English
Pages: 50
Chinese keywords: text mining, relation extraction, drug repurposing, drug similarity
English keywords: text mining, relation extraction, drug repurposing, drug repositioning, drug similarity
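  • Bringing a drug from research and development to a successful market launch can yield substantial profits, but it also typically involves high risk and high cost that small and medium-sized pharmaceutical companies generally cannot bear. If new indications can be found among drugs that have already been approved and have passed human trials, the time and money needed to develop new drugs can be greatly reduced; this is called drug repurposing. Drug repurposing is expected to play an important role in future drug design and disease treatment. At the same time, the complexity of biological mechanisms and advances in information technology have made the biomedical data available online grow at an unimaginable rate. From the 20 million biomedical abstracts held by the U.S. National Library of Medicine, we extract the relationships among diseases, genes, and drugs and use them to predict new indications for drugs.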
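    The aim of this study is to build two bipartite graphs from disease-gene and gene-drug relationships and use them to analyze drug repurposing. The disease-gene bipartite graph shows which diseases arise from variations in which genes, and the gene-drug bipartite graph shows which drugs regulate which genes; together, the two graphs yield inferred paths from a disease to a drug. We also propose a new ranking method that orders these inferences by drug similarity. The underlying idea is that the more similar two drugs are, the more similar their mechanisms of action; therefore, the more similar a drug is to the existing drugs known to treat a disease, the more likely that disease is a new indication for it.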
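The disease-to-drug path inference and similarity-based ranking described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: the relations are made-up placeholders rather than extracted data, each drug is represented as a binary vector over the genes it interacts with, similarity is cosine similarity, and a candidate's score is its highest similarity to any drug already approved for the disease (the thesis may aggregate differently).

```python
from math import sqrt

# Toy, made-up relations (placeholders, not real biomedical facts).
# Disease-gene bipartite graph: disease -> genes implicated in it.
DISEASE_GENES = {"disease_D": {"gene_1", "gene_2"}}
# Gene-drug bipartite graph: gene -> drugs that act on it.
GENE_DRUGS = {
    "gene_1": {"approved_A", "candidate_X"},
    "gene_2": {"approved_A", "candidate_Y"},
    "gene_3": {"approved_A", "candidate_X"},
    "gene_4": {"candidate_Y"},
}
# Drugs already known to treat each disease.
APPROVED = {"disease_D": {"approved_A"}}


def indirect_candidates(disease, disease_genes, gene_drugs, approved):
    """Follow disease -> gene -> drug paths and return {drug: bridging genes}
    for drugs not already approved for the disease."""
    candidates = {}
    for gene in disease_genes.get(disease, set()):
        for drug in gene_drugs.get(gene, set()):
            if drug not in approved.get(disease, set()):
                candidates.setdefault(drug, set()).add(gene)
    return candidates


def drug_vectors(gene_drugs):
    """Assumed representation: each drug is the set of genes it interacts with
    (a binary vector over the gene vocabulary)."""
    vectors = {}
    for gene, drugs in gene_drugs.items():
        for drug in drugs:
            vectors.setdefault(drug, set()).add(gene)
    return vectors


def cosine(a, b):
    """Cosine similarity of two binary vectors stored as sets."""
    if not a or not b:
        return 0.0
    return len(a & b) / sqrt(len(a) * len(b))


def rank_candidates(disease, disease_genes, gene_drugs, approved):
    """Score each candidate by its highest similarity to any approved drug (assumed aggregation)."""
    vectors = drug_vectors(gene_drugs)
    known = approved.get(disease, set())
    scored = []
    for drug in indirect_candidates(disease, disease_genes, gene_drugs, approved):
        score = max((cosine(vectors[drug], vectors.get(k, set())) for k in known), default=0.0)
        scored.append((drug, score))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    for drug, score in rank_candidates("disease_D", DISEASE_GENES, GENE_DRUGS, APPROVED):
        print(f"{drug}\t{score:.3f}")
```

With this toy data, candidate_X (sharing two of approved_A's three genes, score 0.816) ranks above candidate_Y (0.408), even though each reaches the disease through a single bridging gene.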
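    Our proposed relation extraction method achieves a precision of 0.86, indicating that automated relation extraction is reliable. Our drug similarity method achieves an R-squared value of 0.80 against ATC code similarity. To validate drug repurposing, we adopt a literature-based validation approach: the ranking method obtains a MAP score of 0.37 over the 100 diseases with the highest literature frequency. Finally, we select drugs for six cancers (ovarian, prostate, lung, colorectal, leukemia, and breast cancer) and use the literature and clinical trials to support the repurposing potential of these existing drugs.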
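Comparing a drug similarity measure against ATC code similarity with an R-squared value can be illustrated as below. Assumptions: ATC similarity is taken here as the fraction of the five ATC hierarchy levels two codes share from the top, R-squared is computed for a simple least-squares line (which equals the squared Pearson correlation), and the codes and similarity scores are illustrative values, not data or the exact procedure from the thesis.

```python
from statistics import mean
from typing import List, Tuple

# End positions of the five ATC levels, e.g. "A01AA01" -> "A", "A01", "A01A", "A01AA", "A01AA01".
ATC_LEVEL_ENDS = (1, 3, 4, 5, 7)


def atc_similarity(code_a: str, code_b: str) -> float:
    """Assumed ATC similarity: fraction of levels shared, counted from the top."""
    shared = 0
    for end in ATC_LEVEL_ENDS:
        if code_a[:end] != code_b[:end]:
            break
        shared += 1
    return shared / len(ATC_LEVEL_ENDS)


def r_squared(xs: List[float], ys: List[float]) -> float:
    """R^2 of the least-squares line y = a + b*x (the squared Pearson correlation)."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("R^2 is undefined when either variable is constant")
    return sxy * sxy / (sxx * syy)


# Illustrative pairs: (ATC code of drug 1, ATC code of drug 2, similarity from a drug vector space).
PAIRS: List[Tuple[str, str, float]] = [
    ("A01AA01", "A01AA02", 0.85),
    ("A01AA01", "A01AB01", 0.70),
    ("A01AA01", "A02BA01", 0.35),
    ("A01AA01", "B01AC06", 0.05),
]

if __name__ == "__main__":
    atc = [atc_similarity(a, b) for a, b, _ in PAIRS]
    system = [s for _, _, s in PAIRS]
    print(f"R^2 = {r_squared(atc, system):.2f}")
```

Because R-squared is symmetric here, it does not matter whether the vector-space similarity is regressed on ATC similarity or the reverse.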

    Drug development is time-consuming, expensive, and high-risk. This uncertainty has driven the emergence of drug repurposing, which aims to find new indications for approved drugs. Because approved drugs have already accumulated more clinical trial data than newly developed drugs, drug repurposing is generally safer and faster than conventional drug development.
    This study aims to infer indirect disease-drug relations from disease-gene and gene-drug relations extracted from large-scale biomedical literature. We propose a pattern-based relation extraction method that uses dependency grammar to identify disease, gene, and drug relations, which are then used to construct disease-gene and gene-drug bipartite networks. The disease-gene network shows which gene products are involved in causing a disease, and the gene-drug network shows how proteins and drugs interact. Linking the two networks through shared genes, however, produces a large number of indirect disease-drug relations. We therefore propose a novel ranking method to prioritize them. The ranking is based on the similarity between a candidate repurposed drug and the drugs already approved for the disease: if the two have highly similar interactions with their common genes, the repurposed drug may have effects similar to those of the approved drug, making the disease a plausible new indication for it.
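The general idea of a dependency-grammar pattern can be sketched as checking whether a trigger term lies on the dependency path that links two recognized entities. This is a hypothetical simplification, not the thesis's pattern set: each parse is hand-written as (word, head index) pairs in the form a dependency parser would produce, TRIGGERS is a made-up lexicon standing in for the learned trigger terms, and DrugX and GeneY are placeholder entity mentions.

```python
from typing import List, Optional, Tuple

# A token is (word, head_index); head_index is -1 for the root. Tokens must form a tree.
Token = Tuple[str, int]

# Hypothetical trigger lexicon; the thesis learns its trigger terms from data.
TRIGGERS = {"inhibits", "activates", "induces", "regulates", "treats"}


def ancestors(tokens: List[Token], i: int) -> List[int]:
    """Token i followed by its chain of heads up to the root."""
    chain = [i]
    while tokens[i][1] != -1:
        i = tokens[i][1]
        chain.append(i)
    return chain


def dependency_path(tokens: List[Token], a: int, b: int) -> List[int]:
    """Token indices on the path a -> lowest common ancestor -> b."""
    up_a, up_b = ancestors(tokens, a), ancestors(tokens, b)
    common = next(i for i in up_a if i in up_b)
    return up_a[: up_a.index(common) + 1] + list(reversed(up_b[: up_b.index(common)]))


def extract_relation(tokens: List[Token], e1: int, e2: int) -> Optional[Tuple[str, str, str]]:
    """Return (entity1, trigger, entity2) if a trigger word lies between the entities on the path."""
    for i in dependency_path(tokens, e1, e2)[1:-1]:
        word = tokens[i][0].lower()
        if word in TRIGGERS:
            return tokens[e1][0], word, tokens[e2][0]
    return None


if __name__ == "__main__":
    # "DrugX strongly inhibits GeneY expression"
    s1 = [("DrugX", 2), ("strongly", 2), ("inhibits", -1), ("GeneY", 4), ("expression", 2)]
    # "DrugX was tested in cells that express GeneY"
    s2 = [("DrugX", 2), ("was", 2), ("tested", -1), ("in", 4), ("cells", 2),
          ("that", 6), ("express", 4), ("GeneY", 6)]
    print(extract_relation(s1, 0, 3))
    print(extract_relation(s2, 0, 7))
```

The first sentence yields a relation because "inhibits" sits on the path between the two entities; the second co-mentions the same entities but has no trigger on the path, so no relation is extracted. This is what distinguishes a dependency pattern from simple sentence-level co-occurrence.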
    Our pattern-based relation extraction method achieves a precision of 0.86, higher than the baseline methods. Our drug similarity method obtains an R-squared value of 0.80 against ATC code similarity, indicating that our drug vector space is suitable for calculating drug similarity. Using this similarity, the ranking method achieves a MAP score of 0.37 over the 100 diseases most frequently mentioned in the literature. Finally, we select the repurposed drugs for ovarian cancer, prostate cancer, lung cancer, colorectal cancer, leukemia, and breast cancer and validate them through a literature review and clinical trials.
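A MAP score like the one reported above averages per-disease average precision over ranked candidate lists. The following is a minimal sketch assuming the standard definition (precision at each rank where a literature-supported drug appears, divided by the number of such drugs) and toy rankings; the thesis's relevance judgments, cutoffs, and normalization are not reproduced here.

```python
from typing import List, Sequence, Set, Tuple


def average_precision(ranked: Sequence[str], relevant: Set[str]) -> float:
    """Average of precision@k over the ranks k where a relevant drug appears,
    normalized by the total number of relevant drugs."""
    if not relevant:
        return 0.0
    hits, total = 0, 0.0
    for k, drug in enumerate(ranked, start=1):
        if drug in relevant:
            hits += 1
            total += hits / k
    return total / len(relevant)


def mean_average_precision(runs: List[Tuple[Sequence[str], Set[str]]]) -> float:
    """Mean of per-disease average precision."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


if __name__ == "__main__":
    # Toy rankings for two hypothetical diseases: (ranked candidates, literature-supported drugs).
    runs = [
        (["drug_a", "drug_b", "drug_c", "drug_d"], {"drug_a", "drug_c"}),
        (["drug_x", "drug_y"], {"drug_y"}),
    ]
    print(f"MAP = {mean_average_precision(runs):.3f}")
```

Here the first disease scores (1/1 + 2/3) / 2 = 5/6 and the second scores 1/2, so the MAP is 2/3. Supported drugs placed lower in a ranking pull the score down even when they are eventually found.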

    Chinese Abstract
    Abstract
    Acknowledgements
    Contents
    List of Figures
    List of Tables
    Chapter 1 Introduction
        1.1 Background and Motivation
        1.2 Research Objective and Specific Aims
        1.3 Organization of Thesis
    Chapter 2 Related Work
        2.1 Drug discovery with ABC model
            2.1.1 ABC Model
            2.1.2 Related works with ABC Model
            2.1.3 Ranking in ABC Model
        2.2 Utilized Databases for Drug, Gene, and Disease information
            2.2.1 Therapeutic Target Database
            2.2.2 The Comparative Toxicogenomics Database
            2.2.3 The Pharmacogenomics Knowledge Base
            2.2.4 MEDLINE Database
    Chapter 3 Materials and Methods
        3.1 Target documents collection
            3.1.1 Therapeutic Target Database as system lexicon
            3.1.2 Search-based filter
        3.2 Relation extraction
            3.2.1 Named-entity recognition
            3.2.2 Dependency grammar pattern
            3.2.3 Learning trigger terms
        3.3 Repurposed drug prioritization
            3.3.1 Drug vector space model
            3.3.2 Drug similarity ranking method
        3.4 Visualization
            3.4.1 Visualization of Ranking
            3.4.2 Visualization of Drug similarity
            3.4.3 Visualization of ABC model
    Chapter 4 Experiments and Results
        4.1 Relation extraction evaluation
        4.2 Drug similarity evaluation
        4.3 Drug repurposing evaluation
    Chapter 5 Discussion
        5.1 Discussion of Literature review
        5.2 Discussion of ClinicalTrials.gov
    Chapter 6 Conclusions and Future Work
        6.1 Conclusions
        6.2 Future Work
    References
    APPENDIX A


    Full text available on campus from 2015-08-08 and off campus from 2015-08-08.