Graduate student: 徐翊凱 (Hsu, Yi-Kai)
Thesis title: 以文件探勘法建構藥物不良反應之預測模型
Predicting Adverse Drug Reactions by Text Mining Approach
Advisor: 李昇暾 (Li, Sheng-Tun)
Degree: Master
Department: College of Management - Institute of Information Management
Year of publication: 2017
Academic year of graduation: 105 (ROC calendar)
Language: English
Number of pages: 49
Keywords (Chinese): 藥物不良反應, 主題模型, 文字探勘, 單類別分類
Keywords (English): Adverse Drug Reaction, Topic Model, Text Mining, One-Class Classification
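  • A drug side effect is an unexpected drug effect that occurs while a drug is in use; such effects can be positive or negative. Negative side effects deserve more attention than positive ones, and we refer to them as adverse drug reactions (ADRs). An ADR that occurs while a patient is receiving drug treatment is extremely dangerous and may even lead to the patient's death. However, not every drug can be fully screened for possible ADRs before it reaches the market. The aim of this study is to use real clinical cases to train an ADR prediction system that supports future clinical research on identifying ADRs.
    In this study, we use clinical discharge summaries as text data and combine text mining techniques with one-class classification to build a classification and prediction model for ADRs. First, we extract the information the system needs from the discharge summaries, which contain the patient's diagnosis records, past history and medication information, and course of treatment. We use Latent Dirichlet Allocation (LDA) and Term Frequency-Inverse Document Frequency (TF-IDF) to reduce the dimensionality of the data for later analysis. LDA automatically extracts the topic distribution of each discharge summary and the word distribution of each topic, so that each summary is represented by its topic composition. TF-IDF identifies the keywords in the training corpus. Because positive and negative examples are distributed very unevenly in clinical data, we adopt Support Vector Domain Description (SVDD) to build a one-class classification and prediction model; its basic idea is to train a minimal hypersphere that encloses as many data points as possible and to use that hypersphere for classification and prediction. In the future, drug experts can verify the discharge summaries that our model flags as likely to involve ADRs, improving the accuracy of ADR reporting.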
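    The minimal-hypersphere idea described above is usually written as the soft-margin optimization problem below. This is the standard SVDD formulation from the literature (Tax and Duin), not an equation quoted from the thesis; the symbols are notation assumed here: $\mathbf{a}$ is the sphere center, $R$ its radius, $\xi_i$ the slack for training point $\mathbf{x}_i$, $C$ the trade-off parameter, and $\phi$ an optional kernel feature map.

```latex
\begin{aligned}
\min_{R,\,\mathbf{a},\,\boldsymbol{\xi}} \quad & R^{2} + C \sum_{i=1}^{n} \xi_i \\
\text{s.t.} \quad & \lVert \phi(\mathbf{x}_i) - \mathbf{a} \rVert^{2} \le R^{2} + \xi_i,
\qquad \xi_i \ge 0, \qquad i = 1, \dots, n
\end{aligned}
```

    A new summary with feature vector $\mathbf{z}$ is then assigned to the target class when $\lVert \phi(\mathbf{z}) - \mathbf{a} \rVert^{2} \le R^{2}$, and treated as an outlier otherwise.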

    A drug side effect is an unexpected reaction that occurs during medical treatment. Negative side effects, also called adverse drug reactions (ADRs), matter more than positive ones because they may cause serious injury or even death while a patient is on medication. However, many ADRs cannot be identified before a drug becomes available on the market. The objective of this research is therefore to build an ADR prediction system from clinical data.
    In this research, we use discharge summaries as the corpus and construct the ADR prediction model with a text mining approach and a one-class classification method. First, we preprocess the discharge summaries, which contain the patient's diagnosis, present illness, past history, and therapeutic process, to retrieve the information we need. To reduce the dimensionality of the feature space, we employ two feature extraction methods: Latent Dirichlet Allocation (LDA) and Term Frequency-Inverse Document Frequency (TF-IDF). LDA automatically extracts the topic distribution of each summary and the word composition of each topic. TF-IDF extracts the keywords of the discharge summary corpus. In clinical data, positive and negative examples are distributed unevenly. We therefore use Support Vector Domain Description (SVDD), whose basic idea is to train the smallest hypersphere that encloses the training data, to construct the one-class prediction model. In the future, medical experts can verify the results generated by our model to support clinical ADR research.
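    The feature-extraction and one-class steps described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the thesis's implementation: the toy tokenized summaries are invented, all function names are hypothetical, the LDA step is omitted, and SVDD is simplified to its hard-margin, linear-kernel case (the minimum enclosing ball), approximated with the Bădoiu-Clarkson iteration.

```python
import math
from collections import Counter


def fit_idf(docs):
    """Learn a vocabulary and smoothed IDF weights from tokenized documents."""
    vocab = sorted({tok for doc in docs for tok in doc})
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(tok for doc in docs for tok in set(doc))
    idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1.0 for t in vocab}
    return vocab, idf


def tfidf_vector(doc, vocab, idf):
    """Map one tokenized document to a TF-IDF vector over a fixed vocabulary.

    Tokens outside the vocabulary are ignored.
    """
    counts = Counter(doc)
    length = max(len(doc), 1)  # guard against empty documents
    return [counts[t] / length * idf[t] for t in vocab]


def dist(center, point):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((c - p) ** 2 for c, p in zip(center, point)))


def fit_enclosing_ball(points, iterations=1000):
    """Approximate the smallest ball enclosing all points.

    Bădoiu-Clarkson iteration: repeatedly move the center a shrinking
    step toward the current farthest point. This is a hard-margin,
    linear-kernel simplification of SVDD (no slack, no kernel).
    """
    center = list(points[0])
    for k in range(1, iterations + 1):
        farthest = max(points, key=lambda p: dist(center, p))
        step = 1.0 / (k + 1)
        center = [c + step * (f - c) for c, f in zip(center, farthest)]
    # The radius is set so that every training point lies inside the ball.
    radius = max(dist(center, p) for p in points)
    return center, radius


def is_target(vec, center, radius):
    """Accept a vector as target class if it falls inside the ball."""
    return dist(center, vec) <= radius


# Invented toy "summaries" from a single target class (assumption).
train_docs = [
    ["rash", "itching", "after", "amoxicillin"],
    ["rash", "fever", "after", "amoxicillin"],
    ["itching", "swelling", "after", "penicillin"],
]

vocab, idf = fit_idf(train_docs)
train_vecs = [tfidf_vector(d, vocab, idf) for d in train_docs]
center, radius = fit_enclosing_ball(train_vecs)

new_doc = ["rash", "swelling", "after", "penicillin"]
print(is_target(tfidf_vector(new_doc, vocab, idf), center, radius))
```

    Training on one class only reflects the imbalance noted above: the model needs examples of a single class, and anything outside the learned sphere is flagged for expert review. A soft-margin, kernelized SVDD would additionally tolerate a few training outliers.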

    Abstract (Chinese)
    Abstract
    Acknowledgements
    Table of Contents
    List of Tables
    List of Figures
    Chapter 1 Introduction
      1.1 Background and Research Motivation
      1.2 Research Objective
      1.3 Process of the Research
    Chapter 2 Literature Review
      2.1 Adverse Drug Reactions (ADRs)
      2.2 Topic Model
        2.2.1 Probabilistic Latent Semantic Analysis
        2.2.2 Latent Dirichlet Allocation
      2.3 One Class Support Vector Machine
        2.3.1 Support Vector Machine
        2.3.2 Support Vector Domain Description
    Chapter 3 Research Method
      3.1 Data Preprocessing
        3.1.1 Stop-words list
        3.1.2 Segmentation and P-O-S tagging
        3.1.3 Lemmatization
        3.1.4 Lexicon Building
      3.2 Feature Extraction
        3.2.1 Topic Extraction
        3.2.2 Keyword Extraction
      3.3 Model Constructing
    Chapter 4 Experiment and Analysis
      4.1 Experiment Implement
        4.1.1 Data Set Description
        4.1.2 Topics Generating
        4.1.3 Keywords Generating
        4.1.4 Prediction Model Constructing
      4.2 Experimental Results and Analysis
    Chapter 5 Conclusion and Future work
      5.1 Conclusion
      5.2 Future work
    References


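    On campus: available from 2022-12-31
    Off campus: not available
    The electronic thesis has not been authorized for public release; for the print copy, please consult the library catalog.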