Graduate student: Cheng, Yi-Ting (鄭憶婷)
Thesis title: Early Assessment of Chronic Diseases by Mining Disease Sequential Risk Patterns (運用時序性疾病風險樣式探勘之慢性疾病早期評估)
Advisor: Hsieh, Sun-Yuan (謝孫源)
Degree: Master
Department: College of Electrical Engineering and Computer Science - Institute of Medical Informatics
Year of publication: 2016
Academic year of graduation: 104 (ROC calendar)
Language: English
Number of pages: 59
Keywords (Chinese): 資料探勘, 電子病歷, 疾病風險評估, 疾病早期偵測, 時序性探勘
Keywords (English): Data mining, Electronic medical records, Disease risk assessment, Disease early detection, Sequential pattern analysis
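  • Today, as the population ages, chronic diseases impose a series of burdens and affect patients' quality of life. In medicine, many chronic diseases show no distinctive features or symptoms in their early stages, and their causes remain unclear. Because predicting or preventing chronic diseases early is difficult, effective early prevention and treatment require a thorough understanding and analysis of disease risk factors. In this study, we propose a novel analytical framework for chronic diseases. It uses a large volume of diagnostic data recorded when people in Taiwan visited medical institutions at different points in time, and applies data mining techniques to analyze the sequential risk patterns in depth and build an effective prediction model. In addition, we apply the proposed framework to Chronic Obstructive Pulmonary Disease, and the experimental validation confirms that the framework can effectively support early assessment and prevention of chronic diseases.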

    Chronic diseases have become a major concern in medicine because they place a heavy burden on health care resources and reduce quality of life. In this thesis, we propose a novel approach for early assessment of a target chronic disease. The approach mines related sequential risk patterns and analyzes the time information in diagnostic clinical records using sequential rule mining and classification techniques. In an experimental evaluation on a large-scale nationwide clinical database in Taiwan, the approach derives many sequential risk patterns and yields reliable predictions. The method also takes into account the time gap between events in a sequence. We use this information to perform temporal analysis, so that the discovered sequential risk patterns may give physicians clues for deriving novel markers for early detection of the target disease. For empirical evaluation, we demonstrate the effectiveness of the proposed framework by applying it to Chronic Obstructive Pulmonary Disease. By building the proposed framework on large-scale clinical databases, this study addresses the important issue of early assessment of the target disease.
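    To make the idea of a sequential risk pattern with a time gap concrete, the sketch below scores one rule of the form "antecedent diagnoses, in order, followed later by the target disease" on toy data. It is only an illustration under stated assumptions and not the thesis's actual pipeline: the patient records, the codes "A" and "B", and the helper names `first_match` and `rule_stats` are all hypothetical, and plain support and confidence stand in for whatever measures the thesis uses.

```python
from statistics import median

# Hypothetical toy records: patient id -> list of (day, diagnosis code).
records = {
    "p1": [(0, "A"), (30, "B"), (200, "COPD")],
    "p2": [(0, "A"), (45, "B")],
    "p3": [(10, "B"), (20, "A"), (100, "COPD")],
    "p4": [(0, "A"), (60, "B"), (400, "COPD")],
}


def first_match(events, pattern):
    """Return the days of the earliest ordered occurrence of `pattern`
    in `events` (sorted by day), or None if the pattern does not occur."""
    days, start = [], 0
    for code in pattern:
        for i in range(start, len(events)):
            if events[i][1] == code:
                days.append(events[i][0])
                start = i + 1
                break
        else:
            return None  # this code was not found after the previous match
    return days


def rule_stats(records, antecedent, target):
    """Support, confidence, and median time gap (in days) for the
    sequential rule antecedent -> target over all patients."""
    antecedent_count, gaps = 0, []
    for events in records.values():
        events = sorted(events)
        if first_match(events, antecedent) is None:
            continue
        antecedent_count += 1
        days = first_match(events, antecedent + [target])
        if days is not None:
            # Gap between the last antecedent event and the target event.
            gaps.append(days[-1] - days[-2])
    return {
        "support": len(gaps) / len(records) if records else 0.0,
        "confidence": len(gaps) / antecedent_count if antecedent_count else 0.0,
        "median_gap_days": median(gaps) if gaps else None,
    }


stats = rule_stats(records, ["A", "B"], "COPD")
print(stats)
```

    In this toy data the rule holds for two of the four patients, and the median gap summarizes how long after the last antecedent diagnosis the target diagnosis appears. A full implementation would enumerate candidate patterns with a sequential rule mining algorithm instead of scoring a single hand-picked rule.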

    Abstract (Chinese)
    Abstract
    Acknowledgements
    1. Introduction
      1.1. Background
      1.2. Motivation
      1.3. Research Aims and Challenges
      1.4. Thesis Organization
    2. Related Works
      2.1. National Health Insurance Research Database (NHIRD)
      2.2. Chronic Obstructive Pulmonary Disease
      2.3. Data Mining in Disease Analysis
      2.4. Sequential-based Methods on Diseases Prediction
    3. Proposed Method
      3.1. Proposed Framework
      3.2. Data preprocessing
        3.2.1. Data collection
        3.2.2. Data cleaning
        3.2.3. Resampling
      3.3. Sequential Risk Pattern Mining
      3.4. Classification Modeling
      3.5. Post Analysis
        3.5.1. Time Gap Extracting
        3.5.2. Novelty Measuring
    4. Experimental Evaluation and System Implementation
      4.1. Dataset Description
      4.2. Experimental Flow
      4.3. Experiment Environment Setting
      4.4. Experimental Results
        4.4.1. Sequential Rule-based Classification
        4.4.2. PubMed Validation
        4.4.3. Risk patterns with time intervals
      4.5. System Implementation
      4.6. Summary of the Experimental Results
    5. Discussions
    6. Conclusions and Future Works
      6.1. Conclusions
      6.2. Future Work
    References

    Full text available on campus from 2021-07-01 and off campus from 2021-07-01.