| Graduate student: | 鄭憶婷 Cheng, Yi-Ting |
|---|---|
| Thesis title: | 運用時序性疾病風險樣式探勘之慢性疾病早期評估 Early Assessment of Chronic Diseases by Mining Disease Sequential Risk Patterns |
| Advisor: | 謝孫源 Hsieh, Sun-Yuan |
| Degree: | Master |
| Department: | College of Electrical Engineering and Computer Science, Institute of Medical Informatics |
| Year of publication: | 2016 |
| Academic year of graduation: | 104 (ROC calendar) |
| Language: | English |
| Pages: | 59 |
| Keywords: | Data mining, Electronic medical records, Disease risk assessment, Disease early detection, Sequential pattern analysis |
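As the population ages, chronic diseases impose a growing series of burdens and affect patients' quality of life. Many chronic diseases show no distinctive signs or symptoms in their early stages, and their causes are still not clearly understood. Because predicting or preventing chronic diseases early is difficult, achieving early prevention and treatment requires an in-depth understanding and analysis of their risk factors. In this study, we propose a novel analytical framework for chronic diseases that applies data mining techniques to a large volume of diagnostic records from visits made by people in Taiwan to medical institutions at different points in time, analyzes the temporal risk patterns in these records, and builds an effective prediction model. We further apply the proposed framework to chronic obstructive pulmonary disease (COPD) and show experimentally that it can effectively support the early assessment and prevention of chronic diseases.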
Chronic diseases have become a major concern in medicine because they place a heavy burden on health care resources and reduce patients' quality of life. In this thesis, we propose a novel approach for the early assessment of a target chronic disease that mines related sequential risk patterns and analyzes the timing information in diagnostic clinical records using sequential rule mining and classification techniques. Experimental evaluation on a large-scale nationwide clinical database in Taiwan shows that our approach not only derives many sequential risk patterns but also produces reliable predictions. Moreover, the proposed method takes into account the time gaps between events in each sequence. We use this information to perform temporal analysis, so that the discovered sequential risk patterns may give physicians potential clues for deriving novel markers for the early detection of the target disease. For empirical evaluation, we demonstrate the effectiveness of the proposed framework by applying it to Chronic Obstructive Pulmonary Disease (COPD). By building this framework from large-scale clinical databases, the study addresses the important issue of early assessment of the target disease.
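The abstract describes the approach only at a high level, so the following is a minimal sketch of the general idea of sequential risk rules with time gaps, not the thesis's actual method. It assumes simplifications that do not come from the thesis: each rule has a single diagnosis as its antecedent and the target disease as its consequent; support and confidence follow the usual sequential-rule definitions; and the time gap is the mean number of days from a patient's first antecedent diagnosis to their first target diagnosis. The function names, diagnosis labels, thresholds, and the max-confidence `risk_score` (a simple stand-in for the classification step, not the thesis's classifier) are all hypothetical.

```python
from collections import defaultdict
from statistics import mean
from typing import NamedTuple

History = list[tuple[int, str]]  # (day offset, diagnosis label), one list per patient


class RiskRule(NamedTuple):
    antecedent: str       # diagnosis observed before the target disease
    support: float        # fraction of all patients in whom antecedent precedes target
    confidence: float     # share of patients with the antecedent who later get the target
    mean_gap_days: float  # average days from first antecedent to first target


def first_occurrence(history: History) -> dict[str, int]:
    """Map each diagnosis to the earliest day it appears in one patient's history."""
    first: dict[str, int] = {}
    for day, code in sorted(history):
        first.setdefault(code, day)
    return first


def mine_risk_rules(patients: list[History], target: str,
                    min_support: float = 0.1,
                    min_confidence: float = 0.5) -> list[RiskRule]:
    """Mine single-antecedent sequential rules 'X -> target' with their time gaps."""
    contains = defaultdict(int)  # X -> number of patients whose history contains X
    gaps = defaultdict(list)     # X -> day gaps for patients where X precedes target
    for history in patients:
        first = first_occurrence(history)
        target_day = first.get(target)
        for code, day in first.items():
            if code == target:
                continue
            contains[code] += 1
            if target_day is not None and day < target_day:
                gaps[code].append(target_day - day)

    rules = []
    for code, code_gaps in gaps.items():
        support = len(code_gaps) / len(patients)
        confidence = len(code_gaps) / contains[code]
        if support >= min_support and confidence >= min_confidence:
            rules.append(RiskRule(code, support, confidence, mean(code_gaps)))
    # Strongest (highest-confidence) rules first.
    return sorted(rules, key=lambda r: r.confidence, reverse=True)


def risk_score(history: History, rules: list[RiskRule]) -> float:
    """Score a patient by the highest confidence among rules whose antecedent they have."""
    seen = {code for _, code in history}
    return max((r.confidence for r in rules if r.antecedent in seen), default=0.0)


# Hypothetical toy cohort: labels and day offsets are invented for illustration.
patients = [
    [(0, "asthma"), (200, "COPD")],
    [(0, "asthma"), (50, "bronchitis"), (300, "COPD")],
    [(10, "bronchitis"), (100, "COPD")],
    [(0, "asthma")],
    [(0, "hypertension")],
]
rules = mine_risk_rules(patients, "COPD", min_support=0.2, min_confidence=0.5)
for rule in rules:
    print(rule)
print(risk_score([(0, "bronchitis")], rules))
```

On this toy cohort, "bronchitis -> COPD" (confidence 1.0, mean gap 170 days) ranks above "asthma -> COPD" (confidence 2/3, mean gap 250 days). Restricting antecedents to single diagnoses keeps the counting to one pass over the cohort; general sequential rule miners, such as those in the SPMF library, also handle multi-item antecedents and consequents.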