
Graduate student: 李語嫣
Lee, Yu-Yen
Thesis title: 運用資料探勘技術由健康檢查與生活習慣資料建立疾病預測模型-以糖尿病為例
Mining Health Examination and Personal Habits Data for Building Disease Prediction Models: A Case Study on Diabetes
Advisor: 曾新穆
Tseng, Vincent S.
Co-advisor: 吳晉祥
Wu, Jin-Shang
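Degree: Master's
Department: College of Electrical Engineering and Computer Science, Institute of Medical Informatics
Year of publication: 2010
Academic year of graduation: 98 (ROC calendar, i.e., 2009-2010)
Language: Chinese
Number of pages: 85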
Keywords (Chinese): 資料探勘, 健康檢查, 生活習慣資料, 健康風險樣式, 疾病分析, 預測模型, 糖尿病
Keywords (English): data mining, health examination, lifestyle, health risk pattern, prediction model, diabetes
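    In recent years, rising living standards have drawn more attention to self-directed health care. People undergo regular health examinations to understand their physiological health so that diseases can be detected and treated early. This makes health examinations increasingly important to public health. In current health examinations, however, examinees receive only the report for that particular examination. They get no assessment of future health risk and no advice on how to improve or adjust health-related behavior. This study therefore proposes a disease prediction model focused on diabetes. Data mining techniques are used to analyze the lifestyle data and health examination records from each examinee's past examinations, yielding health risk patterns for diabetes for each test item. Classification techniques are then applied to these patterns, which are useful for predicting disease risk, to build an effective disease prediction model. The health risk patterns can also be provided to medical staff as a reference for diagnosis. Two factors are considered in building the model, so that general clinics can use it widely and people become more willing to undergo health examinations: the examinee's age and the testing cost of the examination items the model requires. The goals are to make the health risk patterns closer to the examinee's own condition and to reduce the number of test items the model needs. Limiting the model to items that clinics are able to test lets those clinics use the disease prediction model directly. In the experiments, real lifestyle and health examination data were used to build and evaluate our disease prediction model. In the age-stratified experiments, the 51-64 age group achieved better evaluation results than the other age groups. We therefore suggest that examinees in this age group use the diabetes prediction model built from 51-64-year-olds, which improves prediction performance and better matches their own condition. In the experiments on reducing examination costs, we confirmed that some expensive test items can be removed without affecting the accuracy of diabetes risk prediction. These results demonstrate that the proposed method can build an effective disease prediction model from examinee data. It thereby helps address the shortcomings of current health examinations and provides more health care information.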
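The abstract describes mining health risk patterns from each examinee's series of past examinations. The outline lists Sequential Pattern Mining among the techniques reviewed. The core operation in that family is support counting: checking whether an ordered pattern of itemsets occurs, in order, across an examinee's visits. Below is a minimal sketch of that check, assuming each visit has already been discretized into an itemset. It is not the thesis's implementation. The item labels (e.g. `"FPG:high"`) and the toy data are hypothetical.

```python
from typing import FrozenSet, List, Sequence

# One examinee's history: one itemset per health examination, oldest first.
# Item labels such as "FPG:high" are hypothetical discretized results/habits.
Visit = FrozenSet[str]
History = Sequence[Visit]


def contains(history: History, pattern: Sequence[Visit]) -> bool:
    """True if each itemset of `pattern` is a subset of a distinct visit,
    with the matched visits in chronological order."""
    pos = 0
    for itemset in pattern:
        # Match each pattern element to the earliest remaining visit that covers it.
        while pos < len(history) and not itemset <= history[pos]:
            pos += 1
        if pos == len(history):
            return False
        pos += 1  # the next element must match a strictly later visit
    return True


def support(histories: List[History], pattern: Sequence[Visit]) -> float:
    """Fraction of examinees whose history contains `pattern`."""
    if not histories:
        return 0.0
    return sum(contains(h, pattern) for h in histories) / len(histories)


# Hypothetical toy data: compare a pattern's support in two outcome groups.
v = frozenset
diabetic = [
    [v({"BMI:high"}), v({"FPG:borderline", "exercise:none"}), v({"FPG:high"})],
    [v({"BMI:high", "FPG:borderline"}), v({"FPG:high"})],
]
non_diabetic = [
    [v({"BMI:normal"}), v({"FPG:normal"})],
    [v({"BMI:high"}), v({"FPG:normal"})],
]
pattern = [v({"BMI:high"}), v({"FPG:high"})]
print(support(diabetic, pattern), support(non_diabetic, pattern))  # 1.0 0.0
```

Consider a pattern that is frequent among examinees who later developed diabetes but rare among the others. That is the kind of candidate risk pattern that could then be passed to a classifier as a feature.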

    Recently, with economic development and rising national income, people have paid more attention to their own health by undergoing health examinations. A health examination not only helps people understand their own health conditions and avoid missing the best time for diagnosis and treatment, but also contributes to disease prevention. For these reasons, health examinations play an important role in public health. In a typical health examination, however, people learn only the results of that examination, and no assessment of their future health risk is provided. In this thesis, we propose a disease prediction model for diabetes that discovers health risk patterns from integrated historical lifestyle and health examination data. An effective disease prediction model is then built from these patterns. We also want the model to be accurate while relying on cheaper examination items. To that end, we consider two important factors in model building: the age of examinees and the price of health examination items. In the experiments, a real health examination dataset and a historical lifestyle dataset were used to evaluate the proposed disease prediction model. In the age experiments, the classifier for the "51-64" age group performed better than those for the other age groups. We therefore suggest that examinees in this age group use this classifier, so that their diabetes predictions are more accurate and more consistent with their own conditions. In the examination-price experiments, the results showed that some expensive items can be removed from a general health examination without substantially reducing diabetes prediction accuracy. In sum, these results show that our approach can build an effective diabetes prediction model from health-examination-related data.
The proposed model can supply the future risk assessment that current health examinations lack and help raise the level of health examination services.
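The abstract reports that some expensive items can be dropped without a major loss of accuracy, but it does not say how those items were chosen. One simple way to realize that idea is greedy backward elimination ordered by cost. This sketch is an assumption, not the thesis's procedure. Items are tried for removal from most to least expensive, and a removal is kept only if accuracy stays within a tolerance of the all-items baseline. The item names, relative costs, the 0.01 tolerance, and `toy_evaluate` are all hypothetical. In practice `evaluate` would be something like the cross-validated accuracy of a classifier (for example a J48 decision tree or CBA) trained on the given items.

```python
from typing import Callable, Dict, FrozenSet


def drop_expensive_items(
    costs: Dict[str, float],
    evaluate: Callable[[FrozenSet[str]], float],
    tolerance: float = 0.01,
) -> FrozenSet[str]:
    """Greedy backward elimination by cost: try removing the most expensive
    item first and keep the removal if accuracy stays within `tolerance`
    of the accuracy obtained with all items."""
    selected = frozenset(costs)
    baseline = evaluate(selected)
    # Visit candidate items from most to least expensive.
    for item in sorted(costs, key=costs.get, reverse=True):
        candidate = selected - {item}
        # Never remove the last remaining item.
        if candidate and evaluate(candidate) >= baseline - tolerance:
            selected = candidate
    return selected


# Hypothetical usage: relative costs and a toy evaluator standing in for
# the measured accuracy of a classifier trained on the given items.
costs = {"FPG": 1.0, "BMI": 0.5, "HbA1c": 5.0, "insulin": 8.0}


def toy_evaluate(items: FrozenSet[str]) -> float:
    acc = 0.60
    acc += 0.20 if "FPG" in items else 0.0
    acc += 0.05 if "BMI" in items else 0.0
    acc += 0.005 if "HbA1c" in items else 0.0
    return acc


print(sorted(drop_expensive_items(costs, toy_evaluate)))  # ['BMI', 'FPG']
```

Each candidate is compared against the fixed all-items baseline rather than the previous step. This keeps small accuracy losses from accumulating across successive removals.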

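    Abstract (Chinese); Abstract (English); Acknowledgements; Table of Contents; List of Tables; List of Figures
    Chapter 1 Introduction
      1.1 Research Background
      1.2 Research Motivation
      1.3 Problem Description
      1.4 Research Method
      1.5 Research Contributions
      1.6 Thesis Organization
    Chapter 2 Literature Review
      2.1 Diabetes
        2.1.1 What Is Diabetes
        2.1.2 Types of Diabetes
        2.1.3 Clinical Diagnosis of Diabetes
        2.1.4 High-Risk Groups for Diabetes
      2.2 Health Examination
        2.2.1 Definition of Health Examination
        2.2.2 Development of Health Examination
      2.3 Disease Prediction
      2.4 Introduction to Data Mining Techniques
        2.4.1 Sequential Pattern Mining
          2.4.1.1 Purpose of Sequential Pattern Mining
          2.4.1.2 Description of Sequential Pattern Mining
          2.4.1.3 Sequential Pattern Mining Example
        2.4.2 Decision Tree Mining
          2.4.2.1 Purpose of Decision Tree Mining
          2.4.2.2 Description of the Decision Tree Algorithm
          2.4.2.3 Description of the Weka J48 Algorithm
        2.4.3 Classification Based on Associations (CBA)
          2.4.3.1 Definition of CBA
          2.4.3.2 Purpose of CBA
          2.4.3.3 Description of CBA
    Chapter 3 Research Method
      3.1 Method Concept
        3.1.1 Problem Definition
        3.1.2 Method Framework
      3.2 Building the Disease Prediction Module with Data Mining
        3.2.1 Data Preprocessing
        3.2.2 Feature Extraction
        3.2.3 Building the Disease Prediction Module
          3.2.3.1 Decision Tree
          3.2.3.2 Classification Based on Associations (CBA)
      3.3 Disease Prediction
    Chapter 4 Experimental Analysis
      4.1 Experimental Data
        4.1.1 Data Description
        4.1.2 Data Preprocessing
        4.1.3 Composition of the Experimental Datasets
      4.2 Evaluation Method
      4.3 Experimental Design
      4.4 Experimental Analysis
        4.4.1 Comparison with Prediction Using Only the Current Examination Results
        4.4.2 Comparison of Imbalanced and Balanced Data
        4.4.3 Comparison of Classifiers
          4.4.3.1 Discussion of Evaluation Values
          4.4.3.2 Decision Logic of the Decision Tree
          4.4.3.3 CBA Association Rules
        4.4.4 Comparison of Different Experimental Data
          4.4.4.1 Lifestyle Items Only
          4.4.4.2 All Lifestyle Items plus Some Health Examination Items
          4.4.4.3 All Lifestyle Items plus All Health Examination Items
          4.4.4.4 All Health Examination Items
        4.4.5 Comparison of Age Groups
      4.5 Summary of Experiments
    Chapter 5 Conclusions and Future Research Directions
      5.1 Conclusions
        5.1.1 An Operating Basis Built on Medical Knowledge
        5.1.2 Mining Health Risk Patterns
        5.1.3 Analysis of Experimental Results
      5.2 Future Research Directions
    References
    Appendix 1 Health Examination Report
    Appendix 2 List of Health Examination Items and Reference Values
    Vita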


    Full text: publicly available on campus from 2012-08-27; off campus from 2012-08-27