| Graduate student: | 吳淑儀 Wu, Shu-I |
|---|---|
| Thesis title: | 應用資料探勘技術於多重死因資料之疾病關聯分析 Using Data Mining Techniques to Explore Diseases Associations in Multiple Causes of Death Dataset |
| Advisor: | 呂宗學 Lu, Tsung-hsueh |
| Degree: | Master |
| Department: | College of Medicine - Department of Public Health |
| Year of publication: | 2008 |
| Academic year of graduation: | 96 |
| Language: | Chinese |
| Number of pages: | 100 |
| Keywords (Chinese): | 多重死因, 資料探勘, 關聯法則, 群集分析, 癲癇, 跌倒 |
| Keywords (English): | multiple causes of death, data mining, association rules, clustering analysis, falls, epilepsy |
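Background: Multiple-cause-of-death statistics record every diagnosis listed on the death certificate, so they can serve as material for studying disease associations. However, because the number of possible disease combinations is enormous, most studies can only run statistical tests on a small number of candidate diseases specified in advance from existing medical knowledge.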
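Objective: Using two data mining methods, association rules and cluster analysis, and without specifying hypotheses in advance, this study explores which diseases and complications in multiple-cause-of-death data are highly associated with epilepsy or with fall-related injury, in the hope of finding disease associations not yet known to current medical knowledge.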
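Materials and methods: The data are the electronic death certificate records of Taiwan for 2001 through 2004 (529,450 cases) and of the United States for 2001 (2,419,960 cases). Disease association analysis was performed for two index conditions: epilepsy and falls. Potentially associated diseases, complications, and injuries were classified with the Clinical Classifications Software into 260 diagnostic groups. Association rule analysis mainly used the Apriori algorithm, and four indicators (support, confidence, lift, and odds ratio) were used to decide which rules were important. For the more important rules, the silhouette coefficient, an object-based index of clustering quality, was then used as the basis for choosing the clusters.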
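The four indicators named above can be illustrated with a minimal sketch. It assumes each death record is represented as a set of diagnosis labels; the function name `rule_metrics`, the labels, and the eight toy records are hypothetical and are not taken from the thesis data. The odds ratio is computed from the usual 2x2 table of antecedent against consequent.

```python
def rule_metrics(records, antecedent, consequent):
    """Return support, confidence, lift and odds ratio for antecedent -> consequent.

    records: list of sets, one set of diagnosis labels per death record.
    """
    n = len(records)
    # Cells of the 2x2 table: antecedent present/absent x consequent present/absent.
    a = sum(1 for r in records if antecedent in r and consequent in r)
    b = sum(1 for r in records if antecedent in r and consequent not in r)
    c = sum(1 for r in records if antecedent not in r and consequent in r)
    d = n - a - b - c
    if a + b == 0 or a + c == 0:
        raise ValueError("antecedent and consequent must each occur at least once")
    support = a / n                      # P(antecedent and consequent)
    confidence = a / (a + b)             # P(consequent | antecedent)
    lift = confidence / ((a + c) / n)    # confidence relative to P(consequent)
    odds_ratio = (a * d) / (b * c) if b and c else float("inf")
    return {"support": support, "confidence": confidence,
            "lift": lift, "odds_ratio": odds_ratio}


# Hypothetical toy records, for illustration only.
toy_records = [
    {"epilepsy", "acute_cvd"},
    {"epilepsy", "acute_cvd"},
    {"epilepsy", "hypertension"},
    {"acute_cvd", "hypertension"},
    {"hypertension"},
    {"falls", "hip_fracture"},
    {"falls"},
    {"acute_cvd"},
]

metrics = rule_metrics(toy_records, "epilepsy", "acute_cvd")
print(metrics)
```

A lift above 1 and an odds ratio above 1 both indicate that the two diagnoses appear together more often than independence would predict; which thresholds count as "strong" or "moderate" is a study-specific choice that this sketch does not attempt to reproduce.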
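Results: The ranking of disease association rules differs by indicator. Considering all four indicators together, the strongest association rule between epilepsy and other diseases or complications was epilepsy with acute cerebrovascular disease. At the moderate level of association, no rule was retained for Taiwan, whereas five rules were retained for the United States; most of them involved brain diseases, and only one involved hypertension. Further cluster analysis showed that sex and age were both important variables for clustering.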
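Considering all four indicators together, among the stronger associations between fall type and injury, five rules were retained for Taiwan and none for the United States. At the moderate level, four rules were retained for Taiwan and six for the United States. Among the retained rules, those for Taiwan all involved falls from one level to another and unspecified falls, whereas those for the United States involved same-level falls and unspecified falls. As for injuries, the Taiwan rules all involved fractures of the skull or facial bones, while the US rules involved fracture of the neck of the femur, mostly in elderly women.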
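Conclusion: Most of the important epilepsy association rules found in this study reflect existing medical knowledge, namely that epilepsy is a complication of acute cerebrovascular disease and of many brain diseases. One likely reason no novel associations were found is that physicians, when listing related conditions on the death certificate, are already guided by existing medical knowledge, which limits the kinds of diseases they record. The association rules and clusters linking fall type to injury differed considerably between Taiwan and the United States. Taiwan showed almost no meaningful rules for same-level falls, whereas the United States showed prominent rules linking same-level falls to femoral neck fracture (which is related to osteoporosis). Taiwan showed clear rules linking falls from one level to another with head injury, which provides important reference information for fall-related injury prevention in Taiwan.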
Background: Multiple causes of death (MCOD) datasets include all diagnoses reported on death certificates and can therefore be used to explore possible associations between diseases. However, because the number of possible disease combinations is very large, most studies have been confined to testing a priori hypotheses based on current medical knowledge.
Objective: This study used two data mining techniques, association rules and clustering analysis, to identify diseases or complications that were highly associated with epilepsy or injury from falls in the MCOD dataset.
Materials and methods: The MCOD datasets of Taiwan from 2001 through 2004 (n=529,450) and of the United States in 2001 (n=2,419,960) were obtained for this study. The index conditions were epilepsy and injury from falls. Candidate associated diseases and complications were classified into 260 groups according to the Clinical Classifications Software. The Apriori algorithm with four criteria (support, confidence, lift, and odds ratio) was used to identify significant association rules. For those significant association rules, we further used the silhouette coefficient to determine the clustering groups.
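As a rough illustration of the silhouette coefficient mentioned above, the following sketch computes it from scratch for numeric points with Euclidean distance. For each point, `a` is the mean distance to the other members of its own cluster and `b` is the lowest mean distance to any other cluster; the point's score is `(b - a) / max(a, b)`, and the coefficient is the average over all points. The function name and the four toy points are hypothetical; the thesis clustered death records, whose variables and distance measure are not specified here.

```python
from math import dist  # Euclidean distance, Python 3.8+


def silhouette_coefficient(points, labels):
    """Mean silhouette score over all points (Euclidean distance)."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        raise ValueError("the silhouette needs at least two clusters")

    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # convention: singleton clusters score 0
            continue
        # dist(p, p) is 0, so summing over the whole cluster and dividing
        # by (size - 1) gives the mean distance to the *other* members.
        a = sum(dist(p, q) for q in own) / (len(own) - 1)
        # Mean distance to the nearest neighbouring cluster.
        b = min(
            sum(dist(p, q) for q in other) / len(other)
            for other_lab, other in clusters.items()
            if other_lab != lab
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy data: two well-separated pairs of points.
toy_points = [(0, 0), (0, 1), (10, 0), (10, 1)]
good = silhouette_coefficient(toy_points, [0, 0, 1, 1])  # natural grouping
bad = silhouette_coefficient(toy_points, [0, 1, 0, 1])   # mixed grouping
print(good, bad)
```

Values close to 1 indicate compact, well-separated clusters, and negative values indicate points that sit closer to another cluster than to their own, so comparing the coefficient across candidate groupings is one way to pick among them.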
Results: Different criteria produced different rankings of the association rules for epilepsy. The combination of epilepsy and acute cerebrovascular disease ranked first when all criteria were considered. At the moderate level of association, no association rule was retained in Taiwan. In contrast, five association rules were retained in the US dataset; most involved brain diseases and one involved hypertension. Age and sex were two important variables according to the clustering analyses.
For falls, under the stricter criteria, five association rules were identified in Taiwan but none in the US dataset. In Taiwan, most of the association rules were related to falls from heights and unspecified falls. In the United States, most of the association rules were related to same-level falls. With regard to the types of injury associated with falls, head and face injuries predominated in Taiwan and hip fracture in the United States, especially among elderly females.
Conclusion: Most of the association rules we identified in this study were consistent with existing medical knowledge. One explanation is that certifying physicians report a selected set of diagnoses according to current medical knowledge. We found quite different patterns of association rules and clustering groups in falls-related deaths, which could provide important clues for further injury prevention studies.