Graduate student: Lu, Zi-Yi (呂姿儀)
Thesis title: Survival Factors Analysis of Out-of-Hospital Cardiac Arrest Patients via Effective Data Cleaning Techniques and Explainable Machine Learning
Advisor: Hsieh, Hsun-Ping (解巽評)
Degree: Master
Department: Department of Electrical Engineering, College of Electrical Engineering and Computer Science
Year of publication: 2023
Graduation academic year: 111 (ROC calendar, i.e. 2022/23)
Language: English
Number of pages: 36
Keywords: OHCA, Anomaly Detection, Root Cause Analysis, SHAP

    The purpose of this study is to identify the key survival factors in non-traumatic OHCA (out-of-hospital cardiac arrest) cases using data science methods and machine learning techniques. The findings are intended to inform improvements to emergency-care procedures and public-awareness policy for OHCA, and thereby to raise OHCA survival rates.
    The study analyzes the most recent records of non-traumatic OHCA cases in Tainan City, Taiwan. Because many records in the majority class are messy, the study proposes a data cleaning algorithm tailored to the characteristics of this dataset to keep the data correct and reliable. In addition, since the dataset is small and its class distribution is extremely imbalanced, oversampling is used to generate synthetic minority-class samples and balance the dataset. Machine learning models are then trained to predict whether a non-traumatic OHCA patient ultimately survives. Finally, SHAP (SHapley Additive exPlanations) is applied to analyze and interpret the model comprehensively, yielding new insights into the key survival factors.
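
    Two generic techniques named in the abstract, minority-class oversampling and SHAP-style attribution, can be illustrated with a small standard-library Python sketch. This is not the thesis's code. The abstract does not name the oversampling method, so `smote_like_oversample` is an assumed, simplified SMOTE-style interpolation; `exact_shapley` computes exact Shapley values against a single baseline point by enumerating feature coalitions, which is the game-theoretic quantity SHAP estimates efficiently for real models (usually averaging over a background dataset rather than one baseline). All function names, data, and the toy model are hypothetical.

```python
import itertools
import math
import random


def smote_like_oversample(minority, n_new, k=3, seed=0):
    """Create n_new synthetic minority samples, SMOTE-style (simplified sketch).

    Each synthetic point lies on the segment between a randomly chosen
    minority sample and one of its k nearest minority neighbours.
    Requires at least two minority samples.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        others = [p for j, p in enumerate(minority) if j != i]
        # k nearest neighbours of `base`, searched within the minority class only
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(tuple(b + gap * (q - b) for b, q in zip(base, nb)))
    return synthetic


def exact_shapley(model, x, baseline):
    """Exact Shapley value of each feature of x, relative to one baseline.

    A coalition S is evaluated by taking the features in S from x and all
    other features from the baseline. Cost grows exponentially with the
    number of features, so this only suits tiny toy models.
    """
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n!
            weight = (math.factorial(size) * math.factorial(n - size - 1)
                      / math.factorial(n))
            for subset in itertools.combinations(others, size):
                with_i = [x[j] if j in subset or j == i else baseline[j]
                          for j in range(n)]
                without_i = [x[j] if j in subset else baseline[j]
                             for j in range(n)]
                phi[i] += weight * (model(with_i) - model(without_i))
    return phi


# --- Usage with made-up data (not from the thesis) ---
minority = [(1.0, 2.0), (1.5, 1.8), (2.0, 2.2), (1.2, 2.5)]
majority_size = 20
new_points = smote_like_oversample(minority, majority_size - len(minority))
balanced_minority = minority + new_points  # now as large as the majority class


def toy_model(z):
    # Hypothetical score with made-up coefficients: two main effects
    # plus one interaction between features 0 and 2.
    return 0.5 * z[0] + 0.2 * z[1] + 0.3 * z[0] * z[2]


phi = exact_shapley(toy_model, x=[1.0, 1.0, 1.0], baseline=[0.0, 0.0, 0.0])
print([round(v, 2) for v in phi])  # [0.65, 0.2, 0.15]
print(round(sum(phi), 2))          # 1.0, i.e. toy_model(x) - toy_model(baseline)
```

    Because Shapley values satisfy the efficiency property, the attributions sum to `toy_model(x) - toy_model(baseline)`, and the interaction term's contribution (0.3) is split equally between features 0 and 2. For real workloads, imbalanced-learn's `SMOTE` class and the `shap` package's `TreeExplainer` provide full implementations instead of this brute-force enumeration.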

    Table of contents:
    Abstract (Chinese)
    Abstract
    Acknowledgment
    Contents
    List of Figures
    List of Tables
    Chapter 1 Introduction
      1.1 Background
      1.2 Motivation
      1.3 Related Research
      1.4 Challenge
        1.4.1 Extremely Imbalanced Data
        1.4.2 Data Mess in Majority Category
        1.4.3 Difficulty Evaluating Data Processing Methods
      1.5 Method
      1.6 Contribution
    Chapter 2 Preliminary
      2.1 Dataset
        2.1.1 OHCA Non-trauma Data
        2.1.2 Population Data
        2.1.3 Weather Data
      2.2 Exploratory Data Analysis
        2.2.1 Correlation of EMT Level with CPC Values
        2.2.2 Correlation of Lay Rescuer with CPC Values
    Chapter 3 Method
      3.1 Overall Workflow
      3.2 Anomaly Detection
        3.2.1 Outlier Detection Algorithm
        3.2.2 Sampler-based Algorithm
      3.3 Model Prediction
        3.3.1 Feature Selection
        3.3.2 Data Sampling
        3.3.3 Model Selection
        3.3.4 Evaluation Criteria
      3.4 Root Cause Analysis
    Chapter 4 Experiment
      4.1 Experimental Setup
        4.1.1 Dataset and Data Splitting
        4.1.2 Oversampling Method Selection
        4.1.3 Model Selection
      4.2 Experimental Result
        4.2.1 Unprocessed Data
        4.2.2 Processed Data by Outlier Detection Algorithm
        4.2.3 Processed Data by Sampler-based Algorithm
    Chapter 5 Discussion
      5.1 Summary Plot & Waterfall Plot
      5.2 Aggregated Force Plot
      5.3 Dependence Plot
    Chapter 6 Threat to Validity & Future Work
      6.1 Threat to Validity
        6.1.1 Insufficient Amount of OHCA Data
        6.1.2 The Emergency Rescue Data System is Still Being Improved
      6.2 Future Work
    Chapter 7 Conclusion
      7.1 Conclusion
    References

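    Full-text availability: on campus from 2028-08-19; off campus from 2028-08-19.
    The electronic thesis has not yet been authorized for public release; for the print copy, consult the library catalog.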