
Graduate student: Chou, Yu-Chieh (周郁婕)
Thesis title: Missing-aware Multivariate Time Series Forecasting CKD Patients' Future Disease Condition (基於缺失感知的多變量時間序列模型預測慢性腎衰竭病患之未來病況)
Advisor: Chiang, Jung-Hsien (蔣榮先)
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Computer Science and Information Engineering
Year of publication: 2023
Academic year of graduation: 111 (ROC calendar)
Language: English
Number of pages: 41
Keywords (Chinese): 自監督式學習、慢性腎臟病、缺失感知、多變量時間序列分類
Keywords (English): Self-Supervised Learning, Chronic Kidney Disease, Missing-Aware, Multivariate Time Series Classification
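  • Abstract (translated from the Chinese version): According to statistics from the National Health Insurance Administration, chronic kidney disease has ranked first in National Health Insurance expenditure for many consecutive years. In Taiwan, the prevalence of chronic kidney disease is as high as 11.9%; worldwide, nearly 850 million people have the disease. Once the disease reaches the end stage, patients can prolong life only through dialysis or kidney transplantation, which is a heavy burden on the patients themselves and on society as a whole.
    Predicting future kidney function in advance allows clinicians to intervene early with treatment or provide health education to slow the decline in kidney function. However, predicting future kidney function is not easy. Routine blood and urine test data can support a physician's judgment, but physicians struggle to process large volumes of numerical data, which often results in poor prediction accuracy. This study therefore uses deep learning to integrate and analyze multivariate time series data, and uses a self-supervised pretext task so that the model learns to cope with the high missing rate in electronic health records, extracting better temporal features and improving prediction accuracy on the main task.
    This study collected a dataset of nephrology outpatient visits from nearly 90,000 patients at National Cheng Kung University Hospital between 2010 and 2023, and evaluated our model on that dataset. On the binary classification task of determining whether kidney function will decline by more than 30% within one year, our model achieved an AUROC of 79.4%. We also invited clinicians to make predictions on the same task. The results show that our model improves accuracy by about 10% compared with resident physicians, and still by about 6% compared with attending nephrologists.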

    According to a report from Taiwan’s Ministry of Health and Welfare, chronic kidney disease (CKD) is the most expensive disease in the country, affecting 11.9% of the population. Worldwide, the disease is estimated to affect over 800 million people. Once it progresses to the end stage, patients can sustain life only through dialysis or kidney transplantation, which places a heavy burden on both patients and society.
    Predicting future kidney function can help clinicians provide timely treatment and targeted health education to slow the deterioration of renal function in patients with CKD. However, this is not an easy task for nephrologists, because kidney function is influenced by a variety of risk factors. Although trends in routine blood and urine tests can inform a doctor's judgment, the large volume of data is difficult to analyze, which can impede accurate prediction. This study therefore uses deep learning to integrate and analyze multivariate time series data, and extracts better time series feature representations through a self-supervised pretext task. This helps overcome the high missing rate in electronic health records (EHR) and thus improves performance on the main task.
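    To make the idea of a self-supervised pretext task on incomplete records concrete, here is a minimal sketch of masked-value prediction, assuming that only values actually recorded can be hidden and scored. The function names `make_masked_example` and `masked_mse`, the `None` encoding for missing values, and the default `mask_ratio` are illustrative assumptions, not the thesis's implementation.

```python
import random


def make_masked_example(values, mask_ratio=0.15, seed=None):
    """Hide a random subset of the recorded entries of one patient's series.

    values: list of time steps, each a list of floats, with None marking a
            lab value that was never recorded (assumed encoding).
    Returns the corrupted series and a list of (time, feature, true_value)
    targets for the entries that were hidden.
    """
    rng = random.Random(seed)
    corrupted, targets = [], []
    for t, row in enumerate(values):
        new_row = []
        for f, v in enumerate(row):
            # Genuinely missing entries have no ground truth, so they are
            # never selected as prediction targets.
            if v is not None and rng.random() < mask_ratio:
                targets.append((t, f, v))
                new_row.append(None)
            else:
                new_row.append(v)
        corrupted.append(new_row)
    return corrupted, targets


def masked_mse(predictions, targets):
    """Mean squared error computed only over the hidden, recorded entries."""
    if not targets:
        return 0.0
    return sum((predictions[t][f] - v) ** 2 for t, f, v in targets) / len(targets)
```

    A model trained this way would receive the corrupted series as input and be scored with `masked_mse` on its reconstruction, so entries that were never measured do not contribute to the loss.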
    We evaluate our model on a dataset provided by National Cheng Kung University Hospital, comprising the medical records of nearly 90,000 patients who received care at the Nephrology Department from 2010 to 2023. Our model achieved an AUROC of 79.4% on the binary classification task of predicting whether the estimated glomerular filtration rate (eGFR) will drop by more than 30% within one year. We also invited clinicians to make predictions on this task. The results show that our model outperforms resident physicians by 10% in accuracy and nephrology specialists by 6%.
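    The target task and its metric can be sketched as follows. This is a hedged illustration only: the abstract does not say how the baseline eGFR is chosen or how follow-up visits are aggregated, so `egfr_decline_label` assumes a single baseline value and flags a patient if any follow-up within the horizon falls more than 30% below it. Both function names are hypothetical.

```python
def egfr_decline_label(baseline_egfr, followups, horizon_days=365, drop_fraction=0.30):
    """Return 1 if any follow-up eGFR within the horizon is more than
    `drop_fraction` below the baseline, else 0.

    followups: iterable of (days_after_baseline, egfr) pairs.
    The "any follow-up" rule is an assumption made for illustration.
    """
    threshold = baseline_egfr * (1.0 - drop_fraction)
    for days, egfr in followups:
        if 0 < days <= horizon_days and egfr < threshold:
            return 1
    return 0


def auroc(labels, scores):
    """AUROC as the probability that a random positive is scored above a
    random negative, with ties counted as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

    For example, a patient with a baseline eGFR of 60 and a follow-up value of 40 at day 300 would be labeled 1, since 40 is below the 42 threshold. The pairwise AUROC above is quadratic in the number of patients and is meant for clarity, not for a cohort of this size.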

    Chinese Abstract
    Abstract
    Acknowledgements
    Contents
    List of Tables
    List of Figures
    Chapter 1 Introduction
        1.1 Background
        1.2 Motivation
        1.3 Research Objectives
        1.4 Thesis Organization
    Chapter 2 Literature Review
        2.1 Self-Supervised Representation Learning for Time Series
        2.2 Deep Learning Based Time Series Feature Extraction
        2.3 Applications of Artificial Intelligence in CKD
    Chapter 3 Data Preprocessing of Laboratory Data
        3.1 Format of Input Data
        3.2 Definition of Target Task
        3.3 Data Exclusion
    Chapter 4 Missing-aware Multivariate Time Series Classification Framework
        4.1 Model Framework
        4.2 Multivariate Time Series Feature Extractor
            4.2.1 Fusion Module
            4.2.2 Transformer Encoder
        4.3 Representation Learning via Masked Prediction Pretext Task
        4.4 Fine-tuning for Main Task
    Chapter 5 Experiments
        5.1 Experimental Design
        5.2 Dataset
        5.3 Experiment Setting
        5.4 Evaluation Metric
        5.5 Ablation Study
        5.6 Comparison with Machine Learning Based Models
        5.7 Comparison with Deep Learning Based Models
        5.8 Imputation Performance Comparison
        5.9 Clinical Validation and Case Study
    Chapter 6 Conclusion and Future Work
        6.1 Conclusion
        6.2 Future Work
    Reference


    Full-text release (on campus): 2025-07-31
    Full-text release (off campus): 2025-07-31