Author: Chang, Yu-Hsiang (張郁祥)
Thesis title: Generative Meta-Graph Discovery: Early Disease Prediction from Patient EHRs (生成元圖發掘:利用病患電子病歷進行早期疾病預測)
Advisor: Chuang, Kun-Ta (莊坤達)
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Computer Science and Information Engineering
Year of publication: 2025
Academic year of graduation: 113 (ROC calendar)
Language: English
Number of pages: 43
Keywords: Electronic health records (EHR), early disease prediction, graph generation, graph classification, cancer detection
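  • Electronic health records (EHRs) provide rich longitudinal patient data, but their sparse, irregular, and heterogeneous nature makes early disease prediction highly challenging. This thesis explores applying a generative approach to early EHR graph classification. We propose a new generative classifier framework, Gen-MGD, which first trains a separate graph generative model for each disease to produce a set of meta graphs capturing that disease's graph-structural characteristics. Each new patient's EHR graph is then compared against these meta graph sets, and the resulting similarity vector serves as the feature for downstream classification.
    We run experiments on the MIMIC-IV dataset with six classes (breast, lung, ovarian, colon, and prostate cancer, plus non-cancer). The results show that in early-stage settings, such as using only 30% of a patient's EHR data, Gen-MGD improves significantly over traditional subgraph embedding methods such as Sub2vec. It still falls slightly short, however, of TRANS, a state-of-the-art method whose temporal encoding mechanism maintains predictive performance when data are sparse.
    These results suggest that future work could add time-aware designs to the generative model to capture the temporal characteristics of EHR graphs more effectively. Overall, the generative perspective we propose lays a foundation for more robust early disease classification methods that combine the structural and temporal information in EHRs.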

    Electronic health records (EHR) provide rich longitudinal patient data, but pose significant challenges for early disease prediction due to their sparse, irregular, and heterogeneous nature. This paper investigates the application of a generative approach to tackle early EHR graph classification tasks. We propose a novel generative classifier framework, Gen-MGD, which first trains disease-specific graph generative models to create meta graph sets that capture key structural characteristics of different disease types. Each new patient EHR graph is then compared against these meta sets to compute a similarity-based feature vector, which is used for downstream disease classification.
    Our experiments on the MIMIC-IV dataset with six classes (five cancer types, 'Breast', 'Lung', 'Ovary', 'Colon', and 'Prostate', plus 'Non-cancer') demonstrate that Gen-MGD improves upon classical subgraph embedding approaches like Sub2vec in early-stage settings (e.g., using only 30% of a patient's EHR). However, Gen-MGD still underperforms state-of-the-art discriminative methods such as TRANS, which leverage temporal encoding to maintain predictive performance under sparse data conditions.
    This reveals an important future direction: enhancing generative models with time-aware mechanisms to better capture temporal patterns in EHR graphs. We believe our generative perspective lays the groundwork for developing more robust early disease classifiers that integrate structural and temporal EHR information.
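
The comparison step described in the abstract can be sketched in Python as follows. This is a minimal illustration, not the thesis's implementation: the edge-set graph representation, the Jaccard similarity, the mean aggregation, the helper names (`early_prefix`, `build_graph`, `jaccard`, `similarity_features`), and the toy event labels are all assumptions. In the thesis the meta graph sets come from trained disease-specific generative models, whereas here they are written by hand.

```python
from statistics import mean


def early_prefix(events, fraction=0.3):
    """Keep the earliest `fraction` of a time-ordered event list (at least one)."""
    keep = max(1, int(len(events) * fraction))
    return events[:keep]


def build_graph(events):
    """Link consecutive events into labelled edges (hypothetical construction)."""
    return frozenset(zip(events, events[1:]))


def jaccard(g1, g2):
    """Jaccard overlap of two edge sets; a stand-in for the similarity score."""
    union = g1 | g2
    return len(g1 & g2) / len(union) if union else 0.0


def similarity_features(patient_graph, meta_sets):
    """One feature per disease: mean similarity to that disease's meta graphs."""
    return [
        mean(jaccard(patient_graph, m) for m in meta_sets[disease])
        for disease in sorted(meta_sets)
    ]


# Hand-written toy meta graph sets (assumed; not generated by a model).
meta_sets = {
    "Lung": [
        frozenset({("cough", "xray"), ("xray", "ct")}),
        frozenset({("cough", "xray")}),
    ],
    "Non-cancer": [frozenset({("checkup", "bloodtest")})],
}

# Ten time-ordered toy events; only the first 30% are visible early on.
events = ["cough", "xray", "ct", "biopsy", "chemo",
          "followup", "scan", "lab", "visit", "discharge"]
patient_graph = build_graph(early_prefix(events, 0.3))

# Feature order follows sorted(meta_sets): ["Lung", "Non-cancer"].
features = similarity_features(patient_graph, meta_sets)
print(features)
```

The resulting vector has one entry per disease class and would then be passed to a standard downstream classifier.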

    Chinese Abstract
    Abstract
    Acknowledgment
    Contents
    List of Tables
    List of Figures
    1 Introduction
      1.1 Background
      1.2 Motivation
      1.3 Research Objective
      1.4 Problem Formulation
    2 Related Works
      2.1 EHR Classification
        2.1.1 Discriminative Approaches
        2.1.2 Limitations of Existing Works
        2.1.3 Toward Generative Approaches
    3 Preliminary
      3.1 Data Settings
      3.2 Graph Construction
      3.3 Laboratory Measurement Matrix
      3.4 Graph Feature Distribution Analysis
    4 Methodology
      4.1 Training and Testing Pipeline
      4.2 Graph Generative Model for EHR Graphs
        4.2.1 GraphGen: Graph Generation
        4.2.2 Disease-specific Meta Graph Sets
      4.3 Similarity Score Module
        4.3.1 Similarity Score Calculation
        4.3.2 Feature Vector Composition
      4.4 Classifier Training and Prediction
      4.5 Overall Procedure
    5 Experiments
    6 Conclusions and Future Work
    Bibliography

    [1] M. Gitlin, N. McGarvey, N. Shivaprakash, and Z. Cong, “Time duration and health care resource use during cancer diagnoses in the United States: A large claims database analysis,” Journal of Managed Care & Specialty Pharmacy, vol. 29, no. 6, pp. 659–670, 2023, PMID: 37276034.

    [2] N. Goyal, H. V. Jain, and S. Ranu, “GraphGen: A scalable approach to domain-agnostic labeled graph generation,” in Proceedings of The Web Conference 2020, ser. WWW ’20. New York, NY, USA: Association for Computing Machinery, 2020, pp. 1253–1263.

    [3] L. Zwaan and H. Singh, “The challenges in defining and measuring diagnostic error,” Diagnosis, vol. 2, no. 2, pp. 97–103, 2015.

    [4] A. D. Auerbach, T. M. Lee, C. C. Hubbard, S. R. Ranji, K. Raffel, G. Valdes, J. Boscardin, A. K. Dalal, A. Harris, E. Flynn, J. L. Schnipper, and U. R. Group, “Diagnostic errors in hospitalized adults who died or were transferred to intensive care,” JAMA Internal Medicine, vol. 184, no. 2, pp. 164–173, 02 2024.

    [5] L. A. Schols, M. E. Maranus, P. P. Rood, and L. Zwaan, “Diagnostic discrepancies in the emergency department: A retrospective study,” Journal of Patient Safety, pp. 10–1097, 2023.

    [6] V. Allgar and R. Neal, “Delays in the diagnosis of six cancers: analysis of data from the National Survey of NHS Patients: Cancer,” British Journal of Cancer, vol. 92, no. 11, pp. 1959–1970, 2005.

    [7] D. Crosby, S. Bhatia, K. M. Brindle, L. M. Coussens, C. Dive, M. Emberton, S. Esener, R. C. Fitzgerald, S. S. Gambhir, P. Kuhn, T. R. Rebbeck, and S. Balasubramanian, “Early detection of cancer,” Science, vol. 375, no. 6586, p. eaay9040, 2022.

    [8] J. Kim, A. Harper, V. McCormack, H. Sung, N. Houssami, E. Morgan, M. Mutebi, G. Garvey, I. Soerjomataram, and M. M. Fidler-Benaoudia, “Global patterns and trends in breast cancer incidence and mortality across 185 countries,” Nature Medicine, pp. 1–9, 2025.

    [9] E. Choi, M. T. Bahadori, J. A. Kulas, A. Schuetz, W. F. Stewart, and J. Sun, “RETAIN: an interpretable predictive model for healthcare using reverse time attention mechanism,” in Proceedings of the 30th International Conference on Neural Information Processing Systems, ser. NIPS’16. Red Hook, NY, USA: Curran Associates Inc., 2016, pp. 3512–3520.

    [10] Z. Che, S. Purushotham, K. Cho, D. A. Sontag, and Y. Liu, “Recurrent neural networks for multivariate time series with missing values,” Scientific Reports, vol. 8, 2016.

    [11] L. Rasmy, Y. Xiang, Z. Xie, C. Tao, and D. Zhi, “Med-BERT: pretrained contextualized embeddings on large-scale structured electronic health records for disease prediction,” NPJ Digital Medicine, vol. 4, 2020.

    [12] R. Poulain and R. Beheshti, “Graph transformers on EHRs: Better representation improves downstream performance,” in The Twelfth International Conference on Learning Representations, 2024.

    [13] R. Xu, M. K. Ali, J. C. Ho, and C. Yang, “Hypergraph transformers for EHR-based clinical predictions,” AMIA Summits on Translational Science Proceedings, vol. 2023, p. 582, 2023.

    [14] Q. Wen, Z. Ouyang, J. Zhang, Y. Qian, Y. Ye, and C. Zhang, “Disentangled dynamic heterogeneous graph learning for opioid overdose prediction,” in Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, ser. KDD ’22. New York, NY, USA: Association for Computing Machinery, 2022, pp. 2009–2019.

    [15] J. Chen, C. Yin, Y. Wang, and P. Zhang, “Predictive modeling with temporal graphical representation on electronic health records,” in Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence, ser. IJCAI ’24, 2024.

    [16] A. C. Li, A. Kumar, and D. Pathak, “Generative classifiers avoid shortcut solutions,” in The Thirteenth International Conference on Learning Representations, 2025.

    [17] J. J. C. Xian, S. Mahdavi, R. Liao, and O. Schulte, “From graph diffusion to graph classification,” in ICML 2024 Workshop on Structured Probabilistic Inference & Generative Modeling, 2024.

    [18] S. Loibl, F. André, T. Bachelot, C. Barrios, J. Bergh, H. Burstein, M. Cardoso, L. Carey, S. Dawood, L. Del Mastro et al., “Early breast cancer: ESMO clinical practice guideline for diagnosis, treatment and follow-up,” Annals of Oncology, vol. 35, no. 2, pp. 159–182, 2024.

    [19] A. Cervantes, R. Adam, S. Roselló, D. Arnold, N. Normanno, J. Taïeb, J. Seligmann, T. De Baere, P. Osterlund, T. Yoshino et al., “Metastatic colorectal cancer: ESMO clinical practice guideline for diagnosis, treatment and follow-up,” Annals of Oncology, vol. 34, no. 1, pp. 10–32, 2023.

    [20] A.-M. Dingemans, M. Früh, A. Ardizzoni, B. Besse, C. Faivre-Finn, L. Hendriks, S. Lantuejoul, S. Peters, N. Reguart, C. Rudin et al., “Small-cell lung cancer: ESMO clinical practice guidelines for diagnosis, treatment and follow-up,” Annals of Oncology, vol. 32, no. 7, pp. 839–853, 2021.

    [21] A. González-Martín, P. Harter, A. Leary, D. Lorusso, R. Miller, B. Pothuri, I. Ray-Coquard, D. Tan, E. Bellet, A. Oaknin et al., “Newly diagnosed and relapsed epithelial ovarian cancer: ESMO clinical practice guideline for diagnosis, treatment and follow-up,” Annals of Oncology, vol. 34, no. 10, pp. 833–848, 2023.

    [22] A. Horwich, C. Parker, T. De Reijke, V. Kataja, E. G. W. Group et al., “Prostate cancer: ESMO clinical practice guidelines for diagnosis, treatment and follow-up,” Annals of Oncology, vol. 24, pp. vi106–vi114, 2013.

    [23] N. De Cao and T. Kipf, “MolGAN: An implicit generative model for small molecular graphs,” arXiv preprint arXiv:1805.11973, 2018.

    [24] B. Samanta, A. De, G. Jana, V. Gómez, P. Chattaraj, N. Ganguly, and M. Gomez-Rodriguez, “NeVAE: A deep generative model for molecular graphs,” Journal of Machine Learning Research, vol. 21, no. 114, pp. 1–33, 2020.

    [25] J. You, B. Liu, Z. Ying, V. Pande, and J. Leskovec, “Graph convolutional policy network for goal-directed molecular graph generation,” Advances in Neural Information Processing Systems, vol. 31, 2018.

    [26] S. Fan and B. Huang, “Labeled graph generative adversarial networks,” arXiv preprint arXiv:1906.03220, 2019.

    [27] A. Grover, A. Zweig, and S. Ermon, “Graphite: Iterative generative modeling of graphs,” in International Conference on Machine Learning. PMLR, 2019, pp. 2434–2444.

    [28] Y. Li, O. Vinyals, C. Dyer, R. Pascanu, and P. Battaglia, “Learning deep generative models of graphs,” arXiv preprint arXiv:1803.03324, 2018.

    [29] J. You, R. Ying, X. Ren, W. Hamilton, and J. Leskovec, “GraphRNN: Generating realistic graphs with deep auto-regressive models,” in International Conference on Machine Learning. PMLR, 2018, pp. 5708–5717.

    [30] M. Simonovsky and N. Komodakis, “GraphVAE: Towards generation of small graphs using variational autoencoders,” in Artificial Neural Networks and Machine Learning–ICANN 2018: 27th International Conference on Artificial Neural Networks, Rhodes, Greece, October 4-7, 2018, Proceedings, Part I 27. Springer, 2018, pp. 412–422.

    [31] R. Liao, Y. Li, Y. Song, S. Wang, W. L. Hamilton, D. Duvenaud, R. Urtasun, and R. Zemel, Efficient graph generation with graph recurrent attention networks. Red Hook, NY, USA: Curran Associates Inc., 2019.

    [32] A. E. Johnson, L. Bulgarelli, L. Shen, A. Gayles, A. Shammout, S. Horng, T. J. Pollard, S. Hao, B. Moody, B. Gow et al., “MIMIC-IV, a freely accessible electronic health record dataset,” Scientific Data, vol. 10, no. 1, p. 1, 2023.

    [33] B. Adhikari, Y. Zhang, N. Ramakrishnan, and B. A. Prakash, “Sub2vec: Feature learning for subgraphs,” in Advances in Knowledge Discovery and Data Mining: 22nd Pacific-Asia Conference, PAKDD 2018, Melbourne, VIC, Australia, June 3-6, 2018, Proceedings, Part II. Berlin, Heidelberg: Springer-Verlag, 2018, pp. 170–182.

    Full-text availability: on campus, open immediately; off campus, open immediately