| Graduate student: | 廖涴婷 Liao, Wo-Ting |
|---|---|
| Thesis title: | 一種基於自動編碼器雙重篩選和身份驗證機制的安全聯邦蒸餾方法 A Secure Federated Distillation Method Based on Dual Filtering and Identity Authentication Mechanisms with Autoencoders |
| Advisor: | 許志仲 Hsu, Chih-Chung |
| Degree: | Master |
| Department: | College of Management - Institute of Data Science |
| Year of publication: | 2024 |
| Graduation academic year: | 113 |
| Language: | Chinese |
| Number of pages: | 72 |
| Chinese keywords: | 聯邦蒸餾, 非監督學習, 隱私保護, COVID-19 診斷 |
| English keywords: | Federated Distillation, Unsupervised Learning, Privacy Preservation, COVID-19 Diagnosis |
As the COVID-19 pandemic rapidly spread worldwide, the sharing and analysis of medical data became a critical issue in the healthcare field. However, traditional methods of data sharing often face the risk of privacy breaches. To address this challenge, this study proposes an Unsupervised Authentication Federated Distillation (UA-FD) model to solve the dual problems of electronic health record (EHR) data sharing and privacy protection. UA-FD integrates autoencoders, a dual filtering mechanism, and authentication techniques, and it is optimized during the pre-training phase. This approach not only enhances privacy protection and model robustness but also significantly reduces reliance on labeled data.
This study comprehensively evaluates UA-FD on three tasks: COVID-19 diagnosis, MNIST handwritten digit recognition, and Fashion-MNIST clothing classification. Experimental results show that UA-FD performs comparably to supervised learning methods and demonstrates exceptional stability when handling non-independent and identically distributed (non-IID) data. Notably, in the COVID-19 diagnosis task, UA-FD achieves recall and F1-score comparable to those of supervised learning methods.
Additionally, the security of UA-FD was thoroughly evaluated. The model exhibits strong defensive capabilities against poisoning attacks and data inversion attacks, outperforming the methods it was compared against. These findings underscore the superiority of UA-FD in both privacy protection and robustness.
UA-FD’s outstanding performance in terms of effectiveness, privacy protection, and attack defense makes it an ideal choice for applying federated distillation technology in highly privacy-sensitive fields. Future research will further explore ways to enhance the security of the pre-training phase and validate the effectiveness of UA-FD in more practical application scenarios.