| Graduate Student: | 劉彥甫 Liu, Yen-Fu |
|---|---|
| Thesis Title: | 基於知識蒸餾與微調技術實現高效率醫學影像分割基礎模型 (EMSAM: Efficient Medical Image Segmentation Foundation Model, with Knowledge Distillation and Fine-tuning) |
| Advisor: | 賴槿峰 Lai, Chin-Feng |
| Degree: | Master |
| Department: | Department of Engineering Science, College of Engineering |
| Year of Publication: | 2024 |
| Academic Year of Graduation (ROC): | 112 |
| Language: | Chinese |
| Number of Pages: | 108 |
| Keywords (Chinese): | 基礎模型, 知識蒸餾, 醫療影像分割 |
| Keywords (English): | foundation model, medical image segmentation, knowledge distillation |
With advances in hardware and software, foundation models that can be applied to a wide range of downstream tasks have achieved success in many fields. In image segmentation, SAM is the best-performing model and achieves results comparable to SOTA models on zero-shot tasks, demonstrating its generality and strong segmentation ability. However, SAM performs poorly in specialized domains and requires long computation times. This study proposes a pipeline that addresses both problems and trains the EMSAM model: the image encoder of SAM is replaced with that of EfficientViT-SAM; the feature-extraction capability of MedSAM, a foundation model for medical image segmentation, is transferred to the efficient model through knowledge distillation; the model is then fine-tuned, and its segmentation ability is further refined by selecting uncertain points.
This study collected 130,858 and 43,166 image-mask pairs for training and testing, respectively, covering seven types of medical images. The models were evaluated with three metrics (DSC, HD, and NSD), and the improvement contributed by each step was verified. EMSAM's inference time is less than 1/8 of MedSAM's, and its DSC on the seven test datasets differs from MedSAM's by less than 0.02, showing that the model improves runtime efficiency while maintaining high accuracy.
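As a rough illustration of the knowledge-distillation step summarized above, the PyTorch sketch below trains a student image encoder to reproduce the embeddings of a frozen MedSAM image encoder with a mean-squared-error loss. The function name, embedding shapes, loss choice, and optimizer settings are assumptions made for illustration only and are not taken from the thesis.

```python
# Minimal sketch (not the thesis implementation) of feature-level knowledge
# distillation for the image encoder. Assumptions: both encoders map an input
# image to an embedding tensor of the same shape, the data loader yields
# (image, mask) pairs, and a plain MSE loss is used.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader


def distill_one_epoch(teacher_encoder: nn.Module,
                      student_encoder: nn.Module,
                      loader: DataLoader,
                      optimizer: torch.optim.Optimizer,
                      device: str = "cuda") -> float:
    """Run one epoch of embedding distillation and return the mean loss."""
    teacher_encoder.eval()                     # MedSAM image encoder, frozen
    student_encoder.train()                    # efficient student encoder
    criterion = nn.MSELoss()

    total = 0.0
    for images, _ in loader:                   # masks are unused in this stage
        images = images.to(device)
        with torch.no_grad():
            target = teacher_encoder(images)   # teacher embedding
        pred = student_encoder(images)         # student embedding
        loss = criterion(pred, target)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        total += loss.item()
    return total / max(len(loader), 1)
```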
With the advancement of hardware and software technology, foundation models that can be applied to various downstream tasks have achieved success in multiple fields. SAM is the best-performing model in image segmentation, capable of achieving results comparable to SOTA models in zero-shot tasks. This demonstrates SAM’s versatility and strong segmentation capability. However, SAM performs poorly in specialized domains and has long computation times.
This study proposes a pipeline to address these two issues and trains the EMSAM model. The steps include replacing the image encoder of SAM with that of EfficientViT-SAM and transferring the feature-extraction capability of MedSAM, the foundation model for medical image segmentation, to the efficient model through knowledge distillation. Subsequently, fine-tuning is performed, and the model's segmentation capability is enhanced by training with uncertain-point sampling.
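The uncertain-point sampling mentioned above can be sketched as follows, under the assumption (not specified in the abstract) that uncertainty is measured by how close a mask logit lies to the decision boundary, in the spirit of PointRend-style point selection; the function names and the number of sampled points are illustrative.

```python
# Minimal sketch of uncertainty-based point selection for fine-tuning.
# Assumption: uncertainty is highest where the predicted logit is closest to 0,
# and the loss is computed only on the selected points.
import torch
import torch.nn.functional as F


def select_uncertain_points(mask_logits: torch.Tensor, num_points: int = 1024) -> torch.Tensor:
    """Return flat indices of the `num_points` most uncertain pixels per image.

    mask_logits: (B, 1, H, W) raw logits from the mask decoder.
    """
    flat = mask_logits.flatten(start_dim=1)        # (B, H*W)
    uncertainty = -flat.abs()                      # largest where logit is near 0
    _, idx = uncertainty.topk(num_points, dim=1)   # (B, num_points)
    return idx


def point_loss(mask_logits: torch.Tensor, gt_masks: torch.Tensor, num_points: int = 1024) -> torch.Tensor:
    """Binary cross-entropy evaluated only on the selected uncertain points."""
    idx = select_uncertain_points(mask_logits, num_points)
    logits = mask_logits.flatten(start_dim=1).gather(1, idx)
    targets = gt_masks.flatten(start_dim=1).gather(1, idx).float()
    return F.binary_cross_entropy_with_logits(logits, targets)
```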
This study collected 130,858 and 43,166 image-mask pairs for training and testing, respectively, including seven different types of medical images. The models were evaluated with three metrics (DSC, HD, and NSD), and the contribution of each step to the model was verified. EMSAM's inference time was less than 1/8 of MedSAM's, and its DSC on the seven test datasets differed from MedSAM's by less than 0.02. This indicates that the model maintains high accuracy while improving operational efficiency.
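For reference, the DSC used in the evaluation can be computed from binary masks as in the minimal sketch below; the smoothing term `eps` is an added assumption to avoid division by zero when both masks are empty.

```python
# Minimal sketch of the Dice similarity coefficient (DSC) for binary masks.
import numpy as np


def dice_coefficient(pred: np.ndarray, gt: np.ndarray, eps: float = 1e-8) -> float:
    """DSC = 2 * |P ∩ G| / (|P| + |G|)."""
    pred = pred.astype(bool)
    gt = gt.astype(bool)
    intersection = np.logical_and(pred, gt).sum()
    return float(2.0 * intersection / (pred.sum() + gt.sum() + eps))
```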
On-campus access: available from 2027-08-13.