Graduate student: Peng, Wei-Lun (彭偉倫)
Thesis title: Enhancing Corneal Endothelial Cell Image Segmentation via Generative Data Augmentation and Transfer Learning (基於生成式數據增強與遷移學習提升角膜內皮細胞影像分割性能之研究)
Advisor: Chen, Mu-Yen (陳牧言)
Degree: Master
Department: College of Engineering - Department of Engineering Science
Year of publication: 2025
Graduation academic year: 113
Language: Chinese
Number of pages: 68
Keywords: object detection, image generation, transfer learning, image segmentation, watershed algorithm
  • Corneal endothelial cell segmentation relies heavily on precise annotations, but manually delineating cell boundaries is time-consuming and labor-intensive. Moreover, specific pathological patterns such as guttata are rare and morphologically diverse, so real annotated samples are extremely scarce. To address this challenge, we propose a novel two-stage training pipeline. In the first stage, a text-to-image Stable Diffusion model generates three categories of synthetic images from text prompts such as "with or without guttata": endothelial images with guttata, endothelial images without guttata, and the corresponding cell-boundary annotation images. The diffusion model is fine-tuned with LoRA (Low-Rank Adaptation) and combined with a ControlNet architecture that takes real boundary annotations as structural control signals, producing paired synthetic datasets. In the second stage, Unet, TransUnet, and SwinUnet are pretrained on the large-scale synthetic dataset and then adapted by transfer learning on a limited set of real annotated images to correct the domain gap between synthetic and real images. In addition, the preprocessing phase uses ResNet50 combined with K-means clustering and the Silhouette Score to filter out anomalous images, and a YOLO-series detection model is integrated to automatically localize and classify guttata.
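The abstract does not say how the Silhouette Score is applied during filtering, so the following is only a minimal sketch under stated assumptions. It assumes each image has already been reduced to a feature vector (in the thesis these would come from ResNet50; here hypothetical 2-D toy vectors stand in), picks the number of clusters with the higher mean silhouette, and flags samples whose own silhouette value falls below a hypothetical threshold. The function names, the deterministic initialization, and the threshold of 0.25 are illustrative choices, not the author's settings.

```python
import math


def kmeans(points, k, iters=20):
    """Lloyd's k-means; returns one cluster label per point."""
    # Deterministic start (illustrative choice): k points spread across the input order.
    centers = [list(points[(i * len(points)) // k]) for i in range(k)]
    labels = []
    for _ in range(iters):
        # Assignment step: each point joins its nearest center.
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
        # Update step: each center moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def silhouette_values(points, labels):
    """Per-sample silhouette s(i) = (b - a) / max(a, b); needs at least two clusters."""
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    values = []
    for i, p in enumerate(points):
        own = clusters[labels[i]]
        if len(own) == 1:
            values.append(0.0)  # usual convention: a singleton cluster scores 0
            continue
        # a: mean distance to the other members of the sample's own cluster.
        a = sum(math.dist(p, points[j]) for j in own if j != i) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(math.dist(p, points[j]) for j in members) / len(members)
            for lab, members in clusters.items()
            if lab != labels[i]
        )
        denom = max(a, b)
        values.append((b - a) / denom if denom else 0.0)
    return values


def flag_low_silhouette(features, k_candidates=(2, 3), threshold=0.25):
    """Choose k by mean silhouette, then flag samples scoring below the threshold."""
    best_k, best_scores = None, None
    for k in k_candidates:
        scores = silhouette_values(features, kmeans(features, k))
        if best_scores is None or sum(scores) > sum(best_scores):
            best_k, best_scores = k, scores
    flagged = [i for i, s in enumerate(best_scores) if s < threshold]
    return best_k, flagged


# Toy stand-ins for image embeddings: two tight groups plus one in-between sample.
features = [
    (0, 0), (0, 1), (1, 0), (1, 1),
    (10, 10), (10, 11), (11, 10), (11, 11),
    (5.5, 5.6),
]
best_k, flagged = flag_low_silhouette(features)
print("chosen k:", best_k, "flagged indices:", flagged)
```

In practice the same idea is usually written with scikit-learn's clustering and silhouette utilities on high-dimensional embeddings; the pure-Python version here only keeps the sketch dependency-free.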
    Experimental results show that segmentation performance improves significantly as the amount of synthetic pretraining data increases, confirming that generative data strengthens model performance. When the amount of synthetic data is insufficient, however, the noise it introduces instead degrades performance. Among the Unet (pure convolutional), TransUnet (convolutional with hybrid self-attention), and SwinUnet (pure self-attention) architectures, Unet performs best, suggesting that local features matter more than global context for corneal endothelial images. Finally, although Watershed post-processing improves the continuity of cell boundaries, it lowers the overall Dice score by approximately 3%, indicating a trade-off between accuracy and boundary coherence. In summary, the proposed training framework effectively improves segmentation accuracy in data-scarce scenarios and provides empirical guidance for choosing model architectures and post-processing strategies.
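For readers unfamiliar with the metric reported above, the Dice score of a predicted mask P against a ground-truth mask T is 2|P ∩ T| / (|P| + |T|). The snippet below is a minimal sketch of that definition on binary masks stored as nested lists; the function name, the toy masks, and the convention of returning 1.0 when both masks are empty are assumptions for illustration and are not taken from the thesis.

```python
def dice_score(pred, truth):
    """Dice coefficient 2|P ∩ T| / (|P| + |T|) for two binary masks of equal shape."""
    p = [v for row in pred for v in row]
    t = [v for row in truth for v in row]
    if len(p) != len(t):
        raise ValueError("masks must have the same number of pixels")
    # Count pixels that are foreground in both masks.
    intersection = sum(1 for a, b in zip(p, t) if a and b)
    # Count foreground pixels in each mask separately.
    total = sum(1 for a in p if a) + sum(1 for b in t if b)
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2.0 * intersection / total


# Toy boundary masks: the prediction misses two of the six true boundary pixels.
truth = [
    [0, 1, 0, 0],
    [0, 1, 0, 0],
    [0, 1, 1, 1],
    [0, 1, 0, 0],
]
pred = [
    [0, 1, 0, 0],
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 0, 0],
]
score = dice_score(pred, truth)
print(f"Dice = {score:.3f}")
```

Here the prediction has 4 foreground pixels, all inside the 6 true ones, so the score is 2 × 4 / (4 + 6) = 0.8.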

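    Table of Contents
    Abstract (Chinese)
    Abstract
    Acknowledgments
    Table of Contents
    List of Tables
    List of Figures
    Chapter 1. Introduction
        1.1. Research Background and Motivation
        1.2. Research Objectives
        1.3. Chapter Overview
    Chapter 2. Literature Review
        2.1. Object Detection
            2.1.1 YOLOv8
            2.1.2 YOLOv11
            2.1.3 YOLOv12
            2.1.4 RT-DETR
        2.2. Image Generation
            2.2.1 Diffusion Models
            2.2.2 LoRA
            2.2.3 ControlNet
        2.3. Deep Learning Models
            2.3.1 ResNet50
            2.3.2 Unet
            2.3.3 TransUnet
            2.3.4 SwinUnet
        2.4. Watershed Algorithm
        2.5. K-means and Silhouette Score
        2.6. Transfer Learning
    Chapter 3. Methodology
        3.1. Research Framework
        3.2. Data Preprocessing and Data Generation
            3.2.1. Data Preprocessing
            3.2.2. Data Generation
        3.3. Image Segmentation Model Training
        3.4. Loss Function
        3.5. Evaluation Metrics
            3.5.1. Evaluation Metrics for Object Detection Models
            3.5.2. Evaluation Metrics for Image Generation Models
            3.5.3. Evaluation Metrics for Image Segmentation Models
    Chapter 4. Experimental Design and Results Analysis
        4.1. Experimental Environment
        4.2. Experimental Results of Object Detection Models
        4.3. Experimental Results of Image Generation Models
            4.3.1. Results of Generating Corneal Endothelial Cell Images
            4.3.2. Results of Generating Corneal Endothelial Cell Annotation Images
            4.3.3. Results of Generating Cell Images from Corneal Endothelial Cell Annotation Images
        4.4. Experimental Results of Image Segmentation Models
        4.5. Discussion
    Chapter 5. Conclusion and Future Work
        5.1. Conclusion
        5.2. Research Limitations
        5.3. Future Work
    References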

