| Graduate Student: | 劉承威 Liu, Cheng-Wei |
|---|---|
| Thesis Title: | 基於圖像區塊的骨閃爍攝影骨骼區域分割方法 (Patch-based Bone Segmentation for Bone Scan) |
| Advisor: | 藍崑展 Lan, Kun-Chan |
| Degree: | Master |
| Department: | College of Electrical Engineering and Computer Science - Institute of Medical Informatics |
| Year of Publication: | 2025 |
| Academic Year of Graduation: | 114 (ROC calendar) |
| Language: | English |
| Pages: | 103 |
| Keywords (Chinese): | 骨骼掃描 (bone scan), 關鍵點定位 (key point localization), 區塊切分 (patch division), 骨分割任務 (bone segmentation) |
| Keywords (English): | bone scan, key point registration, patching, bone segmentation |
Bone scan, also known as skeletal scintigraphy, is a nuclear medicine imaging examination. Compared with X-ray and CT, it visualizes potential lesions more sensitively and effectively; this high sensitivity helps physicians detect pathology early and gives the modality substantial clinical value. The examination is performed by intravenously injecting a tracer labelled with a radioactive substance (commonly 99mTc-MDP) into the patient. Because gamma-ray photon emission diverges non-directionally, bone scan images often exhibit blurry boundary contrast. Nevertheless, bone segmentation on bone scans remains clinically indispensable, since it is the basis for automated computation of the Bone Scan Index (BSI), which is crucial for medical applications such as surgical planning.

Existing research on bone scan segmentation still leaves room for improvement in accuracy. For instance, single-stage multi-class segmentation is a straightforward strategy, but one recent study using this approach reported a best F1-score of only 0.90; its main bottleneck is class confusion, where the model struggles to distinguish between different bone categories, limiting overall performance. Similarly, a study using a fixed-size patch division strategy reported a best Dice Coefficient of only 0.8920, chiefly because inaccurate localization degraded the model's predictions.

We propose a new strategy consisting of five stages: preprocessing, key point prediction, patch division, segmentation prediction, and reconstruction to the original dimensions. First, in the preprocessing stage, we enhance image quality with Contrast Limited Adaptive Histogram Equalization (CLAHE), improving the visibility of subtle features without over-amplifying noise. Next, a key point model identifies critical anatomical landmarks on the enhanced image, which serve as reference points for the subsequent steps. The whole-body image is then divided into specific regional patches according to these key points. In the segmentation prediction stage, a dedicated segmentation model processes each patch individually, leveraging the localized focus on bone regions to outline the bone structures within it. Finally, the reconstruction stage reassembles all segmented patches into a complete whole-body segmentation map. We evaluated the pipeline by computing the Mean Dice Coefficient against ground-truth labels across all bone regions, and achieved a best Mean Dice Coefficient of 0.9480 on bone scan data. This pipeline offers a practical strategy for planar bone scan images with low spatial resolution and limited contrast, and can potentially reduce physicians' workload. Illustrative sketches of the main stages follow.
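The preprocessing stage raises local contrast with CLAHE without over-amplifying noise in flat background regions. A minimal sketch of this step, assuming OpenCV is used (the thesis abstract does not name a library) and with illustrative default parameters:

```python
import cv2
import numpy as np

def preprocess_clahe(image: np.ndarray, clip_limit: float = 2.0,
                     tile_grid: tuple = (8, 8)) -> np.ndarray:
    """Apply CLAHE to a single-channel bone scan image."""
    # OpenCV's CLAHE expects an 8- or 16-bit single-channel image,
    # so rescale the raw detector counts to the 0-255 range first.
    norm = cv2.normalize(image, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)
    clahe = cv2.createCLAHE(clipLimit=clip_limit, tileGridSize=tile_grid)
    return clahe.apply(norm)
```

CLAHE equalizes the histogram within small tiles and clips each tile's histogram at the clip limit before redistribution, which is what keeps noisy, near-uniform background from being stretched as aggressively as global histogram equalization would.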
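Patch division uses the predicted key points to crop regional patches from the whole-body image. The abstract does not give the exact cropping rules, so the sketch below assumes fixed-size windows centered on each landmark; `patch_by_keypoints`, the patch size, and zero-padding at the borders are all illustrative assumptions:

```python
import numpy as np

def extract_patch(image: np.ndarray, center: tuple,
                  size: tuple) -> np.ndarray:
    """Crop a size-(h, w) patch centered on one key point,
    zero-padding wherever the window extends past the image border."""
    h, w = size
    cy, cx = center
    y0, x0 = cy - h // 2, cx - w // 2
    patch = np.zeros((h, w), dtype=image.dtype)
    # Intersection of the crop window with the image bounds.
    ys, xs = max(y0, 0), max(x0, 0)
    ye, xe = min(y0 + h, image.shape[0]), min(x0 + w, image.shape[1])
    patch[ys - y0:ye - y0, xs - x0:xe - x0] = image[ys:ye, xs:xe]
    return patch

def patch_by_keypoints(image: np.ndarray, keypoints: dict,
                       patch_size=(256, 256)) -> dict:
    """Divide the whole-body scan into one regional patch per landmark."""
    return {name: extract_patch(image, kp, patch_size)
            for name, kp in keypoints.items()}
```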
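Reconstruction maps each per-patch prediction back to its position in the original image. How the thesis resolves overlapping patches is not stated in the abstract; this sketch assumes a simple foreground-overwrite policy, and `origins` (the top-left corner recorded for each crop) is a hypothetical bookkeeping structure:

```python
import numpy as np

def reconstruct(shape: tuple, patches: dict, origins: dict) -> np.ndarray:
    """Reassemble per-patch label maps into a whole-body segmentation map."""
    full = np.zeros(shape, dtype=np.uint8)
    for name, mask in patches.items():
        y0, x0 = origins[name]  # top-left corner recorded at crop time
        h, w = mask.shape
        region = full[y0:y0 + h, x0:x0 + w]
        # Write only foreground pixels, so the background of a later,
        # overlapping patch does not erase an earlier patch's labels.
        fg = mask > 0
        region[fg] = mask[fg]
    return full
```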
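Evaluation uses the Mean Dice Coefficient against ground-truth labels. For one class with predicted mask P and ground truth G, Dice(P, G) = 2|P ∩ G| / (|P| + |G|). Whether the averaging is per class, per image, or both is not stated in the abstract, so this sketch averages per class with the background class excluded, as an assumption:

```python
import numpy as np

def dice(pred: np.ndarray, gt: np.ndarray, eps: float = 1e-8) -> float:
    """Dice coefficient for one binary mask pair: 2|P ∩ G| / (|P| + |G|)."""
    inter = np.logical_and(pred, gt).sum()
    return (2.0 * inter + eps) / (pred.sum() + gt.sum() + eps)

def mean_dice(pred_labels: np.ndarray, gt_labels: np.ndarray,
              n_classes: int) -> float:
    """Average the per-class Dice over all bone classes (class 0 = background)."""
    scores = [dice(pred_labels == c, gt_labels == c)
              for c in range(1, n_classes)]
    return float(np.mean(scores))
```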