
Author: 戴子期 (Tai, Tzu-Chi)
Title: 基於CNN混合模型的相似融合度背景圖像分割之方法 / A CNN Based Hybrid Model Method for Image Segmentation in Similar Fusion Background Image
Advisor: 賴槿峰 (Lai, Chin-Feng)
Degree: Master
Department: Department of Engineering Science, College of Engineering
Year of Publication: 2020
Graduation Academic Year: 108
Language: Chinese
Number of Pages: 55
Chinese Keywords: 相似融合度背景圖像、圖像分割、圖像強化
English Keywords: Similar Fusion Background Image, Image Segmentation, Image Enhancement
Access count: 146 views, 2 downloads
Abstract:
    This study proposes an image segmentation method for similar fusion background images; the main algorithm uses two convolutional neural network models for image processing. A similar fusion background image is one in which features of the objects, such as color and texture, resemble the background, causing errors in models that rely on convolutional layers for feature extraction.

    Instead of applying convolutional layers directly for feature extraction, as in conventional approaches, this study first uses the PyNET model to enhance the features of the objects and the background separately, and then overlays the enhanced image onto the original image with a fixed overlap weight. Because the overlaid image retains the features of both the original and the enhanced images, the convolutional layers can more easily extract distinct features from the objects and the background, so that objects are not blended into the background and overlooked. To obtain the best final segmentation, the image is segmented twice: the first pass extracts the rough shape of each object, and the second pass refines the object shape and assigns the correct class label.
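    The overlay step described above amounts to a weighted blend of the PyNET-enhanced image and the original image. The following is a minimal sketch of that blend, assuming both images are same-sized 8-bit RGB NumPy arrays; the weight alpha is illustrative and is not the overlap weight actually used in the thesis.

        import numpy as np

        def overlay(original: np.ndarray, enhanced: np.ndarray, alpha: float = 0.5) -> np.ndarray:
            # Blend the enhanced image into the original with weight alpha,
            # so the result keeps features of both images.
            blended = alpha * enhanced.astype(np.float32) + (1.0 - alpha) * original.astype(np.float32)
            return np.clip(blended, 0.0, 255.0).astype(np.uint8)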

    In the experiments, the IoU score is used as the metric to evaluate the image segmentation models U-Net, DeepLab, and FCN. In the first segmentation pass of Experiment 1, the U-Net model performed best, and in the second pass of Experiment 2 the resulting segmentation improved the segmentation of similar fusion background images. Looking at the overall scores, on the same images the U-Net model achieved an average IoU score about 20% higher than the DeepLab model.
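    For reference, the IoU score used above can be computed from binary masks; this is a minimal sketch, assuming the prediction and ground-truth masks are NumPy arrays of the same shape containing 0/1 values for a single class.

        import numpy as np

        def iou_score(pred: np.ndarray, target: np.ndarray) -> float:
            # Intersection over Union of two binary segmentation masks.
            pred, target = pred.astype(bool), target.astype(bool)
            union = np.logical_or(pred, target).sum()
            if union == 0:
                return 1.0  # both masks empty: treat as a perfect match
            return float(np.logical_and(pred, target).sum()) / float(union)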

Extended English Abstract:
    In this study, we propose a convolutional neural network (CNN) based hybrid model method to improve the segmentation of similar fusion background images, i.e., images in which features such as the color and texture of the objects are similar to the background. This similarity leads to errors when convolutional layers are used for feature extraction.

    To address this problem, we use the PyNET model to enhance the features of the objects and the background in the image, and overlay the enhanced image with the original image according to a certain overlap weight. Because the overlaid image retains the features of both the original and the enhanced images, the convolutional layers can more easily extract distinct features from the objects and the background, so that objects are not lost in the background.
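    End to end, the hybrid method therefore enhances the image with PyNET, blends it with the original, and then segments the blended image. A rough sketch of that flow follows; pynet_model and segmentation_model are hypothetical placeholders for the trained enhancement and segmentation networks (e.g., U-Net), not APIs defined in the thesis, and overlay() refers to the blending sketch given earlier.

        def segment_similar_fusion_image(image, pynet_model, segmentation_model, alpha=0.5):
            # 1) Enhance object/background features with the PyNET-style model.
            enhanced = pynet_model.predict(image)
            # 2) Blend the enhanced image with the original at weight alpha.
            blended = overlay(image, enhanced, alpha)
            # 3) Segment the blended image with the CNN segmentation model.
            return segmentation_model.predict(blended)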

    In the experiments, this study uses the Intersection over Union (IoU) score as the criterion to evaluate the performance of the image segmentation models. Our results show that the U-Net model achieved an average IoU score about 20% higher than DeepLab.

    Table of Contents:
    Abstract i
    Extended English Abstract ii
    Acknowledgements vi
    Table of Contents vii
    List of Tables ix
    List of Figures x
    List of Abbreviations xi
    List of Symbols xii
    Chapter 1. Introduction 1
    1.1. Motivation 1
    1.2. Research Objectives 1
    1.3. Chapter Overview 2
    Chapter 2. Background and Related Work 3
    2.1. Research Background 3
    2.1.1. Neural Network Architectures 3
    2.1.2. Convolutional Neural Networks 5
    2.1.3. Hybrid Models 8
    2.2. Image Segmentation Research 8
    2.2.1. Datasets and Evaluation Metrics 9
    2.2.2. Image Segmentation Research 10
    2.3. Similar Fusion Background Segmentation 13
    2.3.1. Causes and Problems of Similar Fusion Background Segmentation 14
    2.3.2. Recent Related Work 15
    Chapter 3. Methodology 16
    3.1. Image Enhancement 16
    3.1.1. PyNET 16
    3.1.2. Image Overlay 19
    3.2. Image Segmentation 20
    3.2.1. Fully Convolutional Networks 20
    3.2.2. U-Net 22
    3.2.3. DeepLab 24
    3.3. Test Dataset 27
    Chapter 4. Results and Discussion 29
    4.1. Experimental Design 29
    4.1.1. Experimental Environment 29
    4.1.2. Experimental Procedure 30
    4.2. Experimental Results 32
    4.2.1. Image Segmentation Comparison Experiment 32
    4.2.2. Segmentation Result Recognition Experiment 44
    4.3. Discussion of Results 47
    Chapter 5. Conclusion and Future Work 48
    5.1. Conclusions 48
    5.2. Future Work 49
    References 50


    Full text availability: on campus from 2025-07-01; off campus from 2025-07-01