| Graduate Student: | 張修齊 (Chang, Hsiu-Chi) |
|---|---|
| Thesis Title: | Multi-focus Image Fusion Using Encoder-Decoder Learning Network with Refinement (使用具備微調功能之編解碼器學習網路進行多對焦影像融合) |
| Advisor: | 連震杰 (Lien, Jenn-Jier) |
| Co-advisor: | 郭淑美 (Guo, Shu-Mei) |
| Degree: | Master |
| Department: | Department of Computer Science and Information Engineering, College of Electrical Engineering and Computer Science |
| Year of Publication: | 2020 |
| Academic Year: | 108 (2019–2020) |
| Language: | English |
| Pages: | 70 |
| Keywords: | Multi-focus Image Fusion, Deep Convolutional Neural Network, Encoder-Decoder Learning Network, Feature Pyramid Network, Refinement Network |
In the field of image processing, multi-focus image fusion is an interesting topic. Traditionally, many algorithms relied on elaborate hand-crafted mathematical models to estimate the sharpness or blurriness of the source images and used that estimate as the reference for fusion. In recent years, with the rise of deep learning, many deep-learning-based multi-focus image fusion algorithms have been developed. Among them, "Multi-focus Image Fusion with a Deep Convolutional Neural Network" by Yu Liu et al. [13] is the most representative. It consists of an encoder network trained on image patches, followed by a series of post-processing steps applied to the score map the encoder produces. The ultimate goal is to generate a decision map representing the sharpness and blurriness of the input images, which then drives the multi-focus fusion.
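Since the decision map is the sole product that drives fusion, the final compositing step reduces to pixel-wise selection or blending of the two registered sources. The minimal NumPy sketch below illustrates only that final step, assuming the decision map has already been estimated and post-processed; the function name and the convention that the map lies in [0, 1] with 1 selecting the first source are illustrative assumptions, not the exact interface of Liu et al. or of this thesis.

```python
import numpy as np

def fuse_with_decision_map(img_a: np.ndarray,
                           img_b: np.ndarray,
                           decision: np.ndarray) -> np.ndarray:
    """Pixel-wise fusion of two registered source images.

    `decision` is assumed to lie in [0, 1]: 1 selects img_a (in focus
    there), 0 selects img_b.  A binary map gives hard selection; a soft
    map blends the sources near focused/defocused boundaries.
    """
    d = decision.astype(np.float32)
    if d.ndim == 2 and img_a.ndim == 3:
        d = d[..., None]  # broadcast the map over the color channels
    fused = d * img_a.astype(np.float32) + (1.0 - d) * img_b.astype(np.float32)
    return np.clip(fused, 0, 255).astype(np.uint8)

# usage sketch with a precomputed (hypothetical) decision map:
# fused = fuse_with_decision_map(img_a, img_b, decision_map)
```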
This study proposes three improvements built on the algorithm of Yu Liu et al. [13], modifying the network architecture and the post-processing in the hope of producing better decision maps at higher speed. First, the encoder is redesigned. The encoder's main purpose is feature extraction; this study improves it based on the VGG16 architecture [16], increasing the depth of feature extraction in order to obtain more semantic information. Second, a decoder is added to the architecture; it learns to interpret and upsample the features extracted by the encoder to generate an initial decision map. Here the design also draws on the Feature Pyramid Network (FPN) [12], passing some feature maps from the encoder directly to the decoder to avoid information loss during the encoder's downsampling. Finally, although the encoder-decoder architecture interprets the input semantics well, the edge details of the initial decision map still leave room for improvement. I therefore adopt a refinement network, which takes the source images and the initial decision map as reference and refines the initial decision map into the final decision map.
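To make the three modifications concrete, the following PyTorch sketch wires them together: a VGG-style convolutional encoder, a top-down decoder with FPN-style lateral connections, and a small refinement head that consumes the source pair together with the initial decision map. All channel widths, stage depths, and the 2-channel stacked-grayscale input are illustrative assumptions; this is a sketch of the described design, not the thesis's exact configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def conv_block(c_in, c_out):
    # two 3x3 convolutions per stage, in the spirit of VGG16 [16]
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, 3, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(c_out, c_out, 3, padding=1), nn.ReLU(inplace=True))

class FusionNet(nn.Module):
    """Encoder-decoder with FPN-style skips plus a refinement head."""

    def __init__(self):
        super().__init__()
        # encoder: the two grayscale sources stacked as input channels
        # (an assumption for this sketch, not the thesis's exact input)
        self.enc1 = conv_block(2, 64)
        self.enc2 = conv_block(64, 128)
        self.enc3 = conv_block(128, 256)
        self.pool = nn.MaxPool2d(2)
        # 1x1 lateral projections, as in FPN [12]
        self.top = nn.Conv2d(256, 64, 1)
        self.lat2 = nn.Conv2d(128, 64, 1)
        self.lat1 = nn.Conv2d(64, 64, 1)
        self.dec = conv_block(64, 64)
        self.head = nn.Conv2d(64, 1, 1)  # produces the initial decision map
        # refinement: sources + initial map -> final decision map
        self.refine = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, 1, 3, padding=1))

    def forward(self, x):                  # x: (B, 2, H, W)
        f1 = self.enc1(x)                  # full resolution
        f2 = self.enc2(self.pool(f1))      # 1/2 resolution
        f3 = self.enc3(self.pool(f2))      # 1/4 resolution
        # top-down pathway: upsample, then add the lateral feature maps
        # so details lost during downsampling still reach the decoder
        p = self.top(f3)
        p = F.interpolate(p, size=f2.shape[-2:], mode='bilinear',
                          align_corners=False) + self.lat2(f2)
        p = F.interpolate(p, size=f1.shape[-2:], mode='bilinear',
                          align_corners=False) + self.lat1(f1)
        initial = torch.sigmoid(self.head(self.dec(p)))
        # refine edge details using both the sources and the initial map
        final = torch.sigmoid(self.refine(torch.cat([x, initial], dim=1)))
        return initial, final

# usage sketch: one stacked source pair of size 256x256
# net = FusionNet()
# initial_map, final_map = net(torch.rand(1, 2, 256, 256))
```

Returning both maps lets the initial and refined decision maps be supervised or inspected separately, which matches the two-stage (initial map, then refinement) description above.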
According to the experimental results, this study yields only limited improvement on the standard fusion-quality metrics, but it greatly improves running speed. In addition, this study presents the different encoder-decoder learning networks tried during the experiments, together with the results each produced after training on the same training dataset. I analyze these results to decide which network model to use in the end.
[1] L.-C. Chen, Y. Zhu, G. Papandreou, F. Schroff, and H. Adam, “Encoder-decoder with atrous separable convolution for semantic image segmentation,” European Conference on Computer Vision, pp. 833–851, 2018.
[2] M. Everingham, L. Van Gool, C.K.I. Williams, J. Winn, and A. Zisserman, “The Pascal Visual Object Classes (VOC) Challenge,” International Journal of Computer Vision, vol. 88, no. 2, pp. 303–338, Sep. 2009.
[3] K. He, J. Sun, and X. Tang, “Guided image filtering,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 35, no. 6, pp. 1397–1409, 2013.
[4] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
[5] Y. Horibe, “Entropy and correlation,” IEEE Transactions on Systems, Man, and Cybernetics, vol. SMC-15, pp. 641–642, 1985.
[6] M. Hossny, S. Nahavandi, and D. Creighton, “Comments on ‘Information measure for performance of image fusion,’” Electronics Letters, vol. 44, no. 18, p. 1066, 2008.
[7] W. Huang and Z. Jing, “Evaluation of focus measures in multi-focus image fusion,” Pattern Recognition Letters, vol. 28, no. 4, pp. 493–500, 2007.
[8] G. Huang, Z. Liu, L. van der Maaten, and K.Q. Weinberger, “Densely connected convolutional networks,” 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017.
[9] M.A. Islam, S. Naha, M. Rochan, N. Bruce, and Y. Wang, “Label refinement network for coarse-to-fine semantic segmentation,” arXiv:1703.00551 [cs.CV], 2017.
[10] T.O. Kvalseth, “Entropy and correlation: some comments,” IEEE Transactions on Systems, Man, and Cybernetics, vol. 17, no. 3, pp. 517–519, 1987.
[11] T.-Y. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollár, and C.L. Zitnick, “Microsoft COCO: common objects in context,” European Conference on Computer Vision, pp. 740–755, 2014.
[12] T.-Y. Lin, P. Dollár, R. Girshick, K. He, B. Hariharan, and S. Belongie, “Feature pyramid networks for object detection,” 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017.
[13] Y. Liu, X. Chen, H. Peng, and Z. Wang, “Multi-focus image fusion with a deep convolutional neural network,” Information Fusion, vol. 36, pp. 191–207, 2017.
[14] M. Nejati, S. Samavi, and S. Shirani, “Multi-focus image fusion using dictionary-based sparse representation,” Information Fusion, vol. 25, pp. 72–84, 2015.
[15] A. Odena, V. Dumoulin, and C. Olah, “Deconvolution and checkerboard artifacts,” Distill, 2016. http://doi.org/10.23915/distill.00003
[16] K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” International Conference on Learning Representations, 2015.
[17] H. Tang, B. Xiao, W. Li, and G. Wang, “Pixel convolutional neural network for multi-focus image fusion,” Information Sciences, vol. 433-434, pp. 125–141, 2018.
[18] D. Ulyanov, A. Vedaldi, and V. Lempitsky, “Instance normalization: the missing ingredient for fast stylization,” arXiv:1607.08022 [cs.CV], 2016.
[19] Z. Wang, A. Bovik, H. Sheikh, and E. Simoncelli, “Image quality assessment: from error visibility to structural similarity,” IEEE Transactions on Image Processing, vol. 13, no. 4, pp. 600–612, 2004.
[20] W. Wang and F. Chang, “A multi-focus image fusion method based on Laplacian pyramid,” Journal of Computers, vol. 6, no. 12, Jan. 2011.
[21] C.S. Xydeas and V. Petrović, “Objective image fusion performance measure,” Electronics Letters, vol. 36, no. 4, p. 308, 2000.
[22] N. Xu, B. Price, S. Cohen, and T. Huang, “Deep image matting,” 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017.
[23] Y. Yan, J. Du, Q. Li, M. Zuo, and J. Lee, “Multi-focus image fusion algorithm based on NSCT,” 2012 IEEE 2nd International Conference on Cloud Computing and Intelligence Systems, 2012.
[24] C. Yang, J.-Q. Zhang, X.-R. Wang, and X. Liu, “A novel similarity based quality metric for image fusion,” Information Fusion, vol. 9, no. 2, pp. 156–160, 2008.
[25] S. Zagoruyko and N. Komodakis, “Learning to compare image patches via convolutional neural networks,” 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015.
[26] J.-Y. Zhu, T. Park, P. Isola, and A.A. Efros, “Unpaired image-to-image translation using cycle-consistent adversarial networks,” 2017 IEEE International Conference on Computer Vision (ICCV), 2017.