
Author: Hong, Wei-Lun (洪偉倫)
Title: A Deep Learning Network for Stereoview to Multiview Generation by Using Deformable Convolution
Chinese Title: 基於深度學習與可變形卷積之雙視角圖生成多視角圖網路
Advisor: Yang, Jar-Ferr (楊家輝)
Degree: Master
Department: Institute of Computer & Communication Engineering, College of Electrical Engineering and Computer Science
Year of Publication: 2022
Graduation Academic Year: 110 (2021-2022)
Language: English
Pages: 53
Chinese Keywords: 多視角圖、可調整尺度的可變形卷積、深度學習、裸眼3D
English Keywords: multiview, adjustable scale deformable convolution, deep learning, autostereoscopy
    With the growing demand for richer visual experiences, the 3D field has developed rapidly. In glasses-free (naked-eye) 3D, a major problem is how to generate the multiview images required by autostereoscopic displays. Capturing multiview images directly at shooting time requires multiple cameras with fixed viewpoints, but differences in exposure and calibration errors between lenses degrade quality. Alternatively, content recorded as one color image plus one depth map can be warped by depth image-based rendering (DIBR) to synthesize multiview images; however, depth maps are difficult to obtain, and two-stage DIBR needs high-quality warping and hole-filling techniques to achieve good results. In this thesis, we therefore propose a stereoview-to-multiview generation system that uses deep learning to perform adjustable scale deformable convolution, achieving end-to-end pixel shifting. A shared-weight encoder extracts features from the input stereo pair; a multiscale feature fusion block mixes and filters features of different scales; a deformable convolution parameter network then estimates the offsets and masks needed by the deformable convolution; finally, the adjustable scale deformable convolution is executed and a path selection mechanism produces the multiview outputs. We also design a view factor input, with which the user can control the system to output the result for a specific viewing angle. Experimental data and result images show that this architecture produces high-quality multiview images that are comfortable to the human eye.
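The core operation described above, shifting pixels through a deformable convolution whose offsets and masks are predicted by a network, can be illustrated with a minimal PyTorch sketch. This is a toy under stated assumptions, not the thesis implementation: the single-layer estimator, kernel size, and tensor shapes are illustrative, and torchvision's deform_conv2d stands in for the adjustable scale variant proposed in the thesis.

    # Toy sketch (assumed PyTorch setting, not the thesis code) of
    # pixel shifting via deformable convolution: a small CNN predicts
    # per-pixel offsets and modulation masks, and torchvision's
    # deform_conv2d applies the shifted, masked sampling.
    import torch
    import torch.nn as nn
    from torchvision.ops import deform_conv2d

    class DeformableShift(nn.Module):
        def __init__(self, channels=3, k=3):
            super().__init__()
            # Hypothetical single-layer estimator: 2*k*k offset channels
            # (an x/y shift per kernel tap) plus k*k mask channels.
            self.estimator = nn.Conv2d(channels, 3 * k * k, 3, padding=1)
            # Learnable kernel weights used by the deformable convolution.
            self.weight = nn.Parameter(torch.randn(channels, channels, k, k) * 0.1)
            self.k = k

        def forward(self, x):
            params = self.estimator(x)
            n_off = 2 * self.k * self.k
            offset = params[:, :n_off]                    # (N, 2*k*k, H, W)
            mask = torch.sigmoid(params[:, n_off:])       # (N, k*k, H, W)
            return deform_conv2d(x, offset, self.weight,
                                 padding=self.k // 2, mask=mask)

    x = torch.randn(1, 3, 64, 64)        # one toy input view
    print(DeformableShift()(x).shape)    # torch.Size([1, 3, 64, 64])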

    3D imaging and video technologies have advanced with the growing demand for richer visual experiences. For autostereoscopic exhibition, generating multiview images for naked-eye 3D displays is a major challenge. When content is recorded as one color image plus one depth map, views can be warped by depth image-based rendering (DIBR) to create multiview images. However, depth maps are not easy to obtain, and two-stage DIBR requires high-quality warping and hole-filling techniques to achieve good results. Therefore, we propose a stereoview-to-multiview (S2M) generation system that uses deep learning to perform adjustable scale deformable convolution for end-to-end pixel shifting. The offsets and masks required by the deformable convolution are estimated by a deep convolutional neural network, and the multiview outputs are finally generated by performing adjustable scale deformable convolution with a path selection mechanism. We also design a view angle control parameter (view factor) as one of the system inputs, so that the user can request the output for a specific viewing angle. Experimental results show that the system produces high-quality multiview images that are visually comfortable.
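To make the view angle control parameter concrete, the sketch below shows one plausible way such a scalar could condition a generator: broadcast the view factor to a constant feature plane and concatenate it with the stereo pair, so a single network can be queried for any view. The network body, the name S2MNet, and the 0-to-1 angle convention are illustrative assumptions, not the thesis design.

    # Hedged sketch of view-factor conditioning: the scalar angle
    # parameter is broadcast to a constant plane and concatenated with
    # the stereo pair. S2MNet and the 0..1 convention are illustrative.
    import torch
    import torch.nn as nn

    class S2MNet(nn.Module):
        def __init__(self):
            super().__init__()
            # 3 + 3 RGB channels for the stereo pair + 1 view-factor plane.
            self.body = nn.Sequential(
                nn.Conv2d(7, 32, 3, padding=1), nn.ReLU(),
                nn.Conv2d(32, 3, 3, padding=1),
            )

        def forward(self, left, right, view_factor):
            n, _, h, w = left.shape
            # Assumed convention: 0.0 -> left view, 1.0 -> right view,
            # values in between interpolate, values outside extrapolate.
            plane = torch.full((n, 1, h, w), float(view_factor),
                               device=left.device)
            return self.body(torch.cat([left, right, plane], dim=1))

    net = S2MNet()
    left, right = torch.randn(1, 3, 64, 64), torch.randn(1, 3, 64, 64)
    views = [net(left, right, v) for v in (0.0, 0.25, 0.5, 0.75, 1.0)]
    print(len(views), views[0].shape)    # 5 torch.Size([1, 3, 64, 64])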

    Abstract (in Chinese)
    Abstract
    Acknowledgements
    Contents
    List of Tables
    List of Figures
    Chapter 1 Introduction
        1.1 Research Background
        1.2 Motivations
        1.3 Thesis Organization
    Chapter 2 Related Work
        2.1 Deformable Convolution
        2.2 Video Frame Interpolation (VFI)
            2.2.1 Kernel-based Interpolation
            2.2.2 Flow-based Interpolation
            2.2.3 Hybrid Methods
        2.3 Squeeze-and-Excitation Block (SE Block)
        2.4 Pyramid Pooling Module (PPM)
        2.5 Generative Adversarial Networks (GAN)
        2.6 Feature Loss
    Chapter 3 Stereoview to Multiview Generation System
        3.1 Overview of the Proposed S2M System
        3.2 Feature Extraction Network (FEN)
            3.2.1 Shared Encoder
            3.2.2 Multiscale Feature Fusion Block (MFFB)
        3.3 Deformable Convolution Parameter Network (DCPN)
            3.3.1 View Factor
            3.3.2 Offset Estimator and Mask Estimator
        3.4 View Generator
            3.4.1 Adjustable Scale Deformable Convolution
            3.4.2 Path Selection Mechanism
            3.4.3 Discriminator
        3.5 Network Training
            3.5.1 Interpolation
            3.5.2 Extrapolation
            3.5.3 Accelerated Inference Mechanism
        3.6 Loss Function
            3.6.1 Middle Focus Mechanism
            3.6.2 Loss Functions
    Chapter 4 Experimental Results
        4.1 Environmental Settings and Datasets
        4.2 Results of the Proposed S2M System
            4.2.1 Objective Performance Evaluations
            4.2.2 Presentation of Experimental Results
        4.3 Ablation Study
        4.4 Visualization of DConv Offsets
    Chapter 5 Conclusions
    Chapter 6 Future Work
    References


    Full text available on campus: 2024-08-01
    Full text available off campus: 2024-08-01