| Field | Value |
|---|---|
| Student | 陳俐均 Chen, Li-Jyun |
| Thesis Title | 基於時空轉換模組之時序協調缺洞填補神經網路 Temporal Consistent Hole Filling Neural Networks with Spatial-Temporal Transformers |
| Advisor | 楊家輝 Yang, Jar-Ferr |
| Degree | Master |
| Department | Institute of Computer & Communication Engineering, College of Electrical Engineering and Computer Science |
| Year of Publication | 2022 |
| Academic Year | 110 (2021-2022) |
| Language | English |
| Pages | 46 |
| Keywords (Chinese) | 基於深度影像生成渲染技術、缺洞填補、深度學習影片修補、限制方向可變形卷積、時空轉換 |
| Keywords (English) | DIBR, hole filling, learning-based video inpainting, restrictive deformable convolution, spatial-temporal transformer |

In recent years, changes in consumption habits have led people to place greater emphasis on entertainment and to pursue higher-quality and more exciting visual experiences. Many 3D imaging products have appeared in response: the booming 3D movie industry, the recent rise of AR and VR devices, and the metaverse concept all show 3D applications at work. However, 3D content is generally difficult to acquire and transmit, and its production is expensive, so 2D-to-3D conversion systems are commonly used to generate it. In depth-image-based rendering (DIBR), hole filling is the key step in synthesizing 3D virtual views, yet traditional hole-filling algorithms are time-consuming and their filled content is often inaccurate. This thesis therefore proposes learning the missing information with a neural network. By incorporating the notion of 3D video and referencing images at consecutive time instants, the network synthesizes stereoscopic video with higher accuracy and temporal consistency. Experiments show that the proposed direction-restricted deformable convolution block effectively injects 3D-image cues to strengthen the features, and that the spatial-temporal transformer module effectively searches temporally and spatially neighboring regions for similar patches to fill the holes. Compared with existing methods, the proposed network synthesizes the missing hole regions more accurately, so the inpainted stereoscopic video plays back more smoothly.
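The direction-restricted deformable convolution mentioned above can be pictured as an ordinary deformable convolution whose learned sampling offsets are constrained. The following is a minimal sketch, assuming the restriction keeps only horizontal offsets (a plausible choice for DIBR, where disocclusion holes arise from horizontal warping); the class name `RestrictedDeformConv2d` and all hyperparameters are illustrative and not taken from the thesis.

```python
import torch
import torch.nn as nn
from torchvision.ops import deform_conv2d


class RestrictedDeformConv2d(nn.Module):
    """3x3 deformable convolution whose sampling offsets are restricted
    (here: horizontal only) so the kernel reaches toward valid pixels."""

    def __init__(self, in_ch, out_ch, kernel_size=3, padding=1):
        super().__init__()
        self.padding = padding
        self.weight = nn.Parameter(
            torch.randn(out_ch, in_ch, kernel_size, kernel_size) * 0.02)
        self.bias = nn.Parameter(torch.zeros(out_ch))
        # Predicts a (dy, dx) offset for every kernel position and pixel.
        self.offset_conv = nn.Conv2d(
            in_ch, 2 * kernel_size * kernel_size, kernel_size, padding=padding)

    def forward(self, x):
        offset = self.offset_conv(x)               # (N, 2*K*K, H, W)
        n, _, h, w = offset.shape
        offset = offset.view(n, -1, 2, h, w)       # pairs of (dy, dx)
        # Assumed restriction: drop the vertical component so the kernel
        # only shifts horizontally toward non-hole content.
        dy = torch.zeros_like(offset[:, :, 0])
        dx = offset[:, :, 1]
        offset = torch.stack([dy, dx], dim=2).view(n, -1, h, w)
        return deform_conv2d(x, offset, self.weight, self.bias,
                             padding=self.padding)


# Illustrative usage on a warped-image feature map.
feat = torch.randn(1, 32, 64, 64)
out = RestrictedDeformConv2d(32, 32)(feat)         # (1, 32, 64, 64)
```

Zeroing the vertical offset component is just one way to realize a direction restriction; a learned mask or an angular penalty on the offsets would serve the same purpose.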
In recent years, due to changes in consumption habits, people have begun to pay more attention to entertainment and to pursue higher-quality and more exciting visual experiences. Applications of 3D visualization include 3D movies, AR and VR devices, and the metaverse. However, 3D content is difficult to capture and transmit, and its production is expensive. Depth-image-based rendering (DIBR) is a common view-synthesis technology for generating virtual views, and hole filling is the key process that determines the quality of the generated views. For DIBR, the problems of traditional hole-filling algorithms, such as long processing time and inaccurate, inconsistent synthesized content, must be solved. Therefore, we propose a hole-filling network that uses spatial-temporal transformer blocks to learn useful information by searching for the missing content in neighboring frames, together with a restrictive deformable convolution block (RDCB) that extracts more robust features from the characteristics of the warped image. Compared with existing methods, the proposed network generates more plausible and temporally consistent stereoscopic videos.
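As a rough picture of how spatial-temporal transformer blocks let a hole in one frame borrow content from other frames and other locations, the sketch below applies joint multi-head attention over patch tokens pooled from several consecutive frames. The module name, patch size, embedding width, and head count are illustrative assumptions, not the thesis's actual design.

```python
import torch
import torch.nn as nn


class SpatialTemporalAttention(nn.Module):
    """Attends jointly over patches from all input frames, so a hole in one
    frame can copy content that is visible in another frame or location."""

    def __init__(self, channels=16, patch=8, heads=4):
        super().__init__()
        self.patch = patch
        dim = channels * patch * patch
        self.norm = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, feats):
        # feats: (B, T, C, H, W) feature maps of T consecutive frames.
        b, t, c, h, w = feats.shape
        p = self.patch
        # Split every frame into non-overlapping p x p patch tokens.
        tokens = feats.view(b, t, c, h // p, p, w // p, p)
        tokens = tokens.permute(0, 1, 3, 5, 2, 4, 6)
        tokens = tokens.reshape(b, t * (h // p) * (w // p), c * p * p)
        tokens = self.norm(tokens)
        # Joint attention across time and space: every patch token can draw
        # on any patch of any frame.
        out, _ = self.attn(tokens, tokens, tokens, need_weights=False)
        # Fold the tokens back into per-frame feature maps.
        out = out.reshape(b, t, h // p, w // p, c, p, p)
        out = out.permute(0, 1, 4, 2, 5, 3, 6).reshape(b, t, c, h, w)
        return out


# Illustrative usage: five 16-channel feature maps of size 64x64.
x = torch.randn(2, 5, 16, 64, 64)
y = SpatialTemporalAttention()(x)   # same shape as x
```

A full transformer block would add a feed-forward sublayer and residual connections around this attention, and in practice the attention is restricted by the hole masks so that only valid (non-hole) patches serve as sources.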