| Field | Value |
|---|---|
| Student | 陳俐均 Chen, Li-Jyun |
| Thesis Title | 基於時空轉換模組之時序協調缺洞填補神經網路 Temporal Consistent Hole Filling Neural Networks with Spatial-Temporal Transformers |
| Advisor | 楊家輝 Yang, Jar-Ferr |
| Degree | Master |
| Department | Institute of Computer & Communication Engineering, College of Electrical Engineering and Computer Science |
| Year of Publication | 2022 |
| Academic Year | 110 (2021-2022) |
| Language | English |
| Pages | 46 |
| Keywords (Chinese) | 基於深度影像生成渲染技術、缺洞填補、深度學習影片修補、限制方向可變形卷積、時空轉換 |
| Keywords (English) | DIBR, hole filling, learning-based video inpainting, restrictive deformable convolution, spatial-temporal transformer |

In recent years, changes in consumption habits have led people to place greater emphasis on entertainment and to pursue higher-quality and more exciting visual experiences. Many 3D imaging products have appeared in response: the booming 3D movie industry, the recent rise of AR and VR devices, and the metaverse concept all show 3D applications at work. However, 3D content is generally difficult to acquire and transmit, and its production is expensive, so 2D-to-3D conversion systems are commonly used to generate it. In depth-image-based rendering (DIBR), hole filling is the key step in synthesizing 3D virtual views, yet traditional hole-filling algorithms are time-consuming and their filled content is often inaccurate. This thesis therefore proposes learning the missing information with a neural network. By incorporating the notion of 3D video and referencing images at consecutive time instants, the network synthesizes stereoscopic video with higher accuracy and temporal consistency. Experiments show that the proposed direction-restricted deformable convolution block effectively injects 3D-image cues to strengthen the features, and that the spatial-temporal transformer module effectively searches temporally and spatially neighboring regions for similar patches to fill the holes. Compared with existing methods, the proposed network synthesizes the missing hole regions more accurately, so the inpainted stereoscopic video plays back more smoothly.
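The direction-restricted deformable convolution mentioned above can be pictured as an ordinary deformable convolution whose learned sampling offsets are constrained. The following is a minimal sketch, assuming the restriction keeps only horizontal offsets (a plausible choice for DIBR, where disocclusion holes arise from horizontal warping); the class name `RestrictedDeformConv2d` and all hyperparameters are illustrative and not taken from the thesis.

```python
import torch
import torch.nn as nn
from torchvision.ops import deform_conv2d


class RestrictedDeformConv2d(nn.Module):
    """3x3 deformable convolution whose sampling offsets are restricted
    (here: horizontal only) so the kernel reaches toward valid pixels."""

    def __init__(self, in_ch, out_ch, kernel_size=3, padding=1):
        super().__init__()
        self.padding = padding
        self.weight = nn.Parameter(
            torch.randn(out_ch, in_ch, kernel_size, kernel_size) * 0.02)
        self.bias = nn.Parameter(torch.zeros(out_ch))
        # Predicts a (dy, dx) offset for every kernel position and pixel.
        self.offset_conv = nn.Conv2d(
            in_ch, 2 * kernel_size * kernel_size, kernel_size, padding=padding)

    def forward(self, x):
        offset = self.offset_conv(x)               # (N, 2*K*K, H, W)
        n, _, h, w = offset.shape
        offset = offset.view(n, -1, 2, h, w)       # pairs of (dy, dx)
        # Assumed restriction: drop the vertical component so the kernel
        # only shifts horizontally toward non-hole content.
        dy = torch.zeros_like(offset[:, :, 0])
        dx = offset[:, :, 1]
        offset = torch.stack([dy, dx], dim=2).view(n, -1, h, w)
        return deform_conv2d(x, offset, self.weight, self.bias,
                             padding=self.padding)


# Illustrative usage on a warped-image feature map.
feat = torch.randn(1, 32, 64, 64)
out = RestrictedDeformConv2d(32, 32)(feat)         # (1, 32, 64, 64)
```

Zeroing the vertical offset component is just one way to realize a direction restriction; a learned mask or an angular penalty on the offsets would serve the same purpose.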
In recent years, due to changes in consumption habits, people have begun to pay more attention to entertainment and to pursue higher-quality and more exciting visual experiences. Applications of 3D visualization include 3D movies, AR and VR devices, and the metaverse. However, 3D content is difficult to capture and transmit, and its production is expensive. Depth-image-based rendering (DIBR) is a common view-synthesis technology for generating virtual views, and hole filling is the key process that determines the quality of the generated views. For DIBR, the problems of traditional hole-filling algorithms, such as long processing time and inaccurate, inconsistent synthesized content, must be solved. Therefore, we propose a hole-filling network that uses spatial-temporal transformer blocks to learn useful information by searching for the missing content in neighboring frames, together with a restrictive deformable convolution block (RDCB) that extracts more robust features from the characteristics of the warped image. Compared with existing methods, the proposed network generates more plausible and temporally consistent stereoscopic videos.
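As a rough picture of how spatial-temporal transformer blocks let a hole in one frame borrow content from other frames and other locations, the sketch below applies joint multi-head attention over patch tokens pooled from several consecutive frames. The module name, patch size, embedding width, and head count are illustrative assumptions, not the thesis's actual design.

```python
import torch
import torch.nn as nn


class SpatialTemporalAttention(nn.Module):
    """Attends jointly over patches from all input frames, so a hole in one
    frame can copy content that is visible in another frame or location."""

    def __init__(self, channels=16, patch=8, heads=4):
        super().__init__()
        self.patch = patch
        dim = channels * patch * patch
        self.norm = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, feats):
        # feats: (B, T, C, H, W) feature maps of T consecutive frames.
        b, t, c, h, w = feats.shape
        p = self.patch
        # Split every frame into non-overlapping p x p patch tokens.
        tokens = feats.view(b, t, c, h // p, p, w // p, p)
        tokens = tokens.permute(0, 1, 3, 5, 2, 4, 6)
        tokens = tokens.reshape(b, t * (h // p) * (w // p), c * p * p)
        tokens = self.norm(tokens)
        # Joint attention across time and space: every patch token can draw
        # on any patch of any frame.
        out, _ = self.attn(tokens, tokens, tokens, need_weights=False)
        # Fold the tokens back into per-frame feature maps.
        out = out.reshape(b, t, h // p, w // p, c, p, p)
        out = out.permute(0, 1, 4, 2, 5, 3, 6).reshape(b, t, c, h, w)
        return out


# Illustrative usage: five 16-channel feature maps of size 64x64.
x = torch.randn(2, 5, 16, 64, 64)
y = SpatialTemporalAttention()(x)   # same shape as x
```

A full transformer block would add a feed-forward sublayer and residual connections around this attention, and in practice the attention is restricted by the hole masks so that only valid (non-hole) patches serve as sources.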