| Graduate Student: | 陳品諺 Chen, Pin-Yen |
|---|---|
| Thesis Title: | 基於光流引導傳播與擴散模型之視訊內修復 Flow-Based Propagation and Diffusion Model for Video Inpainting |
| Advisor: | 楊家輝 Yang, Jar-Ferr |
| Degree: | 碩士 Master |
| Department: | 電機資訊學院 - 電腦與通信工程研究所 Institute of Computer & Communication Engineering |
| Year of Publication: | 2025 |
| Academic Year: | 113 |
| Language: | English |
| Pages: | 48 |
| Chinese Keywords: | 視訊內修復、光流引導傳播、去雜機率擴散模型、影像重繪、融合框架 |
| English Keywords: | Video inpainting, Flow-guided propagation, Denoising diffusion probabilistic model (DDPM), RePaint, Hybrid framework |
視訊內修復是目前電腦視覺領域的熱門研究課題,旨在對影片中受損或被遮蔽的區域進行內容填補;由於須同時保持精確的空間與時間連貫性,此任務仍具高度挑戰性。傳統利用光流與特徵對齊的演算法雖能透過像素傳遞保留時序穩定性,但面對大面積遮罩時仍易出現細節缺失,凸顯了對具有強大生成能力之模型的需求。近年來,擴散模型因其卓越的生成性能而成為影像與視訊生成領域的重要技術,但在處理長序列時,常因語意飄移導致跨幀不一致。
本研究提出一套結合光流引導傳播與影像重繪去雜機率擴散模型的融合框架,將高精度光流導向的像素傳播機制與擴散模型的生成優勢整合於同一流程。系統先透過計算光流,將已修補的內容以像素層級傳播至整個序列,以縮小需生成的未知區域;再以影像重繪去雜機率擴散模型填補剩餘的缺失區域。實驗結果顯示,本論文所提出的融合框架可同時兼顧視訊內修復之細節與時間一致性。
Video inpainting has emerged as a prominent research problem in computer vision. The task seeks to restore content in corrupted or occluded regions of a video while simultaneously preserving spatial fidelity and temporal coherence, a requirement that remains highly challenging. Classical approaches that rely on optical-flow estimation and feature alignment propagate pixels across frames to maintain temporal stability, yet they often fail to reconstruct fine-grained details under large occlusions, underscoring the need for more powerful generative models. Diffusion models have recently achieved state-of-the-art performance in both image and video synthesis; however, when deployed on long sequences they are prone to semantic drift, resulting in inter-frame inconsistencies.
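The flow-based pixel propagation that these classical approaches rely on can be sketched in a few lines. The following is a minimal illustration, not the thesis implementation: it assumes per-frame backward flow fields, binary masks with `True` marking unknown pixels, and nearest-neighbour warping (real systems use bilinear sampling and validity checks); the names `warp_with_flow` and `propagate` are hypothetical.

```python
import numpy as np

def warp_with_flow(frame, flow):
    """Backward-warp `frame` with a dense flow field.

    flow[y, x] gives the (dx, dy) offset into the previous frame;
    nearest-neighbour sampling keeps the sketch short."""
    h, w = frame.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w]
    src_x = np.clip(np.round(xs + flow[..., 0]).astype(int), 0, w - 1)
    src_y = np.clip(np.round(ys + flow[..., 1]).astype(int), 0, h - 1)
    return frame[src_y, src_x]

def propagate(frames, masks, flows):
    """Fill masked pixels in each frame with known pixels propagated
    from the previous frame along the flow.

    masks: True = unknown pixel. Returns updated frames and the
    shrunken masks; inputs are left untouched."""
    frames = [f.copy() for f in frames]
    masks = [m.copy() for m in masks]
    for t in range(1, len(frames)):
        warped = warp_with_flow(frames[t - 1], flows[t - 1])
        warped_known = warp_with_flow(
            (~masks[t - 1]).astype(np.uint8), flows[t - 1]).astype(bool)
        fill = masks[t] & warped_known   # unknown here, known upstream
        frames[t][fill] = warped[fill]
        masks[t] &= ~fill                # the unknown region shrinks
    return frames, masks
```

Pixels that no flow vector can reach stay masked, which is exactly the residue that motivates a generative model for the remaining holes.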
In this research, we propose a hybrid framework that integrates flow-guided propagation with the RePaint denoising diffusion probabilistic model (DDPM). High-precision bidirectional optical flow first propagates the available inpainted content through the entire video sequence, thereby shrinking the unknown regions that require synthesis; the RePaint DDPM then fills in the remaining defect regions. Simulation results show that the proposed hybrid framework delivers high-fidelity spatial details while maintaining temporal consistency.
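The RePaint conditioning rule used in the second stage can be illustrated with a single reverse step. This is a simplified sketch, not the thesis code: it assumes a DDPM cumulative noise schedule (`alpha_bar_prev`), a mask with 1 marking known pixels, and a hypothetical `denoise_step` callable standing in for the trained reverse model.

```python
import numpy as np

def repaint_step(x_t, x0_known, mask, alpha_bar_prev, denoise_step, rng):
    """One RePaint-style reverse-diffusion step (illustrative sketch).

    mask: 1 where pixels are known, 0 where they must be generated.
    denoise_step: callable sampling x_{t-1} from x_t (placeholder for
    a trained DDPM)."""
    # Known region: forward-diffuse the ground truth to noise level t-1,
    # so it matches the noise statistics of the generated region.
    noise = rng.standard_normal(x0_known.shape)
    x_known = (np.sqrt(alpha_bar_prev) * x0_known
               + np.sqrt(1.0 - alpha_bar_prev) * noise)
    # Unknown region: one learned reverse step of the diffusion model.
    x_unknown = denoise_step(x_t)
    # Stitch the two regions together according to the mask.
    return mask * x_known + (1.0 - mask) * x_unknown
```

Because the known region is re-noised to the current timestep rather than pasted in clean, the generated pixels stay statistically consistent with their surroundings at every step of the reverse process.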