| Student: | 吳俊德 Wu, Chun-Te |
|---|---|
| Thesis Title: | Video Reordering with Optical Flows and Autoencoder (利用自動編碼器與光流對影片進行重新編排) |
| Advisor: | 李同益 Lee, Tong-Yee |
| Degree: | Master |
| Department: | Department of Computer Science and Information Engineering, College of Electrical Engineering and Computer Science |
| Year of Publication: | 2020 |
| Academic Year of Graduation: | 108 (ROC calendar) |
| Language: | English |
| Pages: | 36 |
| Keywords: | video resequencing, autoencoder architecture, optical flows, path finding algorithms |
To solve the general video resequencing problem, we propose a novel deep learning framework that generates natural result videos with smooth motion. Given an unordered image collection or a video, we first extract a latent vector from each image or video frame using a novel network architecture we propose. We then build a complete graph whose edge weights are the distances between latent vectors. Finally, depending on the user's requirements, one of three path-finding algorithms traverses the graph to produce the resulting video sequence; these algorithms correspond to the three applications of our framework: original video reconstruction, in-between frame insertion, and video resequencing. To ensure that the motion in the resulting videos is "as smooth and reasonable as possible", we use optical flows as constraints in the path-finding algorithms, and our proposed network is also used to compute the difference between optical flows. Experimental evaluation demonstrates that our proposed network outperforms previous work on feature extraction, and the resulting videos show that our framework can be applied to videos and unordered image collections of many styles, including cartoons, animations, and real-world footage, without producing the implausible motions seen in previous studies.
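The pipeline described above (per-frame latent extraction, a complete graph weighted by latent-vector distances, and a path search over that graph) can be illustrated with a minimal sketch. This is not the thesis's implementation: the L2 distance, the greedy nearest-neighbour traversal, and all names below (`build_distance_graph`, `greedy_resequence`) are stand-ins for the learned distance and the three path-finding algorithms the abstract refers to, and the optical-flow constraint is omitted for brevity.

```python
# Minimal sketch of the resequencing pipeline, assuming L2 distance
# between latent vectors and a greedy nearest-neighbour path search.
# These are illustrative stand-ins, not the thesis's actual algorithms.
import numpy as np

def build_distance_graph(latents: np.ndarray) -> np.ndarray:
    """Complete graph as a dense matrix of pairwise L2 distances
    between per-frame latent vectors (shape: n_frames x latent_dim)."""
    diff = latents[:, None, :] - latents[None, :, :]
    return np.linalg.norm(diff, axis=-1)

def greedy_resequence(dist: np.ndarray, start: int = 0) -> list[int]:
    """One simple path-finding strategy: repeatedly step to the
    nearest unvisited frame, yielding a frame ordering whose
    consecutive frames are close in latent space."""
    n = dist.shape[0]
    visited = {start}
    path = [start]
    while len(path) < n:
        cur = path[-1]
        # Among unvisited frames, pick the nearest neighbour of the
        # current frame as the next frame in the sequence.
        _, nxt = min((dist[cur, j], j) for j in range(n) if j not in visited)
        visited.add(nxt)
        path.append(nxt)
    return path

# Usage with random stand-in latent vectors for 10 frames.
latents = np.random.rand(10, 128).astype(np.float32)
order = greedy_resequence(build_distance_graph(latents))
print(order)
```

In the actual framework, per the abstract, optical-flow differences computed by the proposed network act as additional constraints on which edges the search may follow, which is what keeps the generated motion smooth and plausible.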