
Graduate student: Le Thi Ngoc Hanh (黎氏玉幸)
Thesis title: Animating still images, Map art style video transferring, and Video resequencing
Advisor: Lee, Tong-Yee (李同益)
Degree: Doctorate
Department: Department of Computer Science and Information Engineering, College of Electrical Engineering and Computer Science
Year of publication: 2023
Graduation academic year: 111 (2022-2023)
Language: English
Number of pages: 113
Keywords: Animating, preserve-curve-warping, PCW, cycle warping, map art, video style transfer, MAViNet, path-finding, resequencing, SDPF, distillation
ORCID: 0000-0001-9667-9780

    Video is a widely used media form that holds significant importance in computer graphics applications due to its ability to convey motion, simulate reality, and engage viewers. In this dissertation, we are interested in computer-aided video generation for three applications: generating a looping video from a still image, transferring a map art style onto video, and resequencing a video from an arbitrary starting frame.

    A looping video can be created from a single still image by adding subtle motion to objects within the image, resulting in a combination of photographic and video elements. While existing techniques have been successful in animating such images, there are still drawbacks that require further investigation. These include the lengthy computation time required to retrieve matched videos and the challenge of consistently controlling the desired motion across multiple regions. In this work, we address these issues by introducing an interactive system that incorporates a new warping method. Our approach relies on user annotations to introduce motion to specific objects and utilizes two distinct phases, preserve-curve warping and cycle warping, to generate a looping video. Through various challenging experiments and evaluations, we demonstrate the effectiveness of our method. We illustrate that our system, despite its simplicity and lightweight nature, effectively tackles the challenges of animating still images, resulting in realistic motion and visually appealing videos. Moreover, our proposed system allows ordinary users with limited expertise to produce compelling animations easily, without the need for a video database or machine learning models.
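
    The actual preserve-curve warping and cycle warping stages are described in Chapter 3; purely as a rough illustration of the general idea, and not the proposed method, the sketch below warps a still image along a user-painted flow field and cross-fades the tail of the sequence into its head so that the clip loops. The flow field, frame count, and fade length are assumed inputs.

        import numpy as np
        import cv2

        def make_looping_video(image, flow, num_frames=60, fade=15):
            """Warp a still image along a per-pixel flow field and cross-fade
            the tail into the head so the sequence loops.

            image : HxWx3 uint8 array (the still photograph)
            flow  : HxWx2 float32 array, displacement in pixels per frame,
                    e.g. painted from user annotations on the animating region
            """
            h, w = image.shape[:2]
            grid_x, grid_y = np.meshgrid(np.arange(w, dtype=np.float32),
                                         np.arange(h, dtype=np.float32))
            frames = []
            for t in range(num_frames):
                # Accumulate the displacement over time and backward-warp the source.
                map_x = grid_x - t * flow[..., 0]
                map_y = grid_y - t * flow[..., 1]
                warped = cv2.remap(image, map_x, map_y, cv2.INTER_LINEAR,
                                   borderMode=cv2.BORDER_REFLECT)
                frames.append(warped)
            # Blend the last `fade` frames with the first ones to close the loop.
            for i in range(fade):
                alpha = (i + 1) / (fade + 1)
                idx = num_frames - fade + i
                frames[idx] = cv2.addWeighted(frames[idx], 1.0 - alpha,
                                              frames[i], alpha, 0)
            return frames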

    Preserving the content of an image or video while altering its style is a crucial aspect of evaluating a new neural style transfer algorithm. However, transferring a map art style to a specific video, where the "content" consists of a map background and animated objects, presents significant challenges. We introduce a comprehensive system that addresses the difficulties associated with transferring map art style to such videos. Our system takes three inputs: an arbitrary video, a map image, and a pre-existing map art image. It then generates an artistic video while preserving the functionality of the map and maintaining consistency in details. To tackle this challenge, we propose a novel network called the Map Art Video Network (MAViNet), along with tailored objective functions and a diverse training set containing various animated contents and different map structures. We extensively evaluate our method on challenging cases and conduct numerous comparisons with related works. The results demonstrate that our method significantly outperforms state-of-the-art approaches in terms of visual quality and successfully fulfills the criteria of this research domain.
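
    MAViNet's tailored objective functions are defined in Chapter 4 and are not reproduced in this abstract; as background only, the sketch below shows the standard perceptual content loss and Gram-matrix style loss that neural style transfer methods (and their video extensions) commonly build on. The layer choices and weighting are assumptions, not the losses used by MAViNet.

        import torch
        import torch.nn.functional as F

        def gram_matrix(feat):
            # feat: (B, C, H, W) feature map from a pretrained encoder such as VGG.
            b, c, h, w = feat.shape
            f = feat.view(b, c, h * w)
            return torch.bmm(f, f.transpose(1, 2)) / (c * h * w)

        def style_transfer_losses(stylized_feats, content_feats, style_feats):
            """Gatys-style content and style terms over several encoder layers."""
            content_loss = sum(F.mse_loss(s, c)
                               for s, c in zip(stylized_feats, content_feats))
            style_loss = sum(F.mse_loss(gram_matrix(s), gram_matrix(t))
                             for s, t in zip(stylized_feats, style_feats))
            return content_loss, style_loss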

    While videos have long been recognized as a prevalent form of visualization, the animation sequences within them serve as a means of storytelling for viewers. Creating animations demands significant effort from skilled artists to achieve plausible results in terms of both content and motion direction. This is particularly challenging for animations that involve complex content, multiple moving objects, and intricate movements. We investigate an interactive framework that allows users to generate new sequences based on their preferred starting frame. The key distinction of our approach from previous work and existing commercial applications is that our system produces novel sequences with arbitrary starting frames while maintaining consistency in both content and motion direction. To accomplish this effectively, we first utilize a proposed network, RSFNet, to learn the feature correlations within the given video's frame set. Subsequently, we develop a novel path-finding algorithm, SDPF, which incorporates knowledge of the motion directions in the source video to estimate smooth and plausible sequences. Through extensive experiments, we demonstrate that our framework can generate new animations in both cartoon and natural scenes, surpassing previous works and commercial applications by enabling users to achieve more predictable results.
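
    The RSFNet architecture and the SDPF algorithm are detailed in Chapter 5. Purely as an illustration of the resequencing idea, the hypothetical sketch below builds a new frame order from an arbitrary starting frame by greedily following small distances in a per-frame feature space while discouraging reversals of the current direction of travel; it is not the distillation path-finding procedure itself, and the momentum weighting is an assumption.

        import numpy as np

        def resequence(features, start, length, momentum=0.5):
            """Greedy resequencing sketch from an arbitrary starting frame.

            features : (N, D) per-frame feature vectors from any embedding
            start    : index of the user-chosen starting frame
            length   : number of frames in the new sequence
            momentum : weight discouraging reversals of the direction of travel
                       in feature space (a crude stand-in for motion-direction
                       consistency)
            """
            dist = np.linalg.norm(features[:, None, :] - features[None, :, :], axis=-1)
            order = [start]
            prev_step = np.zeros_like(features[0])
            for _ in range(length - 1):
                cur = order[-1]
                cost = dist[cur].copy()
                # Penalize candidates that reverse the current direction of travel.
                steps = features - features[cur]
                denom = np.linalg.norm(steps, axis=1) * (np.linalg.norm(prev_step) + 1e-8)
                cos = (steps @ prev_step) / np.maximum(denom, 1e-8)
                cost += momentum * (1.0 - cos)
                cost[order] = np.inf  # do not revisit frames in this sketch
                nxt = int(np.argmin(cost))
                if not np.isfinite(cost[nxt]):
                    break
                prev_step = features[nxt] - features[cur]
                order.append(nxt)
            return order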

    Abstract (Chinese)
    Abstract
    Acknowledgement
    Table of Contents
    List of Tables
    List of Figures
    Chapter 1. Introduction
    Chapter 2. Research Background
        2.1 Animating single images
        2.2 Image/Video style transfer
        2.3 Video resequencing
            2.3.1 Feature extraction and dimension reduction
            2.3.2 Image sequence ordering
    Chapter 3. Animating Still Natural Images by Warping
        3.1 Methodology
            3.1.1 System overview
            3.1.2 Animating region extraction
            3.1.3 Flow generation
            3.1.4 Animation generation
        3.2 Experimental Results
            3.2.1 Our results and discussion
            3.2.2 Qualitative evaluation
            3.2.3 Objective evaluation
            3.2.4 Ablation study
    Chapter 4. Structure-aware Video Style Transfer with Map Art
        4.1 System overview
        4.2 Map Art Video Network
            4.2.1 MArt-Encoder (MArt-E)
            4.2.2 Multi-layer Transformer Module
            4.2.3 MArt-Decoder (MArt-D)
            4.2.4 Loss Function
        4.3 Experimental Results
            4.3.1 Our results and discussion
            4.3.2 Ablation study
            4.3.3 Visual comparisons
            4.3.4 Evaluation metrics
    Chapter 5. Resequencing Video from Arbitrary Starting Frame by Distillation Pathfinding
        5.1 System overview
        5.2 Methodology
            5.2.1 Graph generation with RSFNet
            5.2.2 RSFNet structure
            5.2.3 Learning-based Euclidean metric
            5.2.4 Single-source Distillation path-finding
        5.3 Experimental Results
            5.3.1 Our results and discussion
            5.3.2 Evaluation metrics
            5.3.3 Comparisons to prior works
            5.3.4 Ablation study
    Chapter 6. Conclusions and Future Works
    References

