Graduate Student: 劉溢茗 Liu, Yi-Ming
Thesis Title: 基於圖片及重要特徵之深度學習產生3D動畫研究 (3D Animation based on Deep Learning for Input Images and their Important Features)
Advisor: 李同益 Lee, Tong-Yee
Degree: Master
Department: College of Electrical Engineering and Computer Science, Department of Computer Science and Information Engineering
Year of Publication: 2022
Graduation Academic Year: 110
Language: English
Pages: 37
Keywords: ACAP feature, deep learning, keypoints, keyframe, latent space, interpolation, Convolutional Adversarial Autoencoder
Generating 3D animation is considerably more difficult than generating 2D animation. Many traditional methods rely on a 3D skeleton, deforming the model by adjusting that skeleton; however, most of them cannot guarantee that the intermediate poses remain ergonomically plausible, and they all require manual adjustment, so both the skeleton computation and the labor cost are high.
In recent years, many methods have emerged for reconstructing 3D models from 2D images. To turn a video into a 3D animation, however, a model would have to be reconstructed for every frame, which takes a great deal of time; and if only a few keyframes are reconstructed and the remaining frames filled in by interpolation, there is no guarantee that the interpolated frames are reasonable.
In this study, we propose a method that takes only a few images as keyframes and converts them into a reasonable and smooth 3D animation. Given several input images, a trained neural network detects keypoints and forms a 2D skeleton, which is compared against screenshots of 3D models taken from different viewpoints. After selecting the most similar model, it is converted into ACAP features and passed to a model trained as a Convolutional Adversarial Autoencoder: the encoder maps the ACAP features to latent vectors, these vectors serve as control points to build a graph, a B-spline curve traces a path through them, and nodes sampled along this path form a sequence. The decoder decodes the sequence back into ACAP features, which are finally converted back to 3D models and rendered, yielding a 3D animation generated from 2D images.
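The skeleton-comparison step described above can be sketched as follows. This is a minimal illustration, not the thesis's actual metric: the functions `normalize_skeleton`, `pose_distance`, and `pick_best_model`, and the translation/scale normalization they use, are hypothetical stand-ins for whatever similarity measure the system applies to the 2D skeletons.

```python
import numpy as np

def normalize_skeleton(kps):
    """Center the 2D keypoints on their mean and scale to unit size,
    so the comparison is invariant to translation and scale
    (a hypothetical normalization for illustration)."""
    kps = kps - kps.mean(axis=0)                 # remove translation
    scale = np.linalg.norm(kps, axis=1).max()    # remove scale
    return kps / (scale + 1e-8)

def pose_distance(kps_a, kps_b):
    """Mean Euclidean distance between corresponding 2D keypoints."""
    a, b = normalize_skeleton(kps_a), normalize_skeleton(kps_b)
    return np.linalg.norm(a - b, axis=1).mean()

def pick_best_model(input_kps, candidate_kps_list):
    """Return the index of the rendered model view whose 2D skeleton
    is closest to the input image's skeleton."""
    dists = [pose_distance(input_kps, c) for c in candidate_kps_list]
    return int(np.argmin(dists))
```

With this normalization, a rendered view whose pose is a translated and scaled copy of the input pose scores a distance near zero and is selected.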
Generating 3D animation is more challenging than making 2D animation. Traditionally, quite a few methods use a 3D skeleton to make 3D animation: deforming a 3D model by adjusting its skeleton is a common approach. However, most of these methods do not guarantee that the deformation is ergonomically plausible, and they usually demand substantial human effort as well as a large amount of computation time.
Recently, many methods have become available for reconstructing 3D models from images. However, reconstructing a model for every frame of a video to turn it into a 3D animation takes too much time. Even if we pick a few keyframes of the video and use interpolation to generate the remaining frames, we cannot guarantee that the result is reasonable.
In this work, we propose a system that generates reasonable and smooth 3D animations based on images. A trained neural network detects the keypoints of the input images and of images captured from different views of 3D models. We build 2D skeletons from these keypoints and compare the skeleton of each input image with those of the captured views. After picking the 3D model whose pose is most similar to the input image, we convert it to ACAP (as-consistent-as-possible) features. These features are fed to a model trained as a Convolutional Adversarial Autoencoder: the encoder maps them to latent vectors, which we use as control points to build a graph. A B-spline curve smooths the path through these points, and sampling points along it yields a sequence. After the decoder maps this sequence back to ACAP features, we convert them to 3D models and render them. In this way, we accomplish the goal of making 3D animations based on images.
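The latent-space interpolation step (latent vectors as control points, a B-spline path through them, nodes sampled along the path) can be sketched with SciPy's spline routines. This is a minimal sketch under assumed shapes: the random `latent_keyframes` are stand-ins for real encoder outputs, and the 8-dimensional latent size and 30 samples are arbitrary choices, not the thesis's actual settings.

```python
import numpy as np
from scipy.interpolate import splprep, splev

# Hypothetical latent vectors for 4 keyframes; in the real pipeline
# these come from the trained encoder applied to ACAP features.
rng = np.random.default_rng(0)
latent_keyframes = rng.normal(size=(4, 8))   # 4 keyframes, 8-D latent space

# Fit a cubic B-spline through the latent control points.
# splprep expects one coordinate array per dimension, hence the transpose;
# s=0 forces the curve to pass through every keyframe.
tck, _ = splprep(latent_keyframes.T, s=0, k=3)

# Sample 30 evenly spaced nodes along the path. Each sampled latent
# vector would then be decoded back to ACAP features and a 3D mesh.
u = np.linspace(0.0, 1.0, 30)
path = np.array(splev(u, tck)).T             # shape (30, 8)
```

Because the spline interpolates its control points, the first and last sampled nodes coincide with the first and last keyframe latents, so the decoded animation starts and ends exactly on the input keyframes.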