| Graduate student: | 陳彥宏 Chen, Yen-Hong |
|---|---|
| Thesis title: | 3D物件之姿勢像素對齊的隱式函數 Posture Pixel-aligned implicit function for 3D Object digitization |
| Advisor: | 王宗一 Wang, Tzone-I |
| Degree: | Master |
| Department: | College of Engineering - Department of Engineering Science |
| Year of publication: | 2021 |
| Academic year of graduation: | 109 |
| Language: | Chinese |
| Number of pages: | 32 |
| Keywords (Chinese): | 3D人體重建 (3D human body reconstruction), 3D人體姿態估計 (3D human pose estimation), 綁定骨架 (skeleton rigging), 統一姿勢 (unified pose) |
| Keywords (English): | 3D human body reconstruction, 3D pose estimation, Auto Rigging, Unified pose network |
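To help computers better understand human behavior, take part in human life, and interact with people, it is especially important that they be able to capture the 3D pose and shape of the human body. What computer vision normally sees is a two-dimensional (2D) image, so converting 2D images into 3D has long been a popular research topic. Manual modeling consumes a great deal of time and labor; if a computer can generate 3D objects automatically, it not only saves that time and labor but also lets the computer interact with people in real time. In computer vision, neural networks have matured steadily, with different architectures excelling at different tasks, and personal hardware has grown increasingly powerful, so having a computer automatically generate 3D objects and interact with people in real time has now become feasible.
In 3D object reconstruction, Shunsuke Saito et al. proposed in 2019 a reconstruction method based on the Pixel-Aligned Implicit Function (PIFu). The network is memory-efficient, fast, and able to reconstruct from a single image, which makes it an outstanding neural network. Nevertheless, because the objects it generates are not in a unified pose, automatic skeleton binding (rigging) by computer is difficult, which hinders the parts of the reconstructed 3D object from moving smoothly about their joints; this is a problem that must be overcome. This research focuses on how to reconstruct 3D objects uniformly in a T-pose, and its results will greatly help subsequent automatic rigging.
This thesis proposes a posture feature synthesis human body implicit function that can recover the original T-pose 3D character, demonstrates that the posture feature synthesis implicit function is equivalent to the T-pose implicit function, and shows that its results can be applied to future automatic rigging.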
For a computer to better understand human behavior and interact with people, it is important that it can perceive the 3D postures and shapes of human bodies. Objects in computer vision are usually seen as two-dimensional (2D) flat images, so converting 2D images to 3D objects has long been a popular topic. Manual modelling takes a lot of time and manpower, which can be avoided if a computer can automatically generate 3D objects and interact with people in real time. Computer vision has recently advanced rapidly thanks to neural networks. With stronger hardware, automatically creating 3D objects from 2D images through neural networks in real time, though there is still a long way to go, has now become possible. In 2019, Shunsuke Saito et al. published PIFu (Pixel-Aligned Implicit Function) for 3D object reconstruction. Owing to its low memory consumption, fast processing speed, and ability to reconstruct from a single 2D image, it is an excellent neural network model. However, the postures of the objects reconstructed by such networks are not uniform, which makes it difficult for a computer to automatically bind skeletons to the objects (rigging) so that their parts can move freely around the joint points. This problem must be overcome in order to reconstruct a movable 3D object.
This research focuses on how to reconstruct 3D objects uniformly in a T-pose. Its contribution, which greatly helps computers perform auto-rigging, is a proposed posture feature synthesis body implicit function that can restore the original T-pose 3D object. The research demonstrates that the posture feature synthesis implicit function performs as well as the T-pose implicit function and that it is feasible to apply it to auto-rigging of objects.
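As a rough illustration of the pixel-aligned implicit function that the abstract builds on, the sketch below follows the formulation published for PIFu, f(F(x), z(X)) = s: a 3D point X is projected to a pixel location x, an image feature F(x) is sampled there, and a classifier maps that feature together with the depth z(X) to an occupancy value, with the surface taken as the 0.5 level set. It is a minimal sketch only, not the network proposed in this thesis. The orthographic projection, the tiny feature map, the single linear layer, and all function names and weights are assumptions chosen for illustration.

```python
import math


def project_orthographic(point):
    """Orthographic projection: return the 2D location (x, y) and the depth z."""
    x, y, z = point
    return (x, y), z


def sample_bilinear(feature_map, xy):
    """Bilinearly sample an H x W x C feature map (nested lists) at xy in [-1, 1]."""
    h, w = len(feature_map), len(feature_map[0])
    # Map normalized coordinates to continuous pixel indices, clamped to the grid.
    u = min(max((xy[0] + 1.0) * 0.5 * (w - 1), 0.0), w - 1)
    v = min(max((xy[1] + 1.0) * 0.5 * (h - 1), 0.0), h - 1)
    u0, v0 = int(math.floor(u)), int(math.floor(v))
    u1, v1 = min(u0 + 1, w - 1), min(v0 + 1, h - 1)
    du, dv = u - u0, v - v0
    channels = len(feature_map[0][0])
    return [
        (1 - du) * (1 - dv) * feature_map[v0][u0][c]
        + du * (1 - dv) * feature_map[v0][u1][c]
        + (1 - du) * dv * feature_map[v1][u0][c]
        + du * dv * feature_map[v1][u1][c]
        for c in range(channels)
    ]


def occupancy(feature_map, point, weights, bias):
    """Toy f(F(x), z(X)): one linear layer plus sigmoid, giving a value in (0, 1)."""
    xy, z = project_orthographic(point)
    inputs = sample_bilinear(feature_map, xy) + [z]  # pixel-aligned feature + depth
    logit = sum(w * v for w, v in zip(weights, inputs)) + bias
    return 1.0 / (1.0 + math.exp(-logit))


# Hypothetical 2 x 2 feature map with one channel, and untrained toy weights.
toy_features = [[[0.0], [1.0]], [[2.0], [3.0]]]
toy_weights = [1.0, -1.0]

# A query point counts as inside the surface when its occupancy exceeds 0.5.
is_inside = occupancy(toy_features, (0.0, 0.0, 0.5), toy_weights, 0.0) > 0.5
```

In a real system the feature map would come from a learned image encoder and the classifier would be a trained multi-layer perceptron; a mesh is then typically extracted by evaluating the function on a dense grid of query points.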
[1] Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun, “Deep Residual Learning for Image Recognition.” In IEEE Conference on Computer Vision and Pattern Recognition, CVPR, 2016.
[2] Alejandro Newell, Kaiyu Yang, Jia Deng, “Stacked Hourglass Networks for Human Pose Estimation.” In European Conference on Computer Vision, ECCV, 2016.
[3] Yanghua Jin, Jiakai Zhang, Minjun Li, Yingtao Tian, Huachun Zhu, Zhihao Fang, “Towards the Automatic Anime Characters Creation with Generative Adversarial Networks.” Doujinshi in Comiket 92, Summer 2017.
[4] M. Sela, E. Richardson, and R. Kimmel, “Unrestricted facial geometry reconstruction using image-to-image translation.” In Proceedings of the IEEE International Conference on Computer Vision, ICCV, 2017.
[5] Albert Pumarola, Antonio Agudo, Aleix M. Martinez, Alberto Sanfeliu, Francesc Moreno-Noguer, “GANimation: Anatomically-aware Facial Animation from a Single Image.” In European Conference on Computer Vision, ECCV, 2018.
[6] Gül Varol, Duygu Ceylan, Bryan Russell, Jimei Yang, Ersin Yumer, Ivan Laptev, Cordelia Schmid, “BodyNet: Volumetric Inference of 3D Human Body Shapes.” In European Conference on Computer Vision, ECCV, 2018.
[7] A. Kanazawa, M. J. Black, D. W. Jacobs, and J. Malik, “End-to-end recovery of human shape and pose.” In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, CVPR, 2018.
[8] T. Alldieck, M. A. Magnor, W. Xu, C. Theobalt, and G. Pons-Moll, “Video based reconstruction of 3D people models.” In IEEE Conference on Computer Vision and Pattern Recognition, CVPR, 2018.
[9] Shunsuke Saito, Zeng Huang, Ryota Natsume, Shigeo Morishima, Angjoo Kanazawa, Hao Li, “PIFu: Pixel-Aligned Implicit Function for High-Resolution Clothed Human Digitization.” In Proceedings of the IEEE International Conference on Computer Vision, ICCV, Seoul, Korea, 2019.
[10] Lars Mescheder, Michael Oechsle, Michael Niemeyer, Sebastian Nowozin, Andreas Geiger, “Occupancy Networks: Learning 3D Reconstruction in Function Space.” In IEEE Conference on Computer Vision and Pattern Recognition, CVPR, 2019.
[11] Jeong Joon Park, Peter Florence, Julian Straub, Richard Newcombe, Steven Lovegrove, “DeepSDF: Learning Continuous Signed Distance Functions for Shape Representation.” In IEEE Conference on Computer Vision and Pattern Recognition, CVPR, 2019.
[12] D. Xiang, H. Joo, and Y. Sheikh, “Monocular total capture: Posing face, body, and hands in the wild.” In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, CVPR, 2019.
[13] Y. Xu, S.-C. Zhu, and T. Tung, “DenseRaC: Joint 3D pose and shape estimation by dense render-and-compare.” In Proceedings of the IEEE International Conference on Computer Vision, ICCV, 2019.
[14] X. Zeng, X. Peng, and Y. Qiao, “DF2Net: A dense-fine-finer network for detailed 3D face reconstruction.” In Proceedings of the IEEE International Conference on Computer Vision, ICCV, 2019.
[15] G. Pavlakos, V. Choutas, N. Ghorbani, T. Bolkart, A. A. Osman, D. Tzionas, and M. J. Black, “Expressive body capture: 3D hands, face, and body from a single image.” In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, CVPR, 2019.
[16] B. L. Bhatnagar, G. Tiwari, C. Theobalt, and G. Pons-Moll, “Multi-garment net: Learning to dress 3D people from images.” In Proceedings of the IEEE International Conference on Computer Vision, ICCV, 2019.
[17] T. Alldieck, M. Magnor, B. L. Bhatnagar, C. Theobalt, and G. Pons-Moll, “Learning to reconstruct people in clothing from a single RGB camera.” In IEEE Conference on Computer Vision and Pattern Recognition, CVPR, 2019.
[18] Pramook Khungurn, “Talking Head Anime.” https://pkhungurn.github.io/talking-head-anime/
[19] Crossous, “Pmx file format.” https://www.jianshu.com/p/d051639b6aed
[20] “vmd file format.” https://mikumikudance.fandom.com/wiki/VMD_file_format